In politics, as elsewhere, what we know matters for what we want. Lucy wants harsh sentencing because she believes it will reduce crime rates; were she to find out that it does not, she would no longer want harsh sentencing. Bob wants less immigration because he believes that it hurts the economy; were he to learn that immigration tends to have a positive economic impact, he would no longer want to see it reduced.
That’s why political scientists, rightly, are concerned with studying what voters know, and what difference it would make if they knew more. The former question has been extensively investigated in the literature on public ignorance – as it turns out, most of us know very little when it comes to politically relevant matters (Achen and Bartels 2016; Delli Carpini and Keeter 1996). The latter question – what difference knowledge makes in politics – has been studied under the heading of ‘information effects’ (Althaus 2003; Bartels 1996).
The information effects literature makes clear that knowledge does matter for politics, and can in some cases even change the electoral outcome. For example, Ahlstrom-Vij (2020) models an informed EU referendum in the UK, and sees the proportion supporting Remain swing from a minority to a majority. Blais et al. (2009) simulate the outcome of six past Canadian elections, involving three to four parties, with fully informed voters, and see a likely difference in outcome in one. Oscarsson (2007) simulates six past Swedish elections, involving eight main parties, and sees a likely difference in outcome in two of them.
Even where information effects don’t change outcomes, they can still have substantial implications for party political choice. Bhatti (2010) models three European Parliament elections and finds several cases in which the differences between actual and simulated support are in the double digits. Similarly, Hansen (2009) models two Danish elections and finds a substantial change in the power distribution internal to the party blocs. For example, in one case, doubling the degree to which the voters were informed would have almost doubled the level of support for the Conservatives.
Information effects modeling can also be used to look at the influence of knowledge on political opinion over time. For example, Ahlstrom-Vij (2021) uses ANES data to evaluate the idea that we have entered a “post-truth era”, arguing that, if we have, we should expect to see decreasing information effects on central political issues over time. This turns out to be the case: Ahlstrom-Vij shows that, at least in a US context, we see a decrease in information effects on party preferences as well as on key political issues – immigration, same-sex adoption and gun laws, in particular – in the period 2004 to 2016, which offers some novel, empirical evidence for the “post-truth” narrative.
Whether or not explicitly framed in those terms, modeling of information effects involves a form of counterfactual or causal modeling (Morgan and Winship 2015): a model is fitted, not for purposes of making a straightforward prediction (as in predictive modeling), e.g., concerning how some particular respondent might respond, but in order to estimate how a respondent would have responded, had they been more informed, with reference to some relevant measure of political knowledge (Delli Carpini and Keeter 1996). Such an estimation is performed by fitting the model on the relevant data, and then using the model to make a “prediction,” once the value on the political knowledge variable for each respondent has been set to whatever value designates being “informed,” thereby estimating what each respondent would have responded, had they been fully informed.
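In code terms, the procedure can be sketched as follows – a minimal sketch only, using placeholder variable names (outcome, knowledge, covariate) rather than anything from the data set introduced below:
# fit a model of the outcome on knowledge plus covariates
m <- glm(outcome ~ knowledge + covariate, data = df, family = "binomial")
# counterfactual step: set every respondent's knowledge to "informed"
df_informed <- df
df_informed$knowledge <- 1
# estimated responses, had everyone been fully informed
informed_pred <- predict(m, newdata = df_informed, type = "response")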
How does this work in practice? That’s the question this guide is looking to answer. It will walk through each step in a complete pipeline, from constructing a political knowledge scale from a set of items to modeling the relevant effects, using functions written in R (R Core Team 2018) – a free, open source language for statistical computing – that can be re-used by others interested in information effects modeling on their own particular data sets.
The functions and their outputs will be illustrated by way of a subset of Wave 17 of the British Election Study Internet Panel (Fieldhouse 2019) (N = 34,366). As our outcome, we will use the following attitudinal variable (immigSelf): “On a scale from 0-10, how would you prefer immigration levels to Britain changed?” (0 = reduced a lot, 5 = kept the same, 10 = increased a lot). For purposes of modeling, this variable has been re-coded as a binary one (immig_self), with 1 for responses below 5 (i.e., for wanting immigration reduced), and 0 otherwise.
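A re-code along the following lines does the job – a sketch only, assuming the original 0-10 variable is available as immigSelf in the subset:
df <- df %>%
  mutate(immig_self = ifelse(immigSelf < 5, 1, 0))
Counting the two categories gives the following distribution: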
df %>%
  count(immig_self)
## # A tibble: 2 × 2
## immig_self n
## <dbl> <int>
## 1 0 16596
## 2 1 17770
In what follows, we will use these variables to estimate what difference information would make to anti-immigration sentiments. To that end, we’ll also use a set of demographic and socioeconomic covariates, as follows:
df %>%
  select(education, income, gender, age, religion, ethnicity, party_id, eu_ref_vote)
## # A tibble: 34,366 × 8
## education income gender age religion ethnicity party_id eu_ref_vote
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 no_qual Q1 female over65 c_of_e white_british libdem remain
## 2 a-level Q1 male over65 no_religion white_british no_party remain
## 3 a-level Q3 male 56-65 no_religion white_british plaid leave
## 4 gcse Q1 female over65 no_religion white_british cons remain
## 5 a-level Q2 male over65 c_of_e white_british no_party remain
## 6 postgrad Q3 female 26-35 no_religion white_british labour leave
## 7 gcse Q4 female 56-65 c_of_e white_british no_party remain
## 8 gcse Q3 female 56-65 c_of_e white_british cons remain
## 9 no_qual Q2 male 56-65 c_of_e white_british plaid leave
## 10 gcse Q5 male 56-65 no_religion white_british cons leave
## # … with 34,356 more rows
We will also make use of four knowledge items, coded as 1 for correct, and 0 for incorrect or “Don’t know” responses (Zaller 1992: 339; Althaus 2003: 105):
k1: “Polling stations close at 10.00pm on election day” (True)
k2: “No-one may stand for parliament unless they pay a deposit” (True)
k3: “MPs from different parties are on parliamentary committees” (True)
k4: “The number of MPs in Parliament is about 100” (False)
If we sum up the number of correct answers, we get the following distribution:
df %>%
  mutate(total_score = k1 + k2 + k3 + k4) %>%
  count(total_score)
## # A tibble: 5 × 2
## total_score n
## <dbl> <int>
## 1 0 1994
## 2 1 4611
## 3 2 5225
## 4 3 6869
## 5 4 15667
Finally, we will also use the survey weight variable (renamed
survey_wt in the subset we will be using) included with the
data set, in order to have our results be representative of the UK
population.
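By way of illustration, the survey-weighted proportion of respondents wanting to see immigration reduced can be computed as follows; this is the same weighted.mean() calculation that yields the “actual” proportion reported by the modeling function later in the guide:
# survey-weighted proportion wanting immigration levels reduced
weighted.mean(df$immig_self, df$survey_wt)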
The first thing we need in order to model information effects is, naturally enough, some measure of participants’ level of political knowledge. Following the work of Michael Delli Carpini and Scott Keeter (Delli Carpini and Keeter 1993, 1996), this typically takes the form of a number of TRUE / FALSE items, where “Don’t know” responses, as already noted, are generally coded as FALSE, i.e., as respondents not knowing the relevant answer (Zaller 1992: 339; Althaus 2003: 105).
One straightforward way to create such a scale is to simply add up all correct answers, for a total knowledge score (Althaus 2003). One downside with doing so is that, outside of getting no questions right or all questions right, there is more than one way to get a particular number of responses correct – with four items, for example, there are six distinct response patterns that yield exactly two correct answers. Since some questions are more difficult than others, and getting those right is thereby more diagnostic of being informed, a purely additive scale risks grouping together people of different abilities.
A better way to construct the relevant scale is therefore to use an Item Response Theory (IRT) model. IRT modeling is an established method for modeling underlying, latent traits, such as abilities. Such models are able to discriminate between the ability of respondents with the same number of correct responses but different response patterns. As we shall see, an IRT model also offers a clear window into the performance both of individual items and of the scale as a whole, thereby helping the researcher construct a good knowledge scale.
The latent traits modeled by way of IRT are assumed to fall on a continuous scale. Values on that scale are usually referred to by way of the Greek letter \(\theta\) (theta), and taken to range from -\(\infty\) to +\(\infty\), with a mean of 0 and standard deviation of 1. This means that, while the individual \(\theta\) value ascribed to any particular respondent has no intrinsic meaning, it can nevertheless be interpreted relative to an estimated population mean.
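More specifically, under the two-parameter logistic (2PL) model used below, the probability of respondent \(i\) answering item \(j\) correctly is a function of their ability \(\theta_i\) together with the item’s discrimination \(a_j\) and difficulty \(b_j\):
\[ P(u_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}} \]
These are the a and b parameters reported and interpreted further below.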
The function below uses R’s mirt package (Chalmers 2012) to generate an IRT scale on the basis of a set of knowledge items, the data frame containing those items, and a percentile cut-off for a corresponding binary knowledge scale (more on the reason for this in a moment):
library(mirt)
library(psych)
library(ggpubr)
inf_irt_scale <- function(items, data, binary_cutoff = 0.9) {
  # save all knowledge items to a data frame
  items_df <- data.frame(matrix(NA, nrow = dim(data)[1], ncol = length(items)))
  for (i in 1:length(items)) {
    items_df[, i] <- data[[items[i]]]
  }
  # fit irt model
  irt_mod <- mirt(data = items_df,
                  model = 1,
                  itemtype = "2PL",
                  verbose = FALSE)
  # save knowledge scores
  know_scores <- fscores(irt_mod)[, 1]
  # create binary knowledge variable
  knowledge_threshold <- quantile(know_scores, binary_cutoff)
  know_scores_binary <- ifelse(know_scores >= knowledge_threshold, 1, 0)
  know_scores_binary_tbl <- prop.table(table("Proportion of observations in each category:" = know_scores_binary))
  # save empirical plots to list
  plot_list_empirical <- vector('list', length(items))
  for (i in 1:length(items)) {
    plot_list_empirical[[i]] <- local({
      i <- i
      print(itemfit(irt_mod, empirical.plot = i))
    })
  }
  empirical_plots <- ggarrange(plotlist = plot_list_empirical)
  # scree plot
  psych::fa.parallel(items_df, fa = "fa")
  return(list("know_scores" = know_scores,
              "know_scores_binary" = know_scores_binary,
              "know_scores_binary_tbl" = know_scores_binary_tbl,
              "empirical_plots" = empirical_plots,
              "trace_plot" = plot(irt_mod, type = "trace"),
              "info_plot" = plot(irt_mod, type = "info"),
              "fa_parallel" = recordPlot(),
              "coef" = coef(irt_mod, IRTpars = T),
              "model_summary" = summary(irt_mod),
              "q3" = data.frame(residuals(irt_mod, type = "Q3"))))
}
In addition to generating a knowledge scale, the function also returns a number of elements to use in evaluating the model, as follows:
unidimensionality, which can be checked via the parallel analysis plot in the fa_parallel element returned by the function; local independence, which can be checked via the Q3 statistic (Yen 1993), also returned by the function; and model fit, which can be checked via the empirical_plots returned by the function.
Let’s start by fitting a model on our four political knowledge items from the BES data set:
irt_model <- inf_irt_scale(c("k1", "k2", "k3", "k4"), df)
Then let’s look at the parallel analysis plot (for unidimensionality), the Q3 values (for local independence), and the empirical plots (for model fit), in turn:
irt_model$fa_parallel
Parallel analysis is related to the traditional scree method, whereby we plot the eigenvalues from a principal axis factoring in descending order. These eigenvalues indicate the amount of variance accounted for by each of the factors, out of the total variance. In traditional scree plotting, we simply look at where we get a steep drop in the graph, suggesting that bringing in further factors fails to explain much (further) variance. In parallel analysis, by contrast, we compare the scree plot to eigenvalues from principal axis factoring of random correlation matrices of the same size as the data, and look at how many factors have eigenvalues greater than the corresponding average eigenvalues of the random matrices (Andrews 2021). As can be seen in this graph, exactly one factor has such an eigenvalue, suggesting that the unidimensionality assumption is satisfied.
irt_model$q3
## X1 X2 X3 X4
## X1 1.00000000 -0.09945417 -0.1165377 -0.1642594
## X2 -0.09945417 1.00000000 -0.4130960 -0.4549003
## X3 -0.11653775 -0.41309604 1.0000000 -0.3345753
## X4 -0.16425944 -0.45490029 -0.3345753 1.0000000
The largest Q3 value is -0.45. Yen (1993) suggests a cut-off value of 0.2, but as pointed out by Ayala (2009), a Q3 test tends to give inflated negative values for short tests. Indeed, Yen’s own suggestion was in the context of scales with at least 17 items. For that reason, a value of -0.45 would seem acceptable, given the short scale.
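For reference, Yen’s \(Q_3\) statistic for an item pair \((j, k)\) is simply the correlation, across respondents, between the two items’ residuals once the modeled trait has been accounted for:
\[ Q_{3jk} = \mathrm{cor}(d_j, d_k), \qquad d_{ij} = u_{ij} - P_j(\hat{\theta}_i) \]
where \(u_{ij}\) is respondent \(i\)’s scored response to item \(j\), and \(P_j(\hat{\theta}_i)\) the model-implied probability of a correct response. Large absolute values indicate residual dependence between items that the latent trait does not capture.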
irt_model$empirical_plots
The empirical plots for all items suggest an acceptable fit, with some possible reservations about item 1.
Let’s now look more closely at the IRT model itself:
irt_model$model_summary
## $rotF
## F1
## X1 0.6249458
## X2 0.8564270
## X3 0.8021189
## X4 0.8375793
##
## $h2
## h2
## X1 0.3905572
## X2 0.7334672
## X3 0.6433947
## X4 0.7015391
##
## $fcor
## F1
## F1 1
The F1 values in the model_summary give us the loadings of the items onto the factor, while the h2 values give the corresponding communalities, i.e., the proportion of each item’s variance explained by the factor. All four items load well onto the factor.
irt_model$coef
## $X1
## a b g u
## par 1.362 -1.909 0 1
##
## $X2
## a b g u
## par 2.823 -0.442 0 1
##
## $X3
## a b g u
## par 2.286 -0.492 0 1
##
## $X4
## a b g u
## par 2.609 -0.566 0 1
##
## $GroupPars
## MEAN_1 COV_11
## par 0 1
As regards the coefficients (coef), we ideally want to see discrimination values (a) greater than 1, which would indicate that the relevant item discriminates well between people of different levels of knowledge. This discrimination value is also reflected in the item probability functions (trace_plot) below, with steeper curves representing greater discrimination. The b value designates the difficulty of the item, and represents the point on the ability (i.e., \(\theta\)) spectrum at which a respondent becomes more than 50% likely to answer that question correctly. The same value can be read off the trace plot by drawing a horizontal line from 0.5 on the y-axis out to the curve, and then tracing a vertical line down to the corresponding \(\theta\) value on the x-axis, representing the relevant level of ability.
irt_model$trace_plot
The test information plot (info_plot) shows at what point on the ability spectrum the test offers the most information, which in this case is just below a \(\theta\) of 0, i.e., just below mean ability:
irt_model$info_plot
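For a 2PL model of the kind fitted here, the test information function is the sum of the item information functions:
\[ I(\theta) = \sum_j a_j^2\, P_j(\theta)\,(1 - P_j(\theta)) \]
where \(P_j(\theta)\) is the probability of answering item \(j\) correctly at ability \(\theta\). Highly discriminating items therefore contribute the most information, and do so at ability levels near their difficulty \(b_j\).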
As mentioned above, the inf_irt_scale function also generates a binary knowledge variable, by default constructed by assigning a 1 (“informed”) to everyone in the 90th percentile and above on the knowledge scale, and 0 (“not informed”) otherwise. This is the binary variable that will be used later on when calculating so-called propensity scores, for purposes of balancing the data set and breaking any correlation between the knowledge variable and the demographic variables.
For such balancing to work, we ideally want to set the bar for being “informed” at a level that’s demanding enough to be conceptually plausible, yet not so demanding that very few people qualify. This can be evaluated by consulting the proportion of observations that end up in each of the two categories:
irt_model$know_scores_binary_tbl
## Proportion of observations in each category:
## 0 1
## 0.5441134 0.4558866
As we can see, about 45% of the sample end up in the “informed” category – far more than 10%, because almost half of respondents answered all four items correctly, meaning that the 90th-percentile threshold coincides with the top score. This suggests that the items in the scale are fairly easy, which should be kept in mind when eventually interpreting any information effect.
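Should a different split be desired, the threshold can be adjusted via the function’s binary_cutoff parameter, e.g.:
# a more demanding threshold: "informed" = 95th percentile and above
irt_model_strict <- inf_irt_scale(c("k1", "k2", "k3", "k4"), df, binary_cutoff = 0.95)
Note, though, that with only four items the scale takes just a handful of distinct values, so the splits actually achievable are constrained by ties in the score distribution.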
Finally, let us save our knowledge scores to our data frame:
df$knowledge <- irt_model$know_scores
df$knowledge_binary <- irt_model$know_scores_binary
By performing the type of diagnostics covered in the previous section on our knowledge scale, we can get a good sense of whether the model performs well from a formal perspective, i.e., in regards to unidimensionality, local independence, and model fit. However, we would also like to validate that the score is plausibly measuring a form of political knowledge specifically.
One way of doing this is to investigate the relationship between our knowledge scale and demographic factors that we know to be associated with political knowledge. Specifically, we expect that men should score more highly on our scale than women (vanHeerde-Hudson 2020; Plutzer 2020), and that the same should go for people who are older (Plutzer 2020), who have higher levels of education (Rasmussen 2016), and who earn more (Vowles 2020; Plutzer 2020).
One way to investigate this is to look at the estimated marginal means for each level of education, income, gender, and age as follows:
# order factor levels
df <- df %>%
  mutate(income = factor(income,
                         levels = c("Q1", "Q2", "Q3", "Q4", "Q5"),
                         ordered = T),
         education = factor(education,
                            levels = c("no_qual", "below_gcse", "gcse",
                                       "a-level", "undergrad", "postgrad"),
                            ordered = T))
library(emmeans)
inf_emmeans <- function(knowledge_var, covariates, data) {
  # construct formula
  f <- as.formula(
    paste(knowledge_var,
          paste(covariates, collapse = " + "),
          sep = " ~ "))
  # fit model
  m <- lm(f,
          data = data)
  # create list of emmeans by each covariate
  emmeans_list <- list()
  for (i in 1:length(covariates)) {
    emmeans_list[[i]] <- emmeans(m, specs = covariates[i])
  }
  return(emmeans_list)
}
inf_emmeans("knowledge",
            c("income", "education", "gender", "age"),
            df)
## [[1]]
## income emmean SE df lower.CL upper.CL
## Q1 -0.2221 0.00780 34350 -0.2374 -0.20685
## Q2 -0.1662 0.00886 34350 -0.1836 -0.14887
## Q3 -0.1362 0.00995 34350 -0.1558 -0.11674
## Q4 -0.1068 0.00911 34350 -0.1247 -0.08895
## Q5 -0.0118 0.01018 34350 -0.0317 0.00816
##
## Results are averaged over the levels of: education, gender, age
## Confidence level used: 0.95
##
## [[2]]
## education emmean SE df lower.CL upper.CL
## no_qual -0.4023 0.01418 34350 -0.430 -0.3745
## below_gcse -0.3117 0.01731 34350 -0.346 -0.2778
## gcse -0.2207 0.00831 34350 -0.237 -0.2044
## a-level -0.0905 0.00810 34350 -0.106 -0.0746
## undergrad 0.0497 0.00646 34350 0.037 0.0623
## postgrad 0.2036 0.01155 34350 0.181 0.2263
##
## Results are averaged over the levels of: income, gender, age
## Confidence level used: 0.95
##
## [[3]]
## gender emmean SE df lower.CL upper.CL
## female -0.3515 0.00609 34350 -0.3634 -0.340
## male 0.0942 0.00663 34350 0.0812 0.107
##
## Results are averaged over the levels of: income, education, age
## Confidence level used: 0.95
##
## [[4]]
## age emmean SE df lower.CL upper.CL
## 18-25 -0.417 0.01511 34350 -0.4462 -0.3870
## 26-35 -0.562 0.01093 34350 -0.5833 -0.5404
## 36-45 -0.383 0.00970 34350 -0.4020 -0.3640
## 46-55 -0.049 0.00908 34350 -0.0668 -0.0312
## 56-65 0.242 0.00866 34350 0.2249 0.2589
## over65 0.397 0.00781 34350 0.3814 0.4120
##
## Results are averaged over the levels of: income, education, gender
## Confidence level used: 0.95
We can see that the scale value increases in a nearly monotonic fashion for education, income, and age, which is what we should expect if our scale measures political knowledge. We also see that the mean level of knowledge is greater for men than for women. This all offers some evidence of construct validity.
As noted earlier, information modeling is a type of counterfactual modeling, estimating the causal effect that we would have seen, had we been able to intervene on (i.e., increase) the knowledge variable. Best practice in counterfactual modeling is to rely on so-called doubly-robust estimation, which looks to approximate the situation we would have found ourselves in, had our data been the result of a randomized experimental design with a single treatment (Morgan and Winship 2015). The ‘double robustness’ owes to how effects are estimated in a context where we have both controlled for assumed confounds (as in standard regression), and taken steps to make up for the fact that the data have not come about as a result of randomized assignment. In the present case, this second layer of ‘robustness’ is achieved by using so-called ‘propensity scores’ as weights in the subsequent models.
This is where the binary knowledge variable from before comes in. In our case, propensity scores measure the probability (i.e. propensity) that an observation will be found in the ‘fully informed’, binary category, as a function of someone’s demographic features. The idea is to then use these scores to remove any correlation between these features and the ‘informed’ category, to justify a counterfactual inference.
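Formally, letting \(e(x_i)\) denote the estimated probability – the propensity score – that respondent \(i\) with covariates \(x_i\) ends up in the “informed” category \((T_i = 1)\), the weights used below are the familiar inverse probability weights:
\[ w_i = \frac{T_i}{e(x_i)} + \frac{1 - T_i}{1 - e(x_i)} \]
This is exactly what the inf_prop_scores function defined further down returns.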
To see why, return to the paradigm of a randomized experimental design, where the random allocation of participants to a treatment and a control group means that no feature of the participant is predictive of being found in the treatment as opposed to in the control. Whether female or male, rich or poor (and so on), you are equally likely to end up in one group as opposed to in the other. In the case of observational data, by contrast, this might not be the case. It might (for example) be that some features of the observations – such as, their level of education – are predictive of ending up in the ‘informed’ category.
In fact, let’s look at the data at hand, to determine whether the demographic factors that we have reason to believe influence someone’s degree of political knowledge – again, gender, level of education, income, and age (Plutzer 2020) – are predictive of knowledge, as measured by our scale:
df <- df %>%
  mutate(education = factor(education, ordered = F),
         income = factor(income, ordered = F))

m <- glm(knowledge_binary ~ age + education + gender + income,
         data = df,
         family = "binomial")
summary(m)
##
## Call:
## glm(formula = knowledge_binary ~ age + education + gender + income,
## family = "binomial", data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3806 -0.9137 -0.4555 0.9573 2.5502
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.987678 0.074296 -40.213 < 2e-16 ***
## age26-35 -0.473751 0.064920 -7.297 2.93e-13 ***
## age36-45 0.004427 0.060290 0.073 0.94147
## age46-55 0.915762 0.058653 15.613 < 2e-16 ***
## age56-65 1.638358 0.058433 28.038 < 2e-16 ***
## ageover65 2.032476 0.058251 34.892 < 2e-16 ***
## educationbelow_gcse 0.249039 0.073126 3.406 0.00066 ***
## educationgcse 0.508128 0.052563 9.667 < 2e-16 ***
## educationa-level 0.904158 0.054356 16.634 < 2e-16 ***
## educationundergrad 1.314389 0.052504 25.034 < 2e-16 ***
## educationpostgrad 1.812900 0.063408 28.591 < 2e-16 ***
## gendermale 1.323824 0.025377 52.166 < 2e-16 ***
## incomeQ2 0.183424 0.035888 5.111 3.20e-07 ***
## incomeQ3 0.261691 0.039723 6.588 4.46e-11 ***
## incomeQ4 0.366727 0.038345 9.564 < 2e-16 ***
## incomeQ5 0.591538 0.041868 14.129 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 47374 on 34365 degrees of freedom
## Residual deviance: 38619 on 34350 degrees of freedom
## AIC: 38651
##
## Number of Fisher Scoring iterations: 4
Looking first at the coefficient values for age, education, and income, we see that the effects relative to the lowest age bracket, education level, and income quintile increase as we move up the factor levels. These differences are in virtually all cases also significant. The same goes for the difference in knowledge between men and women, with men knowing more than women.
Let’s now calculate the propensity scores, and visualise them as a histogram to get a sense of their distribution:
inf_prop_scores <- function(knowledge_var, covariates, data) {
  # construct formula
  f <- as.formula(
    paste(knowledge_var,
          paste(covariates, collapse = " + "),
          sep = " ~ "))
  # calculate propensity scores
  p_scores <- glm(f,
                  data = data,
                  family = "binomial")
  data$ps_value <- predict(p_scores, type = "response")
  # return the corresponding inverse probability weights
  return(ifelse(data[[knowledge_var]] == 1, 1 / data$ps_value, 1 / (1 - data$ps_value)))
}
df$prop_score <- inf_prop_scores(knowledge_var = "knowledge_binary",
                                 covariates = c("age", "gender", "education", "income"),
                                 data = df)

df %>%
  ggplot() +
  aes(x = prop_score) +
  geom_histogram(binwidth = 0.1, color = "black", fill = "salmon")
What we want to see in this distribution is a clustering of propensity scores towards the low end, and not too many extreme scores. That said, extreme scores should not automatically be assumed to be incorrect (Levy et al. 2008), although one should be mindful that they of course have a disproportionate influence when subsequently using them as weights in our regression model (more on this below). When properly estimated, however, such weighting will counteract any correlations between demographics and levels of political knowledge. Specifically, since propensity scores measure the probability of ending up in the ‘treatment’ category, given a set of covariates – in our case, the probability that you would be ‘informed’, given your age, income, level of education and gender – we can use the inverse of those scores as weights (such that an observation with a low propensity is weighted heavily) in fitting the model. Given an appropriately chosen set of covariates when calculating the scores, this recreates a situation that would have been expected in a randomized experiment, thereby allowing greater confidence in any counterfactual inference.
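A quick numerical check on extreme weights is simply to summarise them:
# inspect the weight distribution numerically
summary(df$prop_score)
Where a handful of very large weights dominate, truncating (“trimming”) them at some quantile is a common, if debated, remedy.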
Since the whole point of propensity scores is to balance the sample,
we want to inspect whether we have been successful on that score using
balance plots, here generated using the R package cobalt
(Greifer 2022).
library(cobalt)
library(gridExtra)
inf_balance_plots <- function(knowledge_var, covariates, prop_score, data) {
  covs_general <- subset(data, select = covariates)
  plot_list <- vector('list', length(covariates))
  for (i in seq_along(covariates)) {
    plot_list[[i]] <- local({
      i <- i
      print(bal.plot(covs_general,
                     treat = data[[knowledge_var]],
                     estimand = "ATE",
                     weights = data[[prop_score]],
                     method = "weighting",
                     var.name = covariates[i],
                     which = "both"))
    })
  }
  return(plot_list)
}
bal_plots <- inf_balance_plots("knowledge_binary",
                               c("age", "gender", "income", "education"),
                               "prop_score",
                               df)
We see that, in each case, balance has been improved by the propensity scores (the right pane in each of the graphs). Had that not been the case, we might have wanted to revisit the cut-off for being ‘informed’ (currently, 90th percentile or above), in case the split between the two groups was so lopsided as to make balancing via propensity scores difficult.
We now need to specify our model – or models, in case we want to use several specifications. Since care needs to be taken when specifying and interpreting causal models, the first thing to do is to justify one’s choices regarding what variables are to be included and excluded. By transparently presenting the rationale for model specifications, and visualizing this as a directed acyclic graph (DAG), we adhere to good practice for political scientists who use observational data to address causal questions (Keele, Stevenson, and Elwert 2020).
As in traditional regression, we need to control for any confounders. In our case, those are variables that have a causal effect on both someone’s degree of political knowledge and their political attitudes or preferences. Existing evidence suggests gender (vanHeerde-Hudson 2020; Plutzer 2020), level of education (Rasmussen 2016), income (Vowles 2020; Plutzer 2020), and possibly age (Plutzer 2020) fall in this category. Moreover, to reduce noise in these models, we also do well to control for variables that can be expected to change someone’s political preferences, whether or not they also change their degree of knowledge, such as ethnicity (e.g., through a “shared faith”; Dawson (1995)) and religion (Geoffrey Evans 2020).
What about partisanship, as measured by party identification? Despite being a prominent variable in political scientific modeling, we probably do well to exclude it in this context for two reasons:
First, it is likely affected by political knowledge, specifically, knowledge of parties’ and candidates’ positions (Brader and Tucker 2018; Fowler and Margolis 2014). This would make it a mediator in the language of counterfactual/causal modeling. Controlling for a mediator, or a causal node located on a direct or indirect pathway between (in this case) political knowledge and political preference, will mean misestimating the relevant causal effect (Rohrer 2018).
Second, even if partisanship is not a mediator, controlling for it in this context is likely unnecessary. Partisanship is heavily influenced by socialization early in life (Campbell et al. 1980), including around group identities of religion, ethnicity, gender, and the like – all of which shape individuals’ conceptions of what positions “people like us” take in politics (Green, Palmquist, and Schickler 2004). Consequently, controlling for such group-level variables (here: gender, ethnicity, and religion) would already account for partisanship.
In the UK context, EU referendum vote is likely, in this respect, to be similar to partisanship, as far as causal modeling is concerned: research suggests that UK voters’ identifications with the “Leave” and “Remain” camps have become political identities in their own right (Hobolt, Leeper, and Tilley 2021). If correct, then referendum vote choice, too, is an unnecessary control, since it is a function of socialization variables for which our model already controls.
The DAG below uses the R package ggdag (Barrett 2022) to summarise the assumptions made
for the purpose of modeling. Note the status of partisanship (“Party” in
the graph) as a mediator for knowledge (“Know.”). If partisanship is an
unnecessary control, the edge between knowledge and partisanship should
be removed. Depending on the position one takes in relation to whether
the EU referendum has become a political identity in its own right, EU
referendum vote would either replace the partisanship node or (perhaps
more plausibly) inhabit a structurally identical node (i.e., a mediator
for knowledge) alongside it.
library(ggdag)
theme_set(theme_dag())
coords <- tibble::tribble(
  ~name,    ~x,   ~y,
  "Y",       4,    0,
  "Know.",   0,    0,
  "Edu.",    0,    1,
  "Gend.",   1,    1,
  "Inc.",    2,    1,
  "Age",     3,    1,
  "Ethn.",   1, -1.5,
  "Rel.",    2, -1.5,
  "Party",   2, -0.5
)

pt_dag <- dagify(Y ~ Know. + Gend. + Edu. + Inc. + Ethn. + Rel. + Age + Party,
                 Party ~ Know. + Gend. + Ethn. + Rel.,
                 Know. ~ Gend. + Edu. + Inc. + Age,
                 exposure = "Know.",
                 outcome = "Y",
                 coords = coords)

ggdag(pt_dag, stylized = TRUE, node_size = 20)
We now have everything we need to actually calculate information effects, i.e., differences in proportions between actual and estimated informed levels of support for some particular statement, policy, or the like. In light of the discussion above about different model specifications, and questions about the exact causal role of partisanship and any identities tied up with the UK’s 2016 EU referendum vote, we fit and display the results for three models, in the interest of robustness: one purely demographic, one that also controls for partisanship, and one that additionally controls for the respondent’s EU referendum vote.
library(boot)
inf_effect <- function(outcome, knowledge_var, covariates, prop_weight, survey_weight, boot_ci = F, data) {
  # construct formula
  f <- as.formula(
    paste(outcome,
          paste(knowledge_var,
                paste(covariates, collapse = " + "), sep = " + "),
          sep = " ~ "))
  # fit model, weighted by the propensity scores
  m <- glm(f,
           data = data,
           family = "binomial",
           weights = data[[prop_weight]])
  # make everyone in the data set informed
  data[[knowledge_var]] <- 1
  # calculate actual and informed support
  actual <- weighted.mean(data[[outcome]], data[[survey_weight]])
  informed_outcome <- predict(m, newdata = data, type = "response")
  informed <- weighted.mean(informed_outcome, data[[survey_weight]])
  # generate bootstrap confidence intervals
  if (boot_ci == T) {
    meanfun <- function(data, indices) {
      d <- data[indices]
      return(mean(d))
    }
    mean_wt <- mean(data[[survey_weight]])
    boot_actual <- boot(data[[outcome]] * data[[survey_weight]], meanfun, R = 1000)
    boot_informed <- boot(informed_outcome * data[[survey_weight]], meanfun, R = 1000)
    actual_lwr <- boot.ci(boot_actual, conf = 0.95, type = "basic")$basic[4] / mean_wt
    actual_upr <- boot.ci(boot_actual, conf = 0.95, type = "basic")$basic[5] / mean_wt
    informed_lwr <- boot.ci(boot_informed, conf = 0.95, type = "basic")$basic[4] / mean_wt
    informed_upr <- boot.ci(boot_informed, conf = 0.95, type = "basic")$basic[5] / mean_wt
    return(list("formula" = f,
                "model" = m,
                "actual_proportion" = actual,
                "actual_upr" = actual_upr,
                "actual_lwr" = actual_lwr,
                "informed_proportion" = informed,
                "informed_upr" = informed_upr,
                "informed_lwr" = informed_lwr,
                "difference" = informed - actual))
  } else {
    return(list("formula" = f,
                "model" = m,
                "actual_proportion" = actual,
                "informed_proportion" = informed,
                "difference" = informed - actual))
  }
}
inf1 <- inf_effect(outcome = "immig_self",
                   knowledge_var = "knowledge_binary",
                   covariates = c("age", "gender", "education",
                                  "income", "religion", "ethnicity"),
                   prop_weight = "prop_score",
                   survey_weight = "survey_wt",
                   data = df,
                   boot_ci = T)

inf2 <- inf_effect(outcome = "immig_self",
                   knowledge_var = "knowledge_binary",
                   covariates = c("age", "gender", "education",
                                  "income", "religion", "ethnicity",
                                  "party_id"),
                   prop_weight = "prop_score",
                   survey_weight = "survey_wt",
                   data = df,
                   boot_ci = T)

inf3 <- inf_effect(outcome = "immig_self",
                   knowledge_var = "knowledge_binary",
                   covariates = c("age", "gender", "education",
                                  "income", "religion", "ethnicity",
                                  "party_id", "eu_ref_vote"),
                   prop_weight = "prop_score",
                   survey_weight = "survey_wt",
                   data = df,
                   boot_ci = T)
Each model estimates actual and informed support for the idea that levels of immigration coming into the UK should be reduced, in order to estimate what difference political knowledge makes on this issue. For both actual and informed support, we apply the survey weights included in the data set to approximate representativeness. We also plot the results, with the dashed line representing 50% support.
inf1
## $formula
## immig_self ~ knowledge_binary + age + gender + education + income +
## religion + ethnicity
## <environment: 0x7f818f85bcc8>
##
## $model
##
## Call: glm(formula = f, family = "binomial", data = data, weights = data[[prop_weight]])
##
## Coefficients:
## (Intercept) knowledge_binary
## -0.800556 -0.551629
## age26-35 age36-45
## 0.401088 0.558664
## age46-55 age56-65
## 0.895522 0.949703
## ageover65 gendermale
## 0.933914 0.228566
## educationbelow_gcse educationgcse
## -0.210680 -0.283231
## educationa-level educationundergrad
## -0.707625 -1.151355
## educationpostgrad incomeQ2
## -1.637351 -0.085797
## incomeQ3 incomeQ4
## -0.117267 -0.156219
## incomeQ5 religionbuddhism
## -0.284748 -0.325854
## religionc_of_e religioncatholic
## 0.396551 -0.006818
## religionevangelical religionhinduism
## -0.155368 -0.214939
## religionislam religionjudaism
## 0.031443 -0.092761
## religionmethodist religionno_religion
## 0.088665 -0.217920
## religionorthodox_christian religionother
## 0.284934 -0.105327
## religionother_religion religionpentecostal
## 0.346456 0.303206
## religionprefer_not_say_religion religionpresbyterian
## 0.166689 -0.053246
## religionunited_reformed ethnicityasian_other
## -0.378558 0.653624
## ethnicityblack_other ethnicitycaribbean
## 0.642738 0.261498
## ethnicitychinese ethnicityethnic_other
## 0.796351 0.277655
## ethnicityindian ethnicitymixed
## 1.159166 0.555175
## ethnicitypakistani ethnicityprefer_not_say_ethnic
## 0.535713 1.232634
## ethnicitywhite_asian ethnicitywhite_british
## 0.200661 1.229793
## ethnicitywhite_irish ethnicitywhite_other
## 0.743317 0.361737
##
## Degrees of Freedom: 34365 Total (i.e. Null); 34320 Residual
## Null Deviance: 94980
## Residual Deviance: 84560 AIC: 81890
##
## $actual_proportion
## [1] 0.527318
##
## $actual_upr
## [1] 0.5345551
##
## $actual_lwr
## [1] 0.5200868
##
## $informed_proportion
## [1] 0.4280755
##
## $informed_upr
## [1] 0.4309813
##
## $informed_lwr
## [1] 0.4249105
##
## $difference
## [1] -0.0992425
inf2
## $formula
## immig_self ~ knowledge_binary + age + gender + education + income +
## religion + ethnicity + party_id
## <environment: 0x7f815a47d868>
##
## $model
##
## Call: glm(formula = f, family = "binomial", data = data, weights = data[[prop_weight]])
##
## Coefficients:
## (Intercept) knowledge_binary
## 1.719771 -0.532837
## age26-35 age36-45
## 0.351095 0.453277
## age46-55 age56-65
## 0.754227 0.756428
## ageover65 gendermale
## 0.601637 0.173919
## educationbelow_gcse educationgcse
## -0.272653 -0.295553
## educationa-level educationundergrad
## -0.683545 -1.069484
## educationpostgrad incomeQ2
## -1.464866 -0.145739
## incomeQ3 incomeQ4
## -0.199472 -0.290574
## incomeQ5 religionbuddhism
## -0.468976 -0.107350
## religionc_of_e religioncatholic
## 0.275747 0.091117
## religionevangelical religionhinduism
## -0.226115 0.016562
## religionislam religionjudaism
## 0.147791 -0.325288
## religionmethodist religionno_religion
## 0.114635 -0.119320
## religionorthodox_christian religionother
## 0.283882 0.009752
## religionother_religion religionpentecostal
## 0.487761 0.278207
## religionprefer_not_say_religion religionpresbyterian
## 0.231863 0.190129
## religionunited_reformed ethnicityasian_other
## -0.375625 0.510763
## ethnicityblack_other ethnicitycaribbean
## 0.425796 0.474356
## ethnicitychinese ethnicityethnic_other
## 0.382862 0.131692
## ethnicityindian ethnicitymixed
## 0.877855 0.457099
## ethnicitypakistani ethnicityprefer_not_say_ethnic
## 0.515671 1.077609
## ethnicitywhite_asian ethnicitywhite_british
## 0.075100 1.038331
## ethnicitywhite_irish ethnicitywhite_other
## 0.784199 0.234440
## party_idcons party_idgreen
## -1.393134 -3.274262
## party_idlabour party_idlibdem
## -2.848882 -3.135289
## party_idno_party party_idother
## -2.016414 -2.379292
## party_idplaid party_idsnp
## -3.253804 -3.231791
## party_idukip
## -0.607326
##
## Degrees of Freedom: 34365 Total (i.e. Null); 34311 Residual
## Null Deviance: 94980
## Residual Deviance: 77060 AIC: 74650
##
## $actual_proportion
## [1] 0.527318
##
## $actual_upr
## [1] 0.5343647
##
## $actual_lwr
## [1] 0.5200472
##
## $informed_proportion
## [1] 0.4421845
##
## $informed_upr
## [1] 0.445838
##
## $informed_lwr
## [1] 0.4386526
##
## $difference
## [1] -0.08513349
inf3
## $formula
## immig_self ~ knowledge_binary + age + gender + education + income +
## religion + ethnicity + party_id + eu_ref_vote
## <environment: 0x7f815a4a6c60>
##
## $model
##
## Call: glm(formula = f, family = "binomial", data = data, weights = data[[prop_weight]])
##
## Coefficients:
## (Intercept) knowledge_binary
## 0.04906 -0.51763
## age26-35 age36-45
## 0.30096 0.28764
## age46-55 age56-65
## 0.53811 0.48605
## ageover65 gendermale
## 0.27991 0.11576
## educationbelow_gcse educationgcse
## -0.23043 -0.23426
## educationa-level educationundergrad
## -0.54913 -0.83056
## educationpostgrad incomeQ2
## -1.20457 -0.13545
## incomeQ3 incomeQ4
## -0.13063 -0.21060
## incomeQ5 religionbuddhism
## -0.30943 -0.18498
## religionc_of_e religioncatholic
## 0.22976 0.06337
## religionevangelical religionhinduism
## -0.37693 0.02491
## religionislam religionjudaism
## 0.22707 -0.22209
## religionmethodist religionno_religion
## 0.09641 -0.11304
## religionorthodox_christian religionother
## 0.46718 -0.09422
## religionother_religion religionpentecostal
## 0.16261 0.06495
## religionprefer_not_say_religion religionpresbyterian
## 0.19455 0.23901
## religionunited_reformed ethnicityasian_other
## -0.28753 0.49694
## ethnicityblack_other ethnicitycaribbean
## 0.31591 0.63212
## ethnicitychinese ethnicityethnic_other
## 0.54840 0.05562
## ethnicityindian ethnicitymixed
## 1.00790 0.42641
## ethnicitypakistani ethnicityprefer_not_say_ethnic
## 0.60471 1.15017
## ethnicitywhite_asian ethnicitywhite_british
## -0.09531 0.97678
## ethnicitywhite_irish ethnicitywhite_other
## 0.82584 0.32620
## party_idcons party_idgreen
## -0.89646 -1.92159
## party_idlabour party_idlibdem
## -1.60394 -1.68465
## party_idno_party party_idother
## -1.17666 -1.68492
## party_idplaid party_idsnp
## -1.99743 -1.82308
## party_idukip eu_ref_voteno_vote_ref
## -0.36562 1.28388
## eu_ref_voteremain
## 1.89048
##
## Degrees of Freedom: 34365 Total (i.e. Null); 34309 Residual
## Null Deviance: 94980
## Residual Deviance: 68870 AIC: 66670
##
## $actual_proportion
## [1] 0.527318
##
## $actual_upr
## [1] 0.5349177
##
## $actual_lwr
## [1] 0.5200195
##
## $informed_proportion
## [1] 0.4587296
##
## $informed_upr
## [1] 0.4628787
##
## $informed_lwr
## [1] 0.4543385
##
## $difference
## [1] -0.0685884
plot_df <- tibble(
  scenario = c("Actual", "Informed (demographic)", "Informed (partisanship)", "Informed (EU vote)"),
  support = c(inf1$actual_proportion, inf1$informed_proportion, inf2$informed_proportion, inf3$informed_proportion),
  lwr = c(inf1$actual_lwr, inf1$informed_lwr, inf2$informed_lwr, inf3$informed_lwr),
  upr = c(inf1$actual_upr, inf1$informed_upr, inf2$informed_upr, inf3$informed_upr)
)

plot_df <- within(plot_df, scenario <- factor(scenario,
                                              levels = c("Actual", "Informed (EU vote)",
                                                         "Informed (partisanship)", "Informed (demographic)")))

theme_set(theme_minimal())
library(RColorBrewer)

ggplot(plot_df) +
  aes(x = scenario,
      y = support,
      fill = scenario) +
  geom_bar(stat = "identity", color = "black") +
  geom_errorbar(aes(ymin = lwr, ymax = upr),
                width = .1,
                position = position_dodge(.9)) +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  scale_fill_brewer(palette = "Blues") +
  ylab("Proportion of support") +
  xlab("Scenario") +
  ggtitle("Proportion wanting to see immigration levels reduced by scenario") +
  theme(legend.position = "none") +
  annotate("text", x = 1, y = plot_df$support[1] + 0.025, label = paste(round(plot_df$support[1] * 100, 2), "%", sep = "")) +
  annotate("text", x = 2, y = plot_df$support[4] + 0.025, label = paste(round(plot_df$support[4] * 100, 2), "%", sep = "")) +
  annotate("text", x = 3, y = plot_df$support[3] + 0.025, label = paste(round(plot_df$support[3] * 100, 2), "%", sep = "")) +
  annotate("text", x = 4, y = plot_df$support[2] + 0.025, label = paste(round(plot_df$support[2] * 100, 2), "%", sep = ""))
The function returns the formula, the model, the actual and estimated informed proportions (in each case weighted using the survey weights), and the difference between the two, i.e., the information effect. We see that, if people in the UK were all to become informed, in the respect operationalised here, the idea that immigration levels should be reduced would likely go from a majority to a minority position (with the dashed line signifying 50%), irrespective of the particular modeling assumptions made, suggesting some robustness of the results. The size of the effect – close to 10 percentage points for the demographic model – is noteworthy, especially in the context of the low bar set by the knowledge scale, suggesting that even a low level of political knowledge makes a difference.
There is no established way to compute confidence intervals for the
type of aggregate, counterfactual estimates reported here. For purposes
of giving a sense of the variability of the individual estimates,
bootstrapped confidence intervals are therefore returned, constructed as
follows: 1,000 bootstrap samples are drawn from the (weighted)
predictions, as well as from the weighted responses in the data set.
Using R’s boot package (Davison and
Hinkley 1997), basic 95% confidence intervals are then computed
for the mean weighted prediction, the upper and lower bounds of which
are then divided in each case by the mean weight in the total
sample.
We started out by noting that information matters for politics, and that one of the main ways of determining the difference that knowledge will make in any given instance is by modeling information effects. This step-by-step guide offers a complete pipeline and set of functions for calculating such effects. These functions are written with the ambition that they should be of use to others wishing to model information effects on their own data sets. To that end, if anyone spots any problems or has suggestions for improvements, please contact me on k.ahlstrom-vij@bbk.ac.uk.