Gouge Lab, July 24, 2024

Today’s lab covers a bit of everything. Hooray! We will begin by looking at the electoral confidence battery from the World Values Survey.

First, let’s load the appropriate libraries and the data!

#libraries 
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(magrittr)

## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

# Data 
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/wvs_wave_7.rds" %>% 
  url %>% 
  readRDS

It would also be helpful to have a codebook, so let’s make one of those.

#making a codebook
m1 <- t1 %>% 
  map( ~ attr(., "label")) %>% 
  unlist

## codebook 
cb <- tibble(
  itm = m1 %>% 
    names, 
  labs = m1 %>% unlist
)


#looking at the data
glimpse(t1)
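
To see what the codebook step is doing, here is a minimal sketch on a toy data frame. The `toy` objects, their columns, and their labels are made up for illustration; only the `attr(., "label")` pattern comes from the lab.

```r
library(magrittr)
library(purrr)
library(tibble)

# Hypothetical stand-in for the WVS data: two labelled items, one unlabelled id
toy <- data.frame(Q1 = c(1, 2), Q2 = c(3, 4), id = 1:2)
attr(toy$Q1, "label") <- "Votes are counted fairly"
attr(toy$Q2, "label") <- "Voters are bribed"

# map() returns NULL for columns without a label; unlist() silently drops them
toy_labels <- toy %>%
  map(~ attr(.x, "label")) %>%
  unlist()

# One row per labelled variable: its name and its label
toy_cb <- tibble(
  itm  = names(toy_labels),
  labs = unname(toy_labels)
)

toy_cb
```

Unlabelled columns such as `id` never make it into the codebook, so a variable that is missing from `cb` is not necessarily missing from the data.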

We are interested in the battery on electoral confidence. The electoral confidence items are Q224-Q233. We can use our codebook to take a look at the questions included in this battery.

cb %>% 
  filter(
    itm %>% 
      is_in(
        str_c(
          "Q", 224:233
        )
      )
  ) %>% 
  pluck("labs")

##                                                                               Q224 
##                       "How often in country's elections: Votes are counted fairly" 
##                                                                               Q225 
## "How often in country's elections: Opposition candidates are prevented from runni" 
##                                                                               Q226 
##             "How often in country's elections: TV news favors the governing party" 
##                                                                               Q227 
##                              "How often in country's elections: Voters are bribed" 
##                                                                               Q228 
## "How often in country's elections: Journalists provide fair coverage of elections" 
##                                                                               Q229 
##                    "How often in country's elections: Election officials are fair" 
##                                                                               Q230 
##                      "How often in country's elections: Rich people buy elections" 
##                                                                               Q231 
## "How often in country's elections: Voters are threatened with  violence at the po" 
##                                                                               Q232 
## "How often in country's elections: Voters are offered a genuine choice in the ele" 
##                                                                               Q233 
## "How often in country's elections: Women have equal opportunities to run the offi"

Great, it looks like the battery asks respondents how often a variety of things occur in their country’s elections. For example, the first item asks how often votes are counted fairly.

We want to compare results by country income level. For instance, we want to compare responses to this battery among respondents from high income countries.

We are going to need a few variables to do this analysis. First, we need to find the variable name that has a country label for each country.

#we need country name
cb %>% 
  filter(itm %>% 
           str_detect(fixed("country", ignore_case = T))) %>% 
  pluck("labs")

##                         B_COUNTRY                   B_COUNTRY_ALPHA 
## "ISO 3166-1 numeric country code" "ISO 3166-1 alpha-3 country code"

#we also need some income variable, is there one present in the dataset?

#looking for an income variable
cb %>% filter(
  itm %>% 
    str_detect("income")
  ) %>% pluck("labs")

##                                  incomeWB 
## "Income group country [World Bank, 2019]" 
##                            incomeindexHDI 
##      "Income Index (0 to 1) [UNDP, 2018]"

#we can't forget weights!
cb %>% filter(
  itm %>% 
    str_detect(fixed("weight", ignore_case = T))
  ) %>% pluck("labs")

## W_WEIGHT 
## "Weight"
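
The searches above match on the variable names in `itm`. A variable whose name does not contain the keyword can still be found by searching the label text in `labs` instead. The sketch below uses a made-up mini-codebook; `ZZ_EXAMPLE` and its label are hypothetical and not part of the WVS file.

```r
library(magrittr)
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical mini-codebook in the same shape as `cb`
toy_cb <- tibble(
  itm  = c("B_COUNTRY", "incomeWB", "W_WEIGHT", "ZZ_EXAMPLE"),
  labs = c("ISO 3166-1 numeric country code",
           "Income group country [World Bank, 2019]",
           "Weight",
           "Made-up item about household income")
)

# Search the labels, ignoring case, instead of the item names
income_hits <- toy_cb %>%
  filter(str_detect(labs, regex("income", ignore_case = TRUE)))

income_hits$itm
```

Searching names alone would return only `incomeWB` here; searching labels also returns the made-up item.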

Looks like we will need B_COUNTRY, incomeWB, and W_WEIGHT along with our battery items. Now we can select the items that we need for this analysis and make a new data frame.

t2 <- t1 %>% 
  select(
    B_COUNTRY, 
    incomeWB,
    W_WEIGHT,
    Q224:Q233
  ) %>% 
  na.omit
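
Because `na.omit` drops every row with an `NA` in any selected column, it can be worth checking how many rows it removes. The toy data below are made up; the point is only that true `NA` values are dropped, while answers stored as text such as "Don't know" are kept.

```r
library(magrittr)
library(dplyr)
library(tibble)

# Hypothetical respondents: two have a true NA on one item
toy <- tibble(
  B_COUNTRY = c("A", "A", "B", "B"),
  W_WEIGHT  = c(1, 1, 1, 1),
  Q224      = c("Very often", NA, "Not often", "Don't know"),
  Q225      = c("Not often", "Not often", NA, "Very often")
)

toy_complete <- toy %>% na.omit()

# How many respondents were lost to listwise deletion?
rows_dropped <- nrow(toy) - nrow(toy_complete)
rows_dropped

# NA count per column, to see which items drive the loss
na_counts <- toy %>% summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

The "Don't know" row survives `na.omit`, which is why the lab still has to map such answers to `NA` when converting responses to numbers.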

t2 is a more manageable data frame, so now we can take a look at the structure of the variables and see if we need to recode anything!

#What are the levels of income variable?
t2$incomeWB %>% levels %>% dput

## c("Low income", "Lower middle income", "Upper middle income", 
## "High income")

We need to recode the income variable so that it is “low income”, “middle income”, and “high income”. Let’s do so below:

#We need to recode the World Bank income variable to reflect low, middle, and high income levels

t2 %<>% 
  mutate(
    income_level = case_when(
      incomeWB == "Low income" ~ "Low income",
      incomeWB %in% c("Lower middle income", "Upper middle income") ~ "Middle income",
      incomeWB == "High income" ~ "High income",
      TRUE ~ NA_character_  # Handle any unexpected values or NAs
      )
      ) 

#did it work?
unique(t2$income_level)

## [1] "High income"   "Middle income" "Low income"

We can see that we have successfully recoded the WB income levels to reflect three income categories.
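
As an alternative sketch, not the approach used in this lab, `forcats::fct_collapse()` can merge the two middle categories while keeping the result as a factor. The vector `income_wb` below is a made-up example that uses the four World Bank levels printed earlier.

```r
library(forcats)

# Toy factor with the four World Bank levels, in low-to-high order
income_wb <- factor(
  c("High income", "Lower middle income", "Low income", "Upper middle income"),
  levels = c("Low income", "Lower middle income",
             "Upper middle income", "High income")
)

# Merge the two middle categories into one level
income_level <- fct_collapse(
  income_wb,
  "Middle income" = c("Lower middle income", "Upper middle income")
)

levels(income_level)
```

Keeping a factor preserves the low-to-high ordering of the levels, whereas a character column is sorted alphabetically in grouped summaries and on plot axes.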

t2

#looks like we need to pivot the data!


# pivot longer for the response battery

t2_long <- t2 %>% 
  pivot_longer(
    cols = Q224:Q233,
    names_to = "question", 
    values_to = "response"
  ) 


# change responses to numeric
t2_long <- t2_long %>%
  mutate(response_num = case_when(
    response == "Not at all often" ~ 1,
    response == "Not often" ~ 2,
    response == "Fairly often" ~ 3,
    response == "Very often" ~ 4,
    response %in% c("Other missing; Multiple answers Mail (EVS)", "Not asked", "No answer", "Don't know") ~ NA_real_,
    TRUE ~ NA_real_  
  )
  )
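
The reshaping step above can be illustrated on a small made-up table. With 2 respondents and 3 items, `pivot_longer()` should return 2 x 3 = 6 rows, one per respondent-item pair. The `toy_wide` values are hypothetical.

```r
library(magrittr)
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per respondent, one column per item
toy_wide <- tibble(
  B_COUNTRY = c("A", "B"),
  Q224 = c("Very often", "Not often"),
  Q225 = c("Not often", "Fairly often"),
  Q226 = c("Fairly often", "Very often")
)

# Stack the item columns into question/response pairs
toy_long <- toy_wide %>%
  pivot_longer(
    cols = Q224:Q226,
    names_to = "question",
    values_to = "response"
  )

dim(toy_long)
```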

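Importantly, we have recoded the responses so that higher numbers mean the respondent thinks the event happens MORE often. For positively worded items (e.g., Q224, votes are counted fairly), lower numbers therefore reflect LESS confidence. For negatively worded items (e.g., Q227, voters are bribed), the direction is reversed, so keep this in mind when averaging across the whole battery.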

Let’s check the unique values of the recoded responses:

unique(t2_long$response_num)

## [1]  2  1  3  4 NA
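
Several items in the battery describe problems (for example, voters being bribed), so a higher frequency on those items points to lower confidence. One possible way to put every item on the same direction is to reverse-code the negatively worded ones. This is a sketch under assumptions: the `negative_items` list is my reading of the question labels, the `confidence` column is a hypothetical name, and the toy values are made up.

```r
library(magrittr)
library(dplyr)
library(tibble)

# Assumption: items judged to be negatively worded, based on their labels
negative_items <- c("Q225", "Q226", "Q227", "Q230", "Q231")

# Toy long-format data in the shape of `t2_long` (values are made up)
toy_long <- tibble(
  question     = c("Q224", "Q225", "Q227", "Q229"),
  response_num = c(4, 4, 1, NA)
)

toy_long <- toy_long %>%
  mutate(
    # On a 1-4 scale, 5 - x swaps 1 with 4 and 2 with 3; NA stays NA
    confidence = if_else(question %in% negative_items,
                         5 - response_num,
                         response_num)
  )

toy_long$confidence
```

After this step, higher values of `confidence` would mean more confidence on every item, which makes an average across the battery easier to interpret.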

Now we need to do some math… It might be helpful to look at some descriptive statistics of the data. For example, we can look at the average response by income level to EACH question on the battery.

#let's calculate the mean response by income level and EACH question on the battery
summary_data <- t2_long %>%
  group_by(income_level, question) %>%
  summarize(avg_score = mean(response_num, na.rm = TRUE), .groups = 'drop')

summary_data

## # A tibble: 30 × 3
##    income_level question avg_score
##    <chr>        <chr>        <dbl>
##  1 High income  Q224          3.30
##  2 High income  Q225          1.92
##  3 High income  Q226          2.55
##  4 High income  Q227          2.12
##  5 High income  Q228          2.67
##  6 High income  Q229          3.04
##  7 High income  Q230          2.30
##  8 High income  Q231          1.56
##  9 High income  Q232          3.11
## 10 High income  Q233          3.22
## # ℹ 20 more rows

Or we could look at the average score for each COUNTRY in each income level:

#now let's calculate the average score for EACH country in each income level
summary_by_country <- t2_long %>%
  group_by(B_COUNTRY, income_level) %>%
  summarize(average_score = mean(response_num, na.rm = TRUE), .groups = 'drop')

summary_by_country

## # A tibble: 65 × 3
##    B_COUNTRY  income_level  average_score
##    <fct>      <chr>                 <dbl>
##  1 Andorra    High income            2.63
##  2 Argentina  Middle income          2.72
##  3 Australia  High income            2.54
##  4 Bangladesh Middle income          2.57
##  5 Armenia    Middle income          2.53
##  6 Bolivia    Middle income          2.68
##  7 Brazil     Middle income          2.73
##  8 Myanmar    Middle income          2.24
##  9 Canada     High income            2.58
## 10 Chile      High income            2.56
## # ℹ 55 more rows

Finally, we are probably most interested in the mean score of electoral confidence by income level.

##now let's just look at the average score for each income level, leaving out specific countries

summary_by_income <- t2_long %>% 
  group_by(income_level) %>%
  summarize(average_score = mean(response_num, na.rm = TRUE), .groups = 'drop')

summary_by_income

## # A tibble: 3 × 2
##   income_level  average_score
##   <chr>                 <dbl>
## 1 High income            2.59
## 2 Low income             2.45
## 3 Middle income          2.65
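
The summaries above are unweighted means, even though `W_WEIGHT` was carried along in `t2`. A minimal sketch of a weighted version uses base R's `weighted.mean()`. The toy values are made up, and whether `W_WEIGHT` is the right weight for pooled cross-country comparisons is an assumption that should be checked against the WVS documentation.

```r
library(magrittr)
library(dplyr)
library(tibble)

# Toy long-format data with a weight column (values are made up)
toy_long <- tibble(
  income_level = c("High income", "High income", "Low income", "Low income"),
  W_WEIGHT     = c(1, 3, 2, 2),
  response_num = c(2, 4, 1, NA)
)

weighted_by_income <- toy_long %>%
  filter(!is.na(response_num)) %>%   # drop missing responses and their weights
  group_by(income_level) %>%
  summarise(
    unweighted = mean(response_num),
    weighted   = weighted.mean(response_num, w = W_WEIGHT),
    .groups = "drop"
  )

weighted_by_income
```

In the toy data the high income group moves from 3 to 3.5 once the respondent with weight 3 counts for more.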

Let’s make some figures!

First, we can compare high income countries.

# HIGH INCOME COUNTRIES######## -------------------------------

#filter out only high income countries
high_inc <- t2_long %>% 
  filter(income_level %>% 
           str_detect("High income")) %>% 
  group_by(B_COUNTRY) %>% 
  summarise(
    avg_score = mean(response_num,
                     na.rm = TRUE)
  )
  


#let's look at the average confidence in elections among high income countries
ggplot(
  high_inc, aes(
    x = avg_score,
    y = reorder(B_COUNTRY, avg_score)
  )
) + 
  geom_point() +
  labs(x = "Confidence in Elections", 
       y = "Country",
       title = "Electoral Confidence among High Income Countries", 
       subtitle = "Higher scores denote more confidence in elections",
       caption = "Data from the World Values Survey") + 
  theme_minimal()

#high income countries responses for each item 
t2_long %>% 
  filter(income_level %>% 
           str_detect("High income")) %>% 
  group_by(B_COUNTRY, question) %>% 
  summarise(
    avg_score = mean(response_num,
                     na.rm = TRUE)
  ) %>% ggplot(
    aes(
    x = avg_score,
    y = reorder(B_COUNTRY, avg_score)
  )
) + 
  facet_wrap(~ question, 
             labeller = labeller(question = c(
               "Q224" = "Votes are counted fairly",
               "Q225" = "Opposition candidates are\n prevented from running",
               "Q226" = "TV news favors the\n governing party",
               "Q227" = "Voters are bribed",
               "Q228" = "Journalists provide fair \ncoverage of elections",
               "Q229" = "Election officials are fair",
               "Q230" = "Rich people buy elections",
               "Q231" = "Voters are threatened with\n  violence at the polls",
               "Q232" = "Voters are offered a genuine\n choice in the elections",
               "Q233" = "Women have equal opportunities\n to run the office"
             ))) +
  geom_point() +
  labs(x = "Average response (1 = Not at all often, 4 = Very often)", 
       y = "",
       title = "Regarding elections in [YOUR COUNTRY], how often do you think that...", 
       subtitle = "Higher scores mean the event is seen as happening more often",
       caption = "Data from the World Values Survey") + 
  theme_minimal() +
  theme(plot.title = element_text(face = "italic"))

## `summarise()` has grouped output by 'B_COUNTRY'. You can override using the
## `.groups` argument.

What about middle income countries?

# Middle Income Countries -------------------------------------------------

#filter out only middle income countries
mid_inc <- t2_long %>% 
  filter(income_level %>% 
           str_detect("Middle income")) %>% 
  group_by(B_COUNTRY) %>% 
  summarise(
    avg_score = mean(response_num,
                     na.rm = TRUE)
  )


ggplot(
  mid_inc, aes(
    x = avg_score,
    y = reorder(B_COUNTRY, avg_score)
  )
) + 
  geom_point() +
  labs(x = "Confidence in Elections", 
       y = "Country",
       title = "Electoral Confidence among Middle Income Countries", 
       subtitle = "Lower scores denote less confidence in elections",
       caption = "Data from the World Values Survey") + 
  theme_minimal()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Finally, what about low income countries?

# Low Income Countries ----------------------------------------------------



#filter out only low income countries
low_inc <- t2_long %>% 
  filter(income_level %>% 
           str_detect("Low income")) %>% 
  group_by(B_COUNTRY) %>% 
  summarise(
    avg_score = mean(response_num,
                     na.rm = TRUE)
  )


ggplot(
  low_inc, aes(
    x = avg_score,
    y = reorder(B_COUNTRY, avg_score)
  )
) + 
  geom_point() +
  labs(x = "Confidence in Elections", 
       y = "Country",
       title = "Electoral Confidence among Low Income Countries", 
       subtitle = "Lower scores denote less confidence in elections",
       caption = "Data from the World Values Survey") + 
  theme_minimal()
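
The three income-group plots repeat the same filter, summarise, and plot steps. One way to avoid the repetition is to wrap them in a small function. `plot_income_group()` is a hypothetical helper, not part of the lab code, and the toy data below are made up to mirror the columns of `t2_long`.

```r
library(magrittr)
library(dplyr)
library(ggplot2)
library(tibble)

# Hypothetical helper: country-level dot plot for one income group
plot_income_group <- function(long_data, level) {
  country_means <- long_data %>%
    filter(income_level == level) %>%
    group_by(B_COUNTRY) %>%
    summarise(avg_score = mean(response_num, na.rm = TRUE), .groups = "drop")

  ggplot(country_means,
         aes(x = avg_score, y = reorder(B_COUNTRY, avg_score))) +
    geom_point() +
    labs(x = "Confidence in Elections",
         y = "Country",
         title = paste("Electoral Confidence among", level, "Countries"),
         caption = "Data from the World Values Survey") +
    theme_minimal()
}

# Toy data in the same shape as `t2_long` (values are made up)
toy_long <- tibble(
  B_COUNTRY    = c("A", "A", "B", "B", "C"),
  income_level = c("High income", "High income", "High income",
                   "High income", "Low income"),
  response_num = c(3, 4, 2, NA, 1)
)

p_high <- plot_income_group(toy_long, "High income")
```

With the real data, the same call with `t2_long` and each of the three income levels would produce plots like the ones above, with one place to change labels or themes.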

We could also compare income levels to one another…

ggplot(summary_by_country, aes(x = average_score, y = factor(income_level, levels = c("Low income", "Middle income", "High income")))) +
  geom_point() +
  labs(x = "Electoral Confidence", 
       y = "Country Income Level", 
       title = "Electoral Confidence by Country Income Level") +
  theme_minimal()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

As an Americanist, what if I want to look at responses among the US sample?

# What if we wanna look at the US --------

#average score from US respondents for each item in the battery
usa_data <- t2_long %>% 
  filter(B_COUNTRY %>% 
           equals("United States")) %>% 
  group_by(B_COUNTRY, question) %>% 
  summarise(us_avg_score = mean(response_num,
                             na.rm = TRUE))

## `summarise()` has grouped output by 'B_COUNTRY'. You can override using the
## `.groups` argument.

#Let's plot the average score for each item from US respondents 


ggplot(
  usa_data, aes(
    x = us_avg_score,
    y = B_COUNTRY,
    label = round(us_avg_score, 2)
  )) + facet_wrap(~ question, 
                  labeller = labeller(question = c(
                    "Q224" = "Votes are counted fairly",
                    "Q225" = "Opposition candidates are\n prevented from running",
                    "Q226" = "TV news favors the\n governing party",
                    "Q227" = "Voters are bribed",
                    "Q228" = "Journalists provide fair \ncoverage of elections",
                    "Q229" = "Election officials are fair",
                    "Q230" = "Rich people buy elections",
                    "Q231" = "Voters are threatened with\n  violence at the polls",
                    "Q232" = "Voters are offered a genuine\n choice in the elections",
                    "Q233" = "Women have equal opportunities\n to run the office"
                  ))) +
  geom_point(shape = 1, size = 10) +
  geom_text(color = "black", size = 3.5,  # Adjust text size
            fontface = "bold", family = "sans",  # Text appearance
            hjust = 0.5, vjust = 0.5) +  # label is inherited from ggplot(aes())
  labs(x = "Not At All Often (1) to Very Often (4)", 
       y = "",
       title = "Regarding elections in the United States, how often do you think that...",
       caption = "Data from the World Values Survey") +
  theme_bw() +
  theme(
    plot.title = element_text(face = "italic", size = 15),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()) +
  xlim(0, 5) +
  scale_y_discrete(labels = function(x) ifelse(x == "United States", "", x))

Admittedly, this is not the best graph in the world, but maybe we can improve it together? Still, it offers some interesting insight into the American public’s confidence in elections and electoral processes.

Let’s do the following exercises together:

It is very likely that the responses to the electoral battery question in the United States differ between Republicans and Democrats. Please create a figure that compares responses from Republicans and Democrats.
Additionally, confidence in electoral systems may vary across different education levels. Compare responses to the electoral confidence battery across different education levels (e.g., high school, Bachelor’s, Advanced degrees) using only responses from US respondents.

Katie Gouge

2024-07-23