afana_xena_homework

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.

plot(cars)

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

# List of packages
packages <- c("tidyverse", "infer", "fst", "modelsummary", "effects", "survey", "interactions") # add any you need here

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages
lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: carData
## 
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## 
## Loading required package: grid
## 
## Loading required package: Matrix
## 
## 
## Attaching package: 'Matrix'
## 
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## 
## Loading required package: survival
## 
## 
## Attaching package: 'survey'
## 
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart

## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "infer"     "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"    
##  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
## [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "fst"       "infer"     "lubridate" "forcats"   "stringr"   "dplyr"    
##  [7] "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse"
## [13] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
## [19] "base"     
## 
## [[4]]
##  [1] "modelsummary" "fst"          "infer"        "lubridate"    "forcats"     
##  [6] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [11] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [16] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[5]]
##  [1] "effects"      "carData"      "modelsummary" "fst"          "infer"       
##  [6] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [11] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [16] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [21] "methods"      "base"        
## 
## [[6]]
##  [1] "survey"       "survival"     "Matrix"       "grid"         "effects"     
##  [6] "carData"      "modelsummary" "fst"          "infer"        "lubridate"   
## [11] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [16] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [21] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [26] "base"        
## 
## [[7]]
##  [1] "interactions" "survey"       "survival"     "Matrix"       "grid"        
##  [6] "effects"      "carData"      "modelsummary" "fst"          "infer"       
## [11] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [16] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [21] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [26] "methods"      "base"

End of Tutorial

Homework 4 (5%): due by next lecture on March 19th

Instructions: Start a new R markdown for the homework and call it “Yourlastname_Firstname_Homework_4”.

Copy everything below from Task 1 to Task 5. Keep the task prompt and questions, and provide your code and answer underneath.

Remember: you need all the steps for your code to work, including loading your data – otherwise it will not knit.

To generate a new code box, click on the +C sign above. Underneath your code, provide your answer to the task question.

When you are done, click on “Knit” above, then “Knit to Html”. Wait for everything to compile. If you get an error like “Execution halted”, it means there are issues with your code you must fix. When all issues are fixed, it will prompt a new window. Then click on “Publish” in the top right, and then Rpubs (the first option) and follow the instructions to create your Rpubs account and get your Rpubs link for your document (i.e., html link as I provide for the tutorial).

Note: Make sure to provide both your markdown file and R pubs link. If you do not submit both, you will be penalized 2 pts. out of the 5 pts. total.

Task 1

Graph the histogram distribution using the populist scale (as we did in the tutorial) for the country of Sweden. What do you note?

sweden_data <- read.fst("sweden_data.fst")

sweden_data$trstplt <- ifelse(sweden_data$trstplt > 10, NA, sweden_data$trstplt)
sweden_data$trstprl <- ifelse(sweden_data$trstprl > 10, NA, sweden_data$trstprl)
sweden_data$trstprt <- ifelse(sweden_data$trstprt > 10, NA, sweden_data$trstprt)


sweden_data$trust <- scales::rescale(sweden_data$trstplt + sweden_data$trstprl + sweden_data$trstprt, na.rm = TRUE, to = c(0, 100))


sweden_data$populist <- scales::rescale(sweden_data$trust, na.rm = TRUE, to=c(100,0))

ggplot(sweden_data, aes(x = populist)) +
  geom_histogram(bins = 30, fill = "forestgreen", color = "hotpink") +
  theme_minimal() +
  labs(title = "Distribution of Populist Scale for Sweden",
       x = "Populist Scale",
       y = "Count")

## Warning: Removed 2503 rows containing non-finite values (`stat_bin()`).

## According to the output interpretation for Sweden, the country represents a center-heavy populist, in contrast to the examples of Spain and Hungary, which represent right- and left-wing populism, respectively.

Task 2

Run a linear regression with populist attitudes as the outcome, educational attainment (recoded as a binary “BA or more” and “No BA”), and using the Swedish data. Print the intercept and coefficients only and interpret. Then do a tidy model table displaying the p-value (with digits = 3) and interpret.

model_data <- sweden_data %>%
  mutate(
    gndr = case_when(
      gndr == 1 ~ "Male",
      gndr == 2 ~ "Female",
      gndr == 9 ~ NA_character_,
      TRUE ~ as.character(gndr)
    ),

    educ.ba = case_when(
      essround < 5 & edulvla == 5 ~ "BA or more",
      essround >= 5 & edulvlb > 600 ~ "BA or more",
      TRUE ~ "No BA"
    ),
    
    edulvla = ifelse(edulvla %in% c(77, 88, 99), NA_integer_, edulvla),
    edulvlb = ifelse(edulvlb %in% c(5555, 7777, 8888), NA_integer_, edulvlb),
    
    educ.ba = factor(educ.ba, levels = c("No BA", "BA or more")),

    age = ifelse(agea == 999, NA, agea),

    cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),

    gen = case_when(
      yrbrn %in% 1900:1945 ~ "1",
      yrbrn %in% 1946:1964 ~ "2",
      yrbrn %in% 1965:1979 ~ "3",
      yrbrn %in% 1980:1996 ~ "4",
      TRUE ~ as.character(cohort)  
    ),

    gen = factor(gen,
                 levels = c("1", "2", "3", "4"),
                 labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
  )
table(model_data$gen)

## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         4021         5787         4263         3609

sweden_model <- lm(populist ~ educ.ba, data = model_data)

broom::tidy(sweden_model)

## Registered S3 methods overwritten by 'broom':
##   method            from  
##   tidy.glht         jtools
##   tidy.summary.glht jtools

broom::tidy(sweden_model) |>
knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	51.359	0.181	283.363	0
educ.baBA or more	-7.868	0.350	-22.494	0

Task 3

df <- read.fst("italy_data.fst")
## Creates an authoritarian values scale based on human modules items (can also do libertarian value scale)
df <- df %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
  # Apply the reverse coding
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

# Now you can calculate 'schwartzauth' after the NA recoding
df$auth <- scales::rescale(df$behave + 
                      df$secure + 
                      df$safety + 
                      df$tradition + 
                      df$rules, to=c(0,100), na.rm=TRUE)


df <- df %>% filter(!is.na(auth))
table(df$secure)

## 
##    1    2    3    4    5    6 
##   54  139  626 1856 2876 2881

table(df$auth)

## 
##   0  12  16  20  24  28  32  36  40  44  48  52  56  60  64  68  72  76  80  84 
##   7   4   4   9  13  20  27  57 113 155 163 323 367 596 647 789 823 897 992 733 
##  88  92  96 100 
## 632 465 299 297

ggplot(df, aes(x = auth)) +
  geom_histogram(bins = 30, fill = "lightblue", color = "lavender") +
  theme_minimal() +
  labs(title = "Distribution of Authoritarian Scale for Italy",
       x = "Authoritarian Scale",
       y = "Count")

## The distribution of the authoritarian scale for Italy can be observed to be leaning more to the right as opposed to the Portugal example, where the graph is more centered.

df$year <- NA
replacements <- c(2002, 2004, 2006, 2008, 2010, 2012, 2014, 2016, 2018, 2020)
for(i in 1:10){
  df$year[df$essround == i] <- replacements[i]
}
auth_avg <- df %>%
  group_by(year) %>%
  summarize(auth_avg = mean(auth, na.rm = TRUE))

plot_auth <- ggplot(auth_avg, aes(x = year, y = auth_avg)) +
  geom_point(aes(color = auth_avg), alpha = 1.0) + 
  geom_line(aes(group = 1), color = "orange", linetype = "dashed") +  
  labs(title = "Authoritarian Scale Average by Year (Italy)",
       x = "Survey Year",
       y = "Authoritarian Scale Average") +
  theme_minimal() +
  theme(legend.position = "none") +
  scale_y_continuous(limits = c(0, 100))

print(plot_auth)

Little to no fluctuation can be observed for Italy’s authoritarian scale.

Task 4

model_data <- df %>%
  mutate(
    # Recoding gender
    gndr = case_when(
      gndr == 1 ~ "Male",
      gndr == 2 ~ "Female",
      gndr == 9 ~ NA_character_,
      TRUE ~ as.character(gndr)
    ),

 # Recoding education
    educ.ba = case_when(
      essround < 5 & edulvla == 5 ~ "BA or more",
      essround >= 5 & edulvlb > 600 ~ "BA or more",
      TRUE ~ "No BA"
    ),
    
    # Handle NAs for education levels
    edulvla = ifelse(edulvla %in% c(77, 88, 99), NA_integer_, edulvla),
    edulvlb = ifelse(edulvlb %in% c(5555, 7777, 8888), NA_integer_, edulvlb),
    
    # Explicitly making 'No BA' the reference category
    educ.ba = factor(educ.ba, levels = c("No BA", "BA or more")),

    # Recoding age, setting 999 to NA
    age = ifelse(agea == 999, NA, agea),

    # Recoding cohort variable, setting years before 1930 and after 2000 to NA
    cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),

    # Recoding generational cohorts based on year of birth
    gen = case_when(
      yrbrn %in% 1900:1945 ~ "1",
      yrbrn %in% 1946:1964 ~ "2",
      yrbrn %in% 1965:1979 ~ "3",
      yrbrn %in% 1980:1996 ~ "4",
      TRUE ~ as.character(cohort)  # Keeping other values as character if they do not fit the ranges
    ),

    # Converting cohort to a factor with labels
    gen = factor(gen,
                 levels = c("1", "2", "3", "4"),
                 labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
  )
table(model_data$gen)

## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         1026         2580         2214         1811

model1 <- lm(auth ~ gen, data = model_data)
model2 <- lm(auth ~ gndr, data = model_data)
model3 <- lm(auth ~ educ.ba, data = model_data)

modelsummary(
  list(model1),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_omit = "Intercept")

	(1)
genBaby Boomers	−1.1 (0.6)*
genGen X	−2.1 (0.6)***
genMillennials	−3.6 (0.6)***
Num.Obs.	7631
R2	0.006
R2 Adj.	0.006
AIC	62934.0
BIC	62968.7
Log.Lik.	−31462.009
RMSE	14.94

genBaby Boomers (-1.1, p<0.001): Compared to the Interwar generation (reference category that is omitted), Baby Boomers exhibit scores that are, on average, 1.1 points lower on the authoritarian values scale (0-100). The SINGLE asterisks (*) denote statistical significance at a higher p-value (above our alpha threshold of 0.05)

genGen X (-2.1, p<0.001): Generation X showcases a slight decrease, with a score of 2.1 points lower than the Interwar generation on the authoritarian values scale.The triple asterisks (***) denote statistical significance at a very low p-value (under our alpha threshold of 0.05)

genMillennials (-3.6, p<0.001): Millennials exhibit the most considerable decrease in authoritarian values, with their scores being 3.6 points lower than those of the Interwar generation. This marked difference is statistically significant, indicating that Millennials diverge most strongly from the Interwar generation in comparing their authoritarian values.

Task 5

df <- read.fst("greece_data.fst")

model_data <- df %>%
  mutate(

    gndr = case_when(
      gndr == 1 ~ "Male",
      gndr == 2 ~ "Female",
      gndr == 9 ~ NA_character_,
      TRUE ~ as.character(gndr)
    ),
    
    ID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                  
      lrscale %in% 4:6 ~ "Moderate",                
      lrscale %in% c(77, 88, 99) ~ NA_character_    

    ),
    age = ifelse(agea == 999, NA, agea),

    cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),

    gen = case_when(
      yrbrn %in% 1900:1945 ~ "1",
      yrbrn %in% 1946:1964 ~ "2",
      yrbrn %in% 1965:1979 ~ "3",
      yrbrn %in% 1980:1996 ~ "4",
      TRUE ~ as.character(cohort)  
    ),

    gen = factor(gen,
                 levels = c("1", "2", "3", "4"),
                 labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
  )
table(model_data$gen)

## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         2978         3484         3498         2330

# Populist scale (Norris/Inglehart use low levels of trust in parliaments, politicians and parties as an indicator of populism)

# Note: Schafer does it 2 ways, we will focus now on the most straightforward

# Setting values greater than 10 to NA
df$trstplt <- ifelse(df$trstplt > 10, NA, df$trstplt)
df$trstprl <- ifelse(df$trstprl > 10, NA, df$trstprl)
df$trstprt <- ifelse(df$trstprt > 10, NA, df$trstprt)

# Creating and rescaling the trust variable
df$trust <- scales::rescale(df$trstplt + df$trstprl + df$trstprt, na.rm = TRUE, to = c(0, 100))

# Here's how you would "flip it" because populist is conceived by N + I as 'anti-trust'
# As Schafer explains there are some issues with that conceptualization, but N + I is also widely applied
populist <- scales::rescale(df$trust, na.rm = TRUE, to=c(100,0))

models <- list(
   "Model 1" = lm(populist ~ gen, data = model_data),
   "Model 2" = lm(populist ~ gndr, data = model_data),
   "Model 3" = lm(populist ~ lrscale, data = model_data))
modelsummary(models, fmt = 1,
  estimate  = c("{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL)

	Model 1	Model 2	Model 3
(Intercept)	65.8 (0.5)***	70.0 (0.3)***	67.5 (0.3)***
genBaby Boomers	3.5 (0.7)***
genGen X	5.6 (0.7)***
genMillennials	5.8 (0.7)***
gndrMale		−1.0 (0.5)*
lrscale			0.1 (0.0)***
Num.Obs.	9562	9823	9823
R2	0.009	0.000	0.021
R2 Adj.	0.009	0.000	0.021
AIC	86473.9	88873.2	88667.6
BIC	86509.7	88894.8	88689.2
Log.Lik.	−43231.944	−44433.590	−44330.802
RMSE	22.25	22.30	22.07

genBaby Boomers (-3.5, p<0.001): Compared to the Interwar generation (reference category that is omitted), Baby Boomers exhibit scores that are, on average, 3,5 points lower on the lrscale values. The SINGLE asterisks (*) denote statistical significance at a higher p-value (above our alpha threshold of 0.05)

genGen X (-5.6, p<0.001): Generation X showcases a slight decrease, with a score of 5.6 points lower than the Interwar generation on the lrscale values.The triple asterisks (***) denote statistical significance at a very low p-value (under our alpha threshold of 0.05)

genMillennials (-5.8, p<0.001): Millennials exhibit the most considerable decrease in lrscale values, with their scores being 5.8 points lower than those of the Interwar generation. This marked difference is statistically significant, indicating that Millennials diverge most strongly from the Interwar generation in comparing their lrscale values.

models <- list(
   "Model 1" = lm(populist ~ ID, data = model_data),  # Assuming 'left_right' is defined
   "Model 2" = lm(populist ~ gndr, data = model_data),
   "Model 3" = lm(populist ~ gen, data = model_data))
modelsummary(models, fmt = 1,
  estimate  = c("{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL )

	Model 1	Model 2	Model 3
(Intercept)	72.4 (0.6)***	70.0 (0.3)***	65.8 (0.5)***
IDModerate	−3.0 (0.7)***
IDRight	−10.5 (0.8)***
gndrMale		−1.0 (0.5)*
genBaby Boomers			3.5 (0.7)***
genGen X			5.6 (0.7)***
genMillennials			5.8 (0.7)***
Num.Obs.	8012	9823	9562
R2	0.029	0.000	0.009
R2 Adj.	0.029	0.000	0.009
AIC	72094.6	88873.2	86473.9
BIC	72122.5	88894.8	86509.7
Log.Lik.	−36043.288	−44433.590	−43231.944
RMSE	21.75	22.30	22.25

modelplot(models, coef_omit = 'Interc')

## The younger generations and left-leaning individuals are observed to have a high-populism,

modelsummary(models, fmt = 1,
  estimate  = c("{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_rename = c("genBaby Boomers" = "Baby Boomers", "genGen X" = "Gen X", "genMillennials" = "Millennials", "gndrMale" = "Male", "educ.baBA or more" = "BA or More"),
  title = 'Table 2. Regression models for authoritarian attitudes')

Table 2. Regression models for authoritarian attitudes
	Model 1	Model 2	Model 3
(Intercept)	72.4 (0.6)***	70.0 (0.3)***	65.8 (0.5)***
IDModerate	−3.0 (0.7)***
IDRight	−10.5 (0.8)***
Male		−1.0 (0.5)*
Baby Boomers			3.5 (0.7)***
Gen X			5.6 (0.7)***
Millennials			5.8 (0.7)***
Num.Obs.	8012	9823	9562
R2	0.029	0.000	0.009
R2 Adj.	0.029	0.000	0.009
AIC	72094.6	88873.2	86473.9
BIC	72122.5	88894.8	86509.7
Log.Lik.	−36043.288	−44433.590	−43231.944
RMSE	21.75	22.30	22.25

afana_xena_homework_4