Set-Up

#Installing & Applying Packages
packages <-c("tidyverse", "infer", "fst", "modelsummary", "effects", "survey", "MASS", "aod", "interactions", "kableExtra", "flextable", "scales")
new_packages <-packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
lapply(packages, library, character.only = TRUE)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Version 2.0.0 of `modelsummary`, to be released soon, will introduce a
##   breaking change: The default table-drawing package will be `tinytable`
##   instead of `kableExtra`. All currently supported table-drawing packages
##   will continue to be supported for the foreseeable future, including
##   `kableExtra`, `gt`, `huxtable`, `flextable, and `DT`.
##   
##   You can always call the `config_modelsummary()` function to change the
##   default table-drawing package in persistent fashion. To try `tinytable`
##   now:
##   
##   config_modelsummary(factory_default = 'tinytable')
##   
##   To set the default back to `kableExtra`:
##   
##   config_modelsummary(factory_default = 'kableExtra')
## 
## Loading required package: carData
## 
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## 
## Loading required package: grid
## 
## Loading required package: Matrix
## 
## 
## Attaching package: 'Matrix'
## 
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## 
## Loading required package: survival
## 
## 
## Attaching package: 'survey'
## 
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart
## 
## 
## 
## Attaching package: 'MASS'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## 
## 
## Attaching package: 'aod'
## 
## 
## The following object is masked from 'package:survival':
## 
##     rats
## 
## 
## 
## Attaching package: 'kableExtra'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## 
## 
## 
## Attaching package: 'flextable'
## 
## 
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
## 
## 
## The following object is masked from 'package:purrr':
## 
##     compose
## 
## 
## 
## Attaching package: 'scales'
## 
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "infer"     "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"    
##  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
## [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "fst"       "infer"     "lubridate" "forcats"   "stringr"   "dplyr"    
##  [7] "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse"
## [13] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
## [19] "base"     
## 
## [[4]]
##  [1] "modelsummary" "fst"          "infer"        "lubridate"    "forcats"     
##  [6] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [11] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [16] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[5]]
##  [1] "effects"      "carData"      "modelsummary" "fst"          "infer"       
##  [6] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [11] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [16] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [21] "methods"      "base"        
## 
## [[6]]
##  [1] "survey"       "survival"     "Matrix"       "grid"         "effects"     
##  [6] "carData"      "modelsummary" "fst"          "infer"        "lubridate"   
## [11] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [16] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [21] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [26] "base"        
## 
## [[7]]
##  [1] "MASS"         "survey"       "survival"     "Matrix"       "grid"        
##  [6] "effects"      "carData"      "modelsummary" "fst"          "infer"       
## [11] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [16] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [21] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [26] "methods"      "base"        
## 
## [[8]]
##  [1] "aod"          "MASS"         "survey"       "survival"     "Matrix"      
##  [6] "grid"         "effects"      "carData"      "modelsummary" "fst"         
## [11] "infer"        "lubridate"    "forcats"      "stringr"      "dplyr"       
## [16] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [21] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [26] "datasets"     "methods"      "base"        
## 
## [[9]]
##  [1] "interactions" "aod"          "MASS"         "survey"       "survival"    
##  [6] "Matrix"       "grid"         "effects"      "carData"      "modelsummary"
## [11] "fst"          "infer"        "lubridate"    "forcats"      "stringr"     
## [16] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [21] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [26] "utils"        "datasets"     "methods"      "base"        
## 
## [[10]]
##  [1] "kableExtra"   "interactions" "aod"          "MASS"         "survey"      
##  [6] "survival"     "Matrix"       "grid"         "effects"      "carData"     
## [11] "modelsummary" "fst"          "infer"        "lubridate"    "forcats"     
## [16] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [21] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [26] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[11]]
##  [1] "flextable"    "kableExtra"   "interactions" "aod"          "MASS"        
##  [6] "survey"       "survival"     "Matrix"       "grid"         "effects"     
## [11] "carData"      "modelsummary" "fst"          "infer"        "lubridate"   
## [16] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [21] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [26] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [31] "base"        
## 
## [[12]]
##  [1] "scales"       "flextable"    "kableExtra"   "interactions" "aod"         
##  [6] "MASS"         "survey"       "survival"     "Matrix"       "grid"        
## [11] "effects"      "carData"      "modelsummary" "fst"          "infer"       
## [16] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [21] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [26] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [31] "methods"      "base"
# Importing Datasets
rm(list=ls()); gc()
##           used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
## Ncells 2641731 141.1    4117981 220.0         NA  4117981 220.0
## Vcells 4519691  34.5   10146329  77.5      16384  6894052  52.6
germany_data <-read.fst("germany_data.fst")
netherlands_data <- read.fst("netherlands_data.fst")

Task 1

Use data for Germany and the variable coding for model3 and model4 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

# New dataset for Germany called dfg
dfg <- germany_data
dfg <- dfg %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
# Apply the reverse coding
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

# Calculate 'schwartzauth' after the NA recoding
dfg$auth <- scales::rescale(dfg$behave + 
                      dfg$secure + 
                      dfg$safety + 
                      dfg$tradition + 
                      dfg$rules, to=c(0,100), na.rm=TRUE)
dfg <- dfg %>% filter(!is.na(auth))

dfg <- dfg %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr))

# Running the models
model3 <- lm(auth ~ polID + religious, data = dfg, weights = weight)
model4 <- lm(auth ~ polID + religious + polID*religious, data = dfg, weights = weight)
modelsummary(
  list(model3, model4),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_rename = c("polIDModerate" = "Moderate", "polIDRight" = "Right", "religious" = "Religious"),
  title = 'Regression model for religiousness and ideological self-placement for Germany')
Regression model for religiousness and ideological self-placement for Germany
 (1)   (2)
(Intercept) 57.5 (0.7)*** 59.6 (0.9)***
Moderate 5.3 (0.7)*** 2.6 (1.1)*
Right 9.3 (1.1)*** 3.4 (1.9)+
Religious 0.3 (0.1)** −0.3 (0.2)+
Moderate:Religious 0.8 (0.2)**
Right:Religious 1.5 (0.4)***
Num.Obs. 2855 2855
R2 0.035 0.041
R2 Adj. 0.034 0.039
AIC 24680.0 24665.5
BIC 24709.7 24707.2
Log.Lik. −12334.978 −12325.764
RMSE 17.13 17.13
Model 3 Interpretations: Compared to the Left-identifying individuals, individuals that identify as Moderate and Right, exhibit scores that are 5.3 and 9.3 points higher, respectively, on the authoritarian values scale (0-100).The triple asterisks denote statistical significance at a p-value less than 0.001. Meaning there is less than a 0.1% probability that the observed result is due to chance. Furthermore, compared to individuals that do not identify as religious, individuals that identify as religious exhibit scores that are 0.3 points higher on the authoritarian values scale (0-100).The double asterisks denote statistical significance at a p-value less than 0.01. Note that the changes descirbed for each predictor variable, assumes all other predictors remain constant.
Model Fit Metrics: The adjusted R-squared value for Model 3 is 0.034, suggesting that ideological self-placement as left, moderate, or right, and religiousness explain approximately 3.4% of the vairance in the authoritarian values scale for Germany, accounting for the number of predictors. This indicates that the combination of political identity and religiousness explains a very small proportion of the variability in authoritarian attitudes in the dataset. Looking at Model 4, the adjusted R-squared value is 0.039, therefore approximately 3.9% of the variability in authoritarian attitudes is explained by the predictors and the interaction term. This indicates that the inclusion of the interaction term (the interaction between ideological self-placement as left, moderate, or right and religiousness) slightly improves the model’s ability to explain the variability in authoritarian attitudes compared to Model 3. Comparing the Akaike Information Criterion (AIC) values between both models we can get a better understanding of the goodness of fit of each model. Model 3 has an AIC of 24680, and Model 4 has an AIC of 24665.5. The lower AIC value suggests that Model 4 is a better choice among the two models, as it achieves a better balance between model complexity and goodness of fit.

Task 2

Now generate the model4 interaction plot that we did in the tutorial, but again using the German data instead of the French. Interpret.

# Interaction plot for model 4
interaction_plot <- effect("polID*religious", model4, na.rm=TRUE)

plot(interaction_plot,
     main="Interaction Effect: Political ID & Religiousness on Authoritarian Attitudes",
     xlab="Religiousness",
     ylab="Authoritarian attitudes scale")

interaction_plot
## 
##  polID*religious effect
##           religious
## polID             0        2        5        8       10
##   Left     59.61010 58.93398 57.91980 56.90562 56.22950
##   Moderate 62.21299 63.14317 64.53843 65.93370 66.86388
##   Right    63.00456 65.26702 68.66072 72.05442 74.31689
Interpretations: By looking at the visual output of the interaction plot, it can be said that there is an interaction between ‘polID’ (ideological self-placement as left, moderate, or right) and ‘religious’ (religiousness) on the authoritarian attitudes scale (0-100). This can be inferred from the plot as the lines for each predictor variable are not parallel to eachother. We can see that the line for polID=Left has a negative slope, therefore will intersect both polID=Moderate and polID=Right. In addition the slope of polID=Right appears steeper than that of polID=Moderate, therefore they too, will intersect if overlaid.

Task 3

Use data for the Netherlands and the variable coding for model5 and model6 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

# New dataset for Netherlands called dfn
dfn <- netherlands_data

# Cleaning and recoding rlgblg into 'religion'
dfn <- dfn %>%
  mutate(religion = case_when(
    rlgblg == 2 ~ "No",
    rlgblg == 1 ~ "Yes",
    rlgblg %in% c(7, 8, 9) ~ NA_character_,
    TRUE ~ as.character(rlgblg)))
# Check
table(dfn$religion)
## 
##    No   Yes 
## 11289  7008
dfn <- dfn %>%
  mutate(
    religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr))

# Cleaning and recoding to make auth variable
dfn <- dfn %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

dfn$auth <- scales::rescale(dfn$behave + 
                      dfn$secure + 
                      dfn$safety + 
                      dfn$tradition + 
                      dfn$rules, to=c(0,100), na.rm=TRUE)

dfn <- dfn %>% filter(!is.na(auth))

# Cleaning and recoding lrscale into 'ID' - *note: polID includes 'Moderate', ID doesn't*
dfn <- dfn %>%
  mutate(ID = case_when(
    lrscale >= 0 & lrscale <= 4 ~ "Left",
    lrscale >= 6 & lrscale <= 10 ~ "Right",
    lrscale > 10 ~ NA_character_,  # Set values above 10 as NA
    TRUE ~ NA_character_))
# Check
table(dfn$ID)
## 
##  Left Right 
##  5449  7010
# Cleaning and recoding yrbrn and categorizing into generational cohorts 'gen'
dfn <- dfn %>%
  mutate(
    cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),
    gen = case_when(
      yrbrn %in% 1900:1945 ~ "1",
      yrbrn %in% 1946:1964 ~ "2",
      yrbrn %in% 1965:1979 ~ "3",
      yrbrn %in% 1980:1996 ~ "4",
      TRUE ~ as.character(yrbrn)),
    gen = factor(gen,
                 levels = c("1", "2", "3", "4"),
                 labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials")))
# Check
table(dfn$gen)
## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         3771         6147         4548         2730
# Modelling
model5 <- lm(auth ~ religion + ID + gen, data = dfn, weights = weight)
model6 <- lm(auth ~ religion + ID + gen + religion*gen, data = dfn, weights = weight)

modelsummary(
  list(model5, model6),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_omit = "Intercept", 
  coef_rename = c("genBaby Boomers" = "Baby Boomers", "genGen X" = "Gen X", "genMillennials" = "Millennials", "IDRight" = "Right", "religionYes" = "Religious"),
  title = 'Regression model for Authoritarianism: Impact of Religiousness, Political Orientation, and Generational Cohort for Netherlands')
Regression model for Authoritarianism: Impact of Religiousness, Political Orientation, and Generational Cohort for Netherlands
 (1)   (2)
Religious 8.2 (0.9)*** 7.5 (2.1)***
Right 3.4 (0.9)*** 3.4 (0.9)***
Baby Boomers −5.8 (1.3)*** −6.5 (1.7)***
Gen X −7.3 (1.4)*** −7.6 (1.8)***
Millennials −8.6 (1.5)*** −8.6 (1.9)***
Religious:Baby Boomers 1.6 (2.6)
Religious:Gen X 0.5 (2.8)
Religious:Millennials −0.6 (3.2)
Num.Obs. 1147 1147
R2 0.126 0.127
R2 Adj. 0.123 0.121
AIC 9479.3 9484.5
BIC 9514.7 9535.0
Log.Lik. −4732.669 −4732.251
RMSE 14.94 14.94
Model 5 Interpretations: All coefficients have triple aesterisks, indicating a p <0.001, meaning there is less than 0.1% probability that the observed result is due to chance. Compared to the Left-identifying individuals, individuals that identify as Right, exhibit scores that are 3.4 points higher on the authoritarian values scale (0-100).Furthermore, compared to individuals that do not identify as religious, individuals that identify as Religious exhibit scores that are 8.2 points higher on the authoritarian values scale(0-100). In terms of generational cohort, compared to the Interwar Generation, the Baby Boomers, GenX, and Millennials exhibit scores, 5.8, 7.3, and 8.6 points lower on the authoritarian values scale (0-100). Note that the changes descirbed for each predictor variable, assume all other predictors remain constant.
Model Fit Metrics: The adjusted R-squared value for Model 5 is 0.123, suggesting that generational cohort, placement on left-right political, and religiousness explain approximately 12.3% of the vairance in the authoritarian values scale for Netherlands, accounting for the number of predictors. This indicates that the combination of political identity, religiousness, and generational cohort explains a small proportion of the variability in authoritarian attitudes in the dataset. Looking at Model 6, the adjusted R-squared value is 0.121, therefore approximately 12.1% of the variability in authoritarian attitudes is explained by the predictors and the interaction term. This indicates that the inclusion of the interaction term (the interaction between generational cohort and religiousness) slightly worsens the model’s ability to explain the variability in authoritarian attitudes compared to Model 5. Comparing the Akaike Information Criterion (AIC) values between both models we can get a better understanding of the goodness of fit of each model. Model 5 has an AIC of 9479.3, and Model 6 has an AIC of 9484.5. This indicates that Model 5 provides a better fit to the data compared to Model 6. The lower AIC value suggests that Model 5 is a better choice among the two models, as it achieves a better balance between model complexity and goodness of fit. The Bayesian Information Criterion (BIC) also indicates that Model 5 is a better model than Model 6 as it has a lower BIC value of 9514.7 compared to 9535.0.

Task 4

Produce the model7 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

# Cleaning and recoding lrscale to make polID 
dfn <- dfn %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_))
# Check
table(dfn$polID)
## 
##     Left Moderate    Right 
##     3420     8646     4747
# Modelling effect of generational cohort & political ID on authoritarianism"
model7 <- lm(auth ~ + cohort + polID + cohort*polID, data = dfn, weights = weight)
interact_plot(model7, pred = cohort, modx = polID, jnplot = TRUE, main = "Interaction Effect of Generational Cohort & Political ID on Authoritarianism")

Interpretations: By looking at the visual output of the interaction plot, it can be said that there is an interaction between ‘polID’ (ideological self-placement as left, moderate, or right) and generational cohort on the authoritarian attitudes scale (0-100). This can be inferred from the plot as the lines for each predictor variable intersect at some point along the graph, they are not parallel to eachother. In addition, since polID=right intersects polID=Left at a different point than it intersects polID=Moderate, it indicates the effect of generational cohort on authoritarian values varies depending on ideological self-placement as left, moderate, or right. This suggests the presence of an interaction effect.

Task 5

Produce the model8 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

# Modelling effect of religiousness and political ID on authoritarianism
model8 <- lm(auth ~ religious + ID + religious*ID, data = dfn, weights = weight)
interact_plot(model8, pred = religious, modx = ID, jnplot = TRUE, main= "Interaction Plot: Effect of Religiousness and Political ID on Authoritarianism")

Interpretations: Observing the output, there is no visible intersection of the lines for the predictor variables that can be seen on the plot and the slopes of the lines are very similar. However, the lines are not parallel, this can be determined as they are separated by 1 full box (5 units on the authoritatian values scale) on the left of the plot, but as religiousness values increase they are separated by less than that value. Although these lines are not exactly parallel, given that the religious scale is 0-10 and there is no intersection within the observed range of the data set, there is no interaction between ‘ID’ (placement on left-right political scale) and religiousness on the authoritarian attitudes scale (0-100).