Lei_Amy_Homework5

# List of packages
packages <- c("tidyverse", "infer", "fst", "modelsummary", "effects", "survey", "MASS", "aod", "interactions", "kableExtra", "flextable", "scales") # add any you need here

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages
lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: carData
## 
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## 
## Loading required package: grid
## 
## Loading required package: Matrix
## 
## 
## Attaching package: 'Matrix'
## 
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## 
## Loading required package: survival
## 
## 
## Attaching package: 'survey'
## 
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart
## 
## 
## 
## Attaching package: 'MASS'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## 
## 
## Attaching package: 'aod'
## 
## 
## The following object is masked from 'package:survival':
## 
##     rats
## 
## 
## 
## Attaching package: 'kableExtra'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## 
## 
## 
## Attaching package: 'flextable'
## 
## 
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
## 
## 
## The following object is masked from 'package:purrr':
## 
##     compose
## 
## 
## 
## Attaching package: 'scales'
## 
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## 
## The following object is masked from 'package:readr':
## 
##     col_factor


Task 1 Use data for Germany and the variable coding for model3 and model4 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

germany_data <- read_fst("germany_data.fst")
df <- germany_data

df <- df %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
  # Apply the reverse coding
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

df$auth <- scales::rescale(df$behave + 
                      df$secure + 
                      df$safety + 
                      df$tradition + 
                      df$rules, to=c(0,100), na.rm=TRUE)
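
The index construction above can be sketched on toy data. This is a minimal illustration rather than the tutorial's code: the data frame `toy`, its values, and the helpers `recode_item()` and `rescale_0_100()` are made up here, and `rescale_0_100()` is a base R stand-in for the min-max mapping that `scales::rescale()` performs.

```r
# Toy responses on two 1-6 value items; 7, 8 and 9 are missing-value codes.
toy <- data.frame(
  behave = c(1, 6, 3, 8),
  secure = c(1, 6, 2, 4)
)

# Hypothetical helper: set the missing-value codes to NA, then reverse-code
# so that a higher number means stronger agreement.
recode_item <- function(x) {
  x[x %in% c(7, 8, 9)] <- NA
  7 - x
}
toy[] <- lapply(toy, recode_item)

# Hypothetical helper: linear min-max rescaling to 0-100, ignoring NAs.
rescale_0_100 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * 100
}

# Summing the items propagates NA, so a respondent with any missing item
# gets a missing index value.
toy$auth <- rescale_0_100(toy$behave + toy$secure)
print(toy)
```

The last row ends up with a missing `auth` because one of its items is missing, which is why the homework code later filters on `!is.na(auth)`.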

df <- df %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )

df <- df %>% filter(!is.na(auth))

model3 <- lm(auth ~ polID + religious, data = df, weights = weight)
model4 <- lm(auth ~ polID + religious + polID*religious, data = df, weights = weight)
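
Because `polID` is stored as text, `lm()` converts it to a factor whose levels are sorted alphabetically, so "Left" becomes the reference category and the coefficients for Moderate and Right are differences from it. A small sketch on simulated data (the data frame `sim`, the seed, and every effect size are invented for illustration):

```r
set.seed(1)
sim <- data.frame(
  polID = rep(c("Right", "Moderate", "Left"), each = 20),
  religious = sample(0:10, 60, replace = TRUE)
)
sim$auth <- 50 + 5 * (sim$polID == "Moderate") + 9 * (sim$polID == "Right") +
  0.3 * sim$religious + rnorm(60)

# Alphabetical level order makes "Left" the omitted baseline.
fit <- lm(auth ~ polID + religious, data = sim)
names(coef(fit))

# To choose a different baseline, set the reference level explicitly.
sim$polID2 <- relevel(factor(sim$polID), ref = "Moderate")
fit2 <- lm(auth ~ polID2 + religious, data = sim)
names(coef(fit2))
```

Changing the reference level changes which comparisons the table reports, but not the fitted values of the model.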

modelsummary(
  list(model3, model4),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL)

	(1)	(2)
(Intercept)	57.5 (0.7)***	59.6 (0.9)***
polIDModerate	5.3 (0.7)***	2.6 (1.1)*
polIDRight	9.3 (1.1)***	3.4 (1.9)+
religious	0.3 (0.1)**	−0.3 (0.2)+
polIDModerate × religious		0.8 (0.2)**
polIDRight × religious		1.5 (0.4)***
Num.Obs.	2855	2855
R2	0.035	0.041
R2 Adj.	0.034	0.039
AIC	24680.0	24665.5
BIC	24709.7	24707.2
Log.Lik.	−12334.978	−12325.764
RMSE	17.13	17.13

Model 3 (the first column) shows a clear association between political orientation, religiousness, and authoritarian values. The intercept of 57.5 is the predicted score on the 0-100 authoritarian scale for a left-leaning respondent with a religiousness of 0. Holding religiousness constant, politically moderate respondents score 5.3 points higher than those on the left, and right-leaning respondents score 9.3 points higher. Holding political orientation constant, each one-point increase in religiousness is associated with a 0.3-point increase in authoritarian values. The political orientation coefficients are significant at p < 0.001 and the religiousness coefficient at p < 0.01, so these associations are unlikely to be due to chance alone. Model fit is weak for both models, however. The adjusted R-squared is 0.034 for Model 3 and 0.039 for Model 4, so political orientation and religiousness account for under 4% of the variation in authoritarian values, and other factors not captured here matter far more. Adding the interaction lowers the AIC from 24680.0 to 24665.5 and the BIC from 24709.7 to 24707.2, which favors Model 4, although the RMSE is unchanged at 17.13.
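
As a worked example of reading the Model 3 coefficients, the sketch below turns the rounded estimates from the table into predicted scores. The helper `predict_auth()` is hypothetical and uses the rounded table values, so its results are approximate.

```r
# Rounded Model 3 coefficients copied from the table above.
b <- c(intercept = 57.5, moderate = 5.3, right = 9.3, religious = 0.3)

# Hypothetical helper: predicted authoritarian score for a political group
# ("Left", "Moderate" or "Right") and a religiousness value between 0 and 10.
predict_auth <- function(group, religious) {
  shift <- switch(group,
                  Left = 0,
                  Moderate = b[["moderate"]],
                  Right = b[["right"]])
  b[["intercept"]] + shift + b[["religious"]] * religious
}

predict_auth("Left", 0)      # the intercept: 57.5
predict_auth("Moderate", 5)  # 57.5 + 5.3 + 0.3 * 5
predict_auth("Right", 5)     # 57.5 + 9.3 + 0.3 * 5
```

At the same religiousness, the gap between two groups is always the difference in their coefficients, because Model 3 has no interaction term.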

Task 2 Now generate the model4 interaction plot that we did in the tutorial, but again using the German data instead of the French. Interpret.

germany_data <- read_fst("germany_data.fst")

interaction_plot <- effect("polID*religious", model4, na.rm=TRUE)

plot(interaction_plot,
     main="Interaction effect",
     xlab="Religiousness",
     ylab="Authoritarian attitudes scale")

interaction_plot

## 
##  polID*religious effect
##           religious
## polID             0        2        5        8       10
##   Left     59.61010 58.93398 57.91980 56.90562 56.22950
##   Moderate 62.21299 63.14317 64.53843 65.93370 66.86388
##   Right    63.00456 65.26702 68.66072 72.05442 74.31689

The plot shows how the relationship between religiousness and authoritarian attitudes differs by political identification. For the “polID Left” category, the predicted score on the authoritarian attitudes scale falls slightly as religiousness rises, from 59.6 at a religiousness of 0 to 56.2 at a religiousness of 10. This downward slope is small, and the corresponding coefficient in Model 4 (−0.3) is significant only at the 10% level.

For the “polID Moderate” category the slope is mildly positive: predicted authoritarian attitudes rise from 62.2 to 66.9 over the same range. Moderates also score above left-leaning respondents at every level of religiousness.

The relationship is most pronounced on the political right. The “polID Right” line climbs steeply, from 63.0 at a religiousness of 0 to 74.3 at a religiousness of 10. The gap between left and right therefore widens as religiousness increases, from about 3 points among the least religious to about 18 points among the most religious. In short, religiousness goes with stronger authoritarian attitudes only among moderate and right-leaning respondents, which is what the significant interaction terms in Model 4 indicate.
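
The slope of each line in the plot can be recovered from the Model 4 coefficients: the `religious` coefficient is the slope for the reference group (Left), and each interaction term is added to it for the other groups. The sketch below uses the rounded table values and cross-checks them against the `effect()` output printed above; the variable names are illustrative only.

```r
# Rounded Model 4 coefficients from the table in Task 1.
religious_left <- -0.3  # slope of religiousness for the reference group (Left)
int_moderate   <- 0.8   # polIDModerate x religious
int_right      <- 1.5   # polIDRight x religious

slopes <- c(
  Left     = religious_left,
  Moderate = religious_left + int_moderate,
  Right    = religious_left + int_right
)
slopes

# Cross-check: change in predicted auth between religiousness 0 and 10,
# divided by 10, using the effect() table.
at0  <- c(Left = 59.61010, Moderate = 62.21299, Right = 63.00456)
at10 <- c(Left = 56.22950, Moderate = 66.86388, Right = 74.31689)
check <- (at10 - at0) / 10
round(check, 2)
```

The two sets of slopes differ only by rounding in the table.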

Task 3 Use data for the Netherlands and the variable coding for model5 and model6 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

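netherlands_data <- read_fst("netherlands_data.fst")
# Note: df is not reassigned to netherlands_data, so the recodes and models
# below continue from the df built in Task 1, which already contains auth.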

df <- df %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )

df <- df %>%
  mutate(
    cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),
    # Recode year of birth (yrbrn) into generational cohorts:
    # Interwar (1900-1945), Baby Boomers (1946-1964), Gen X (1965-1979), Millennials (1980-1996).
    # The TRUE line keeps the year as text for anyone outside these ranges; factor() below
    # then turns those values into NA because they are not among its levels.
    gen = case_when(
      yrbrn %in% 1900:1945 ~ "1",
      yrbrn %in% 1946:1964 ~ "2",
      yrbrn %in% 1965:1979 ~ "3",
      yrbrn %in% 1980:1996 ~ "4",
      TRUE ~ as.character(yrbrn)  
    ),
    # After recoding, the gen variable is converted into a factor with labels for clearer interpretation.
    # Factors are used in R to handle categorical variables.
    gen = factor(gen,
                 levels = c("1", "2", "3", "4"),
                 labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
  )
table(df$gen)

## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         5360         8583         5601         4652
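
What happens to birth years outside the four cohort ranges can be checked on a few toy values. This sketch uses nested `ifelse()` as a base R stand-in for the `case_when()` call above; the vector `yrbrn` and its values are invented for illustration.

```r
yrbrn <- c(1928, 1950, 1970, 1990, 2001, NA)

# Step 1: character codes for the four cohorts, the year itself otherwise.
gen_chr <- ifelse(yrbrn %in% 1900:1945, "1",
           ifelse(yrbrn %in% 1946:1964, "2",
           ifelse(yrbrn %in% 1965:1979, "3",
           ifelse(yrbrn %in% 1980:1996, "4", as.character(yrbrn)))))

# Step 2: anything that is not one of the four levels becomes NA in the factor.
gen <- factor(gen_chr,
              levels = c("1", "2", "3", "4"),
              labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))

data.frame(yrbrn, gen_chr, gen)
table(gen, useNA = "ifany")
```

The respondent born in 2001 keeps the text "2001" after the first step and becomes missing after the second, so such cases drop out of any model that uses `gen`.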

df <- df %>%
  mutate(religion = case_when(
    rlgblg == 2 ~ "No",
    rlgblg == 1 ~ "Yes",
    rlgblg %in% c(7, 8, 9) ~ NA_character_,
    TRUE ~ as.character(rlgblg)
  ))

# check
table(df$religion)

## 
##    No   Yes 
## 11088 13750

df <- df %>%
  mutate(ID = case_when(
    lrscale >= 0 & lrscale <= 4 ~ "Left",
    lrscale >= 6 & lrscale <= 10 ~ "Right",
    lrscale > 10 ~ NA_character_,  # Set values above 10 as NA
    TRUE ~ NA_character_  # Ensure value 5 and any other unexpected values are set as NA
  ))
table(df$ID)

## 
##  Left Right 
##  9745  4882

model5 <- lm(auth ~ religion + ID + gen, data = df, weights = weight)
model6 <- lm(auth ~ religion + ID + gen + religion*gen, data = df, weights = weight)

modelsummary(
  list(model5, model6),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_omit = "Intercept")

	(1)	(2)
religionYes	1.3 (0.8)	−0.9 (2.0)
IDRight	7.1 (0.9)***	7.3 (0.9)***
genBaby Boomers	−7.6 (1.1)***	−8.2 (1.8)***
genGen X	−9.7 (1.3)***	−11.5 (1.9)***
genMillennials	−12.7 (1.3)***	−15.7 (2.0)***
religionYes × genBaby Boomers		0.8 (2.3)
religionYes × genGen X		3.2 (2.5)
religionYes × genMillennials		5.8 (2.7)*
Num.Obs.	1761	1761
R2	0.103	0.106
R2 Adj.	0.100	0.102
AIC	15122.2	15121.7
BIC	15160.5	15176.5
Log.Lik.	−7554.112	−7550.862
RMSE	16.76	16.76

interaction_plot <- effect("religion*gen", model6, na.rm=TRUE)

plot(interaction_plot,
     main="Interaction effect",
     xlab="Generations",
     ylab="Authoritarian attitudes scale")

interaction_plot

## 
##  religion*gen effect
##         gen
## religion Interwar Baby Boomers    Gen X Millennials
##      No  70.47065     62.23112 58.99075    54.73118
##      Yes 69.61993     62.20566 61.30541    59.63364

# cat_plot() needs a model that contains the religion x gen interaction, so use model6
cat_plot(model6, pred = gen, modx = religion)

Models 5 and 6 relate religious belonging, political orientation, and generation to the authoritarian attitudes scale (auth). In Model 5, respondents who belong to a religion score 1.3 points higher than those who do not, holding political orientation and generation constant, but this difference is not statistically significant. Respondents on the political right score 7.1 points higher than those on the left (p < 0.001). Relative to the Interwar generation, Baby Boomers score 7.6 points lower, Gen X 9.7 points lower, and Millennials 12.7 points lower (all p < 0.001), so authoritarian attitudes decline steadily across younger generations. In Model 6 the religion coefficient changes sign to −0.9, but it now refers to the Interwar generation only and is still not significant, while the coefficient for the political right is nearly unchanged at 7.3.

Both models have low adjusted R-squared values (0.100 and 0.102, respectively), so they explain only about 10% of the variation in authoritarian attitudes. The p-values below 0.001 for political orientation and generation mean the null hypothesis of no association can be rejected for those predictors; religion is the exception. Adding the interaction barely improves fit: the AIC is almost identical (15122.2 versus 15121.7), the BIC rises from 15160.5 to 15176.5, which favors the simpler Model 5, and the RMSE stays at 16.76. Of the interaction terms, only religion × Millennials is significant (5.8, p < 0.05).

Taken together, the consistent effect of political orientation, the weak and unstable effect of religion, and the models’ limited explanatory power point to other factors, not included here, that shape authoritarian attitudes.
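
The fit metrics in the table are linked by simple formulas, which makes it easier to see why AIC and BIC can disagree. The sketch below recomputes them from the reported R2, log-likelihood, and number of observations; the helper names `adj_r2()` and `ic()` are made up for this illustration, and small differences from the table come from rounding.

```r
n <- 1761  # Num.Obs. from the table

# Adjusted R2 penalizes R2 for the number of predictors k (intercept excluded).
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
adj5 <- adj_r2(0.103, n, k = 5)  # Model 5
adj6 <- adj_r2(0.106, n, k = 8)  # Model 6
round(c(adj5, adj6), 3)

# Information criteria from the log-likelihood; p counts the coefficients
# (intercept included) plus the residual variance.
ic <- function(loglik, p, n) {
  c(AIC = -2 * loglik + 2 * p, BIC = -2 * loglik + log(n) * p)
}
ic5 <- ic(-7554.112, p = 7, n)   # Model 5
ic6 <- ic(-7550.862, p = 10, n)  # Model 6
round(rbind(model5 = ic5, model6 = ic6), 1)
```

BIC charges log(n) per parameter, which here is about 7.5 instead of AIC's 2, so the three extra interaction terms cost more under BIC than the small gain in log-likelihood is worth.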

Task 4 Produce the model7 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

netherlands_data <- read_fst("netherlands_data.fst")

model7 <- lm(auth ~ cohort + polID + cohort*polID, data = df, weights = weight)
interact_plot(model7, pred = cohort, modx = polID, jnplot = TRUE)

The plot shows predicted authoritarian attitudes (auth) from model7 across birth cohorts, with three trend lines for the political identifications Left, Moderate, and Right.

The lines are distinctly non-parallel, which suggests an interaction between cohort and political identification. The steep downward slope of the ‘Left’ line indicates that, in more recent cohorts, those who identify as politically left score considerably lower on the authoritarian attitudes scale. The ‘Right’ line is flatter and declines less, so respondents on the right maintain stronger authoritarian attitudes across cohorts. The ‘Moderate’ line has a slope between those of ‘Left’ and ‘Right’: moderates’ attitudes decline across cohorts, but not as sharply as those on the left.

Overall, the interaction points to a divergence across cohorts: younger respondents, particularly on the left, are less authoritarian, while the right keeps a more consistent level of these attitudes. The relationship between political identity and authoritarian attitudes therefore differs across birth cohorts.

Task 5 Produce the model8 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

model8 <- lm(auth ~ religious + ID + religious*ID, data = df, weights = weight)
interact_plot(model8, pred = religious, modx = ID, jnplot = TRUE)

The plot shows predicted authoritarian attitudes (auth) from model8 against religiousness, with one line for each value of ID (Left and Right). Non-parallel lines here would mean that the association between religiousness and authoritarian attitudes differs between left and right, which is what the religious × ID term in model8 tests.
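
How non-parallel lines in an interaction plot map onto model coefficients can be sketched with simulated data. Everything here is invented for illustration (the data frame `sim`, its effect sizes, the seed); only the formula mirrors model8, and no survey weights are used.

```r
set.seed(42)
n <- 400
sim <- data.frame(
  religious = sample(0:10, n, replace = TRUE),
  ID = sample(c("Left", "Right"), n, replace = TRUE)
)
# Simulated truth: slope 0.2 on the left, 0.2 + 1.0 = 1.2 on the right.
sim$auth <- 55 + 0.2 * sim$religious + 3 * (sim$ID == "Right") +
  1.0 * sim$religious * (sim$ID == "Right") + rnorm(n, sd = 5)

fit <- lm(auth ~ religious * ID, data = sim)
b <- coef(fit)

# Slope of religiousness within each group: the main effect is the slope for
# the reference group, and the interaction term is the difference in slopes.
slope_left  <- b[["religious"]]
slope_right <- b[["religious"]] + b[["religious:IDRight"]]
c(Left = slope_left, Right = slope_right)

# The same slopes come out of separate regressions per group.
sep_left  <- coef(lm(auth ~ religious, data = subset(sim, ID == "Left")))[["religious"]]
sep_right <- coef(lm(auth ~ religious, data = subset(sim, ID == "Right")))[["religious"]]
c(Left = sep_left, Right = sep_right)
```

An interaction coefficient near zero would give two nearly parallel lines; the further it is from zero, the more the lines fan out or cross.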

2024-03-25