Haya_El-ouri_homework

rm(list=ls()); gc()

##          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 526379 28.2    1169231 62.5         NA   669417 35.8
## Vcells 969666  7.4    8388608 64.0      16384  1851787 14.2

# List of packages
packages <- c("tidyverse", "infer", "fst", "modelsummary", "effects", "survey", "MASS", "aod", "interactions", "kableExtra", "flextable", "scales") # add any you need here

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages
lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Loading required package: carData
## 
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## 
## Loading required package: grid
## 
## Loading required package: Matrix
## 
## 
## Attaching package: 'Matrix'
## 
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## 
## Loading required package: survival
## 
## 
## Attaching package: 'survey'
## 
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart
## 
## 
## 
## Attaching package: 'MASS'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## 
## 
## Attaching package: 'aod'
## 
## 
## The following object is masked from 'package:survival':
## 
##     rats
## 
## 
## 
## Attaching package: 'kableExtra'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## 
## 
## 
## Attaching package: 'flextable'
## 
## 
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
## 
## 
## The following object is masked from 'package:purrr':
## 
##     compose
## 
## 
## 
## Attaching package: 'scales'
## 
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## 
## The following object is masked from 'package:readr':
## 
##     col_factor


Task 1

Use data for Germany and the variable coding for model3 and model4 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

germany_data <- read_fst("germany_data.fst")

df <- germany_data


df <- df %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
  # Apply the reverse coding
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

df$auth <- scales::rescale(df$behave + 
                      df$secure + 
                      df$safety + 
                      df$tradition + 
                      df$rules, to=c(0,100), na.rm=TRUE)
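As a minimal sketch of what the reverse coding and rescaling steps do, the toy example below flips a few made-up item responses with 7 - x and then stretches a made-up summed score onto 0-100. The vectors `item` and `total` are hypothetical, and the base R helper `rescale_0_100` is only an illustration of min-max rescaling, which is assumed here to match what `scales::rescale(to = c(0, 100))` does by default.

```r
# Toy responses on a 1-6 scale (hypothetical values, not survey data)
item <- c(1, 2, 6, NA)

# Reverse coding: 1 becomes 6, 6 becomes 1, NA stays NA
reversed <- 7 - item

# Hypothetical summed scores of five reverse-coded items (possible range 5-30)
total <- c(5, 10, 30, NA)

# Min-max rescaling to 0-100, ignoring NA when finding the range
rescale_0_100 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * 100
}

auth_toy <- rescale_0_100(total)
print(reversed)
print(auth_toy)
```

Because the range is taken from the observed data, 0 and 100 correspond to the lowest and highest scores that actually occur, not necessarily to the theoretical minimum and maximum of the scale.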

df <- df %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )

df <- df %>% filter(!is.na(auth))

model3 <- lm(auth ~ polID + religious, data = df, weights = weight)
model4 <- lm(auth ~ polID + religious + polID*religious, data = df, weights = weight)

modelsummary(
  list(model3, model4),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL)

	(1)	(2)
(Intercept)	57.5 (0.7)***	59.6 (0.9)***
polIDModerate	5.3 (0.7)***	2.6 (1.1)*
polIDRight	9.3 (1.1)***	3.4 (1.9)+
religious	0.3 (0.1)**	−0.3 (0.2)+
polIDModerate × religious		0.8 (0.2)**
polIDRight × religious		1.5 (0.4)***
Num.Obs.	2855	2855
R2	0.035	0.041
R2 Adj.	0.034	0.039
AIC	24680.0	24665.5
BIC	24709.7	24707.2
Log.Lik.	−12334.978	−12325.764
RMSE	17.13	17.13
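
One way to read column (1) of the table is to plug the rounded coefficients into the regression equation. The sketch below is a hand calculation with the rounded table values, not output from `predict()`; the helper `predict_model3` and the example respondent profiles (religiousness of 5) are hypothetical.

```r
# Rounded model3 coefficients copied from column (1) of the table
b <- c(intercept = 57.5, moderate = 5.3, right = 9.3, religious = 0.3)

# Hand-computed prediction; "Left" is the reference category
predict_model3 <- function(polID, religious) {
  unname(b["intercept"] +
    b["moderate"] * (polID == "Moderate") +
    b["right"] * (polID == "Right") +
    b["religious"] * religious)
}

left_5 <- predict_model3("Left", 5)
moderate_5 <- predict_model3("Moderate", 5)
right_5 <- predict_model3("Right", 5)
print(c(Left = left_5, Moderate = moderate_5, Right = right_5))
```

At the same level of religiousness, the gap between two predictions equals the corresponding polID coefficient, which is what "holding religiousness constant" means in practice.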

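polIDModerate (5.3): holding religiousness constant, political moderates score on average 5.3 points higher on the authoritarian values scale than those on the left (the reference category).

polIDRight (9.3): holding religiousness constant, those who identify as right score on average 9.3 points higher than those on the left.

religious (0.3): holding political identification constant, a one-unit increase in religiousness is associated with a 0.3-point increase in the predicted authoritarian score.

The intercept and both polID coefficients are marked *** (p < 0.001), and religious is marked ** (p < 0.01). If the null hypothesis of no association were true, estimates this far from zero would be expected in less than 0.1% (or, for religious, less than 1%) of samples, so the null hypothesis is rejected for each coefficient.

The adjusted R-squared of model3 is 0.034, so the model explains only about 3.4% of the variance in authoritarian values, which is a weak fit. Model4, which adds the interaction, does slightly better: its adjusted R-squared is 0.039, and both its AIC (24665.5 versus 24680.0) and its BIC (24707.2 versus 24709.7) are lower. The RMSE is the same for both models (17.13), so it does not separate them.

Task 2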

Now generate the model4 interaction plot that we did in the tutorial, but again using the German data instead of the French. Interpret.

germany_data <- read_fst("germany_data.fst")

interaction_plot <- effect("polID*religious", model4, na.rm=TRUE)

plot(interaction_plot,
     main="Interaction effect",
     xlab="Religiousness",
     ylab="Authoritarian attitudes scale")

interaction_plot

## 
##  polID*religious effect
##           religious
## polID             0        2        5        8       10
##   Left     59.61010 58.93398 57.91980 56.90562 56.22950
##   Moderate 62.21299 63.14317 64.53843 65.93370 66.86388
##   Right    63.00456 65.26702 68.66072 72.05442 74.31689

For respondents on the left, the line slopes downward: the predicted authoritarian score falls from about 59.6 at religiousness 0 to about 56.2 at religiousness 10, so more religious left-leaning respondents score slightly lower on the authoritarian values scale. For moderates, the line slopes gently upward, from about 62.2 to about 66.9. For respondents on the right, the upward slope is the steepest, from about 63.0 to about 74.3. In other words, the association between religiousness and authoritarian values depends on political identification: it is slightly negative on the left, weakly positive among moderates, and most strongly positive on the right.
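
The slopes described in this interpretation can be recovered from the printed effect values. The sketch below is a hand calculation that copies the predictions at religiousness 0 and 10 from the `effect()` output; the object names `pred` and `slopes` are hypothetical.

```r
# Predicted values copied from the effect() output (religiousness 0 and 10)
religious <- c(0, 10)
pred <- rbind(
  Left     = c(59.61010, 56.22950),
  Moderate = c(62.21299, 66.86388),
  Right    = c(63.00456, 74.31689)
)

# Change in predicted score per one-unit increase in religiousness
slopes <- (pred[, 2] - pred[, 1]) / diff(religious)
print(round(slopes, 2))
```

These per-group slopes are roughly what the model4 column implies once the religious coefficient is added to each interaction term, allowing for the rounding in the table.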

Task 3

Use data for the Netherlands and the variable coding for model5 and model6 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

netherlands_data <- read_fst("netherlands_data.fst")

df <- netherlands_data

df <- df %>%
  mutate(religion = case_when(
    rlgblg == 2 ~ "No",
    rlgblg == 1 ~ "Yes",
    rlgblg %in% c(7, 8, 9) ~ NA_character_,
    TRUE ~ as.character(rlgblg)
  ))

table(df$religion)

## 
##    No   Yes 
## 11289  7008

df <- df %>%
  mutate(ID = case_when(
    lrscale >= 0 & lrscale <= 4 ~ "Left",
    lrscale >= 6 & lrscale <= 10 ~ "Right",
    lrscale > 10 ~ NA_character_,  # Set values above 10 as NA
    TRUE ~ NA_character_  # Ensure value 5 and any other unexpected values are set as NA
  ))
table(df$ID)

## 
##  Left Right 
##  5605  7247

df <- df %>%
  mutate(
    cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),
    gen = case_when(
      yrbrn %in% 1900:1945 ~ "1",
      yrbrn %in% 1946:1964 ~ "2",
      yrbrn %in% 1965:1979 ~ "3",
      yrbrn %in% 1980:1996 ~ "4",
      TRUE ~ as.character(yrbrn)
    ),
    # Birth years outside 1900-1996 match no level and become NA here
    gen = factor(gen,
                 levels = c("1", "2", "3", "4"),
                 labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
  )

table(df$gen)

## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         3967         6359         4684         2803

df <- df %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%

  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

df$auth <- scales::rescale(df$behave + 
                      df$secure + 
                      df$safety + 
                      df$tradition + 
                      df$rules, to=c(0,100), na.rm=TRUE)

df <- df %>% filter(!is.na(auth))

df <- df %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )

model5 <- lm(auth ~ religion + ID + gen, data = df, weights = weight)
model6 <- lm(auth ~ religion + ID + gen + religion*gen, data = df, weights = weight)

modelsummary(
  list(model5, model6),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_omit = "Intercept")

	(1)	(2)
religionYes	8.2 (0.9)***	7.5 (2.1)***
IDRight	3.4 (0.9)***	3.4 (0.9)***
genBaby Boomers	−5.8 (1.3)***	−6.5 (1.7)***
genGen X	−7.3 (1.4)***	−7.6 (1.8)***
genMillennials	−8.6 (1.5)***	−8.6 (1.9)***
religionYes × genBaby Boomers		1.6 (2.6)
religionYes × genGen X		0.5 (2.8)
religionYes × genMillennials		−0.6 (3.2)
Num.Obs.	1147	1147
R2	0.126	0.127
R2 Adj.	0.123	0.121
AIC	9479.3	9484.5
BIC	9514.7	9535.0
Log.Lik.	−4732.669	−4732.251
RMSE	14.94	14.94

religionYes (8.2): holding political identification and generation constant, people who belong to a religion score on average 8.2 points higher on the authoritarian values scale than those who do not (the reference group).

IDRight (3.4): holding religion and generation constant, those who identify as politically right score on average 3.4 points higher on the authoritarian values scale than those who identify as left (the reference group).

genBaby Boomers (-5.8): on the 0-100 authoritarian values scale, Baby Boomers score on average 5.8 points lower than the Interwar generation (the reference category), holding religion and political identification constant.

genGen X (-7.3): Generation X scores on average 7.3 points lower than the Interwar generation.

genMillennials (-8.6): Millennials score on average 8.6 points lower than the Interwar generation, the largest gap of any generation. Authoritarian values are therefore lower in each successive generation.

All of the displayed model5 coefficients are marked *** (p < 0.001). Under frequentist assumptions, if the null hypothesis of no association were true, estimates this far from zero would be expected in less than 0.1% of samples, so we reject the null hypothesis for each of them. In model6, none of the three interaction terms is statistically significant.

Model5 has a lower AIC (9479.3) than model6 (9484.5) and a lower BIC (9514.7 versus 9535.0), so both criteria favour model5. The log-likelihood of model6 (-4732.251) is slightly less negative than that of model5 (-4732.669), but the log-likelihood cannot decrease when terms are added, and the gain here is too small to offset the penalty that AIC and BIC place on the three extra parameters. The RMSE is 14.94 for both models, so it does not distinguish them.

The adjusted R-squared values of 0.123 (model5) and 0.121 (model6) show that religion, political identification, and generation together explain about 12.3% and 12.1% of the variance in authoritarian values in the Netherlands. Most of the variance is left unexplained, which suggests that factors outside the models also matter.
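
The relationship between the log-likelihood, AIC, and BIC in the table can be checked by hand with AIC = 2k - 2 logLik and BIC = k ln(n) - 2 logLik. The sketch below copies the log-likelihoods and sample size from the table; the parameter counts in `k` are an assumption (intercept, slopes, and residual variance), not something printed by `modelsummary`.

```r
# Values copied from the modelsummary table
n <- 1147
loglik <- c(model5 = -4732.669, model6 = -4732.251)

# Assumed parameter counts: intercept + slopes + residual variance
k <- c(model5 = 7, model6 = 10)

aic <- 2 * k - 2 * loglik
bic <- log(n) * k - 2 * loglik
print(round(rbind(AIC = aic, BIC = bic), 1))
```

Model6 gains less than half a point of log-likelihood but pays for three extra parameters, which is why both criteria come out higher for it.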

Task 4

Produce the model7 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

model7 <- lm(auth ~ cohort + polID + cohort*polID, data = df, weights = weight)
interact_plot(model7, pred = cohort, modx = polID, jnplot = TRUE)

polIDLeft: the difference in authoritarian scores across birth cohorts is the smallest for this group.

polIDModerate: It appears to me that the more religiously inclined people are, the

polIDRight: the line has the steepest slope, so the difference in authoritarian scores across birth cohorts is the largest for this group.

The general pattern across all political identifications is that authoritarian attitudes tend to be lower among people born more recently. Respondents who lean right show the most variation in authoritarian attitudes across cohorts, while those who lean left show the least.

There is a generational shift in the authoritarian views of people who identify as Right or Moderate, with younger cohorts showing less authoritarianism than older ones. Those who lean left, by contrast, hold authoritarian views at similar levels across cohorts. In sociopolitical research, this type of trend analysis is useful for understanding how opinions change over time within different parts of the political spectrum.

Task 5

Produce the model8 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

model8 <- lm(auth ~ religious + ID + religious*ID, data = df, weights = weight)
interact_plot(model8, pred = religious, modx = ID, jnplot = TRUE)

Religiousness and authoritarian values are positively associated in the Netherlands among both Left- and Right-oriented people, but the association is noticeably stronger among the former. This kind of result, which shows how the link between religiousness and authoritarian values differs across political groups, may be useful for sociological research, political analysis, or policymaking.

End

Haya_El-ouri_homework_5

2024-03-25
