Habib_Haris_Homework

# List of packages
packages <- c("tidyverse", "infer", "fst", "modelsummary", "effects", "survey", "MASS", "aod", "interactions", "kableExtra", "flextable", "scales", "ggeffects") # add any you need here

# Install packages if they aren't installed already
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Load the packages
lapply(packages, library, character.only = TRUE)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'

## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## Loading required package: grid
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loading required package: survival
## 
## Attaching package: 'survey'
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart
## 
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## 
## Attaching package: 'aod'
## 
## The following object is masked from 'package:survival':
## 
##     rats
## 
## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## 
## 
## Attaching package: 'flextable'
## 
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
## 
## The following object is masked from 'package:purrr':
## 
##     compose
## 
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
## 
## 
## Attaching package: 'ggeffects'
## 
## The following object is masked from 'package:interactions':
## 
##     johnson_neyman

## [[1]]
##  [1] "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"     "readr"    
##  [7] "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"     "graphics" 
## [13] "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[2]]
##  [1] "infer"     "lubridate" "forcats"   "stringr"   "dplyr"     "purrr"    
##  [7] "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "stats"    
## [13] "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"     
## 
## [[3]]
##  [1] "fst"       "infer"     "lubridate" "forcats"   "stringr"   "dplyr"    
##  [7] "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse"
## [13] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
## [19] "base"     
## 
## [[4]]
##  [1] "modelsummary" "fst"          "infer"        "lubridate"    "forcats"     
##  [6] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [11] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [16] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[5]]
##  [1] "effects"      "carData"      "modelsummary" "fst"          "infer"       
##  [6] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [11] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [16] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [21] "methods"      "base"        
## 
## [[6]]
##  [1] "survey"       "survival"     "Matrix"       "grid"         "effects"     
##  [6] "carData"      "modelsummary" "fst"          "infer"        "lubridate"   
## [11] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [16] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [21] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [26] "base"        
## 
## [[7]]
##  [1] "MASS"         "survey"       "survival"     "Matrix"       "grid"        
##  [6] "effects"      "carData"      "modelsummary" "fst"          "infer"       
## [11] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [16] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [21] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [26] "methods"      "base"        
## 
## [[8]]
##  [1] "aod"          "MASS"         "survey"       "survival"     "Matrix"      
##  [6] "grid"         "effects"      "carData"      "modelsummary" "fst"         
## [11] "infer"        "lubridate"    "forcats"      "stringr"      "dplyr"       
## [16] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [21] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [26] "datasets"     "methods"      "base"        
## 
## [[9]]
##  [1] "interactions" "aod"          "MASS"         "survey"       "survival"    
##  [6] "Matrix"       "grid"         "effects"      "carData"      "modelsummary"
## [11] "fst"          "infer"        "lubridate"    "forcats"      "stringr"     
## [16] "dplyr"        "purrr"        "readr"        "tidyr"        "tibble"      
## [21] "ggplot2"      "tidyverse"    "stats"        "graphics"     "grDevices"   
## [26] "utils"        "datasets"     "methods"      "base"        
## 
## [[10]]
##  [1] "kableExtra"   "interactions" "aod"          "MASS"         "survey"      
##  [6] "survival"     "Matrix"       "grid"         "effects"      "carData"     
## [11] "modelsummary" "fst"          "infer"        "lubridate"    "forcats"     
## [16] "stringr"      "dplyr"        "purrr"        "readr"        "tidyr"       
## [21] "tibble"       "ggplot2"      "tidyverse"    "stats"        "graphics"    
## [26] "grDevices"    "utils"        "datasets"     "methods"      "base"        
## 
## [[11]]
##  [1] "flextable"    "kableExtra"   "interactions" "aod"          "MASS"        
##  [6] "survey"       "survival"     "Matrix"       "grid"         "effects"     
## [11] "carData"      "modelsummary" "fst"          "infer"        "lubridate"   
## [16] "forcats"      "stringr"      "dplyr"        "purrr"        "readr"       
## [21] "tidyr"        "tibble"       "ggplot2"      "tidyverse"    "stats"       
## [26] "graphics"     "grDevices"    "utils"        "datasets"     "methods"     
## [31] "base"        
## 
## [[12]]
##  [1] "scales"       "flextable"    "kableExtra"   "interactions" "aod"         
##  [6] "MASS"         "survey"       "survival"     "Matrix"       "grid"        
## [11] "effects"      "carData"      "modelsummary" "fst"          "infer"       
## [16] "lubridate"    "forcats"      "stringr"      "dplyr"        "purrr"       
## [21] "readr"        "tidyr"        "tibble"       "ggplot2"      "tidyverse"   
## [26] "stats"        "graphics"     "grDevices"    "utils"        "datasets"    
## [31] "methods"      "base"        
## 
## [[13]]
##  [1] "ggeffects"    "scales"       "flextable"    "kableExtra"   "interactions"
##  [6] "aod"          "MASS"         "survey"       "survival"     "Matrix"      
## [11] "grid"         "effects"      "carData"      "modelsummary" "fst"         
## [16] "infer"        "lubridate"    "forcats"      "stringr"      "dplyr"       
## [21] "purrr"        "readr"        "tidyr"        "tibble"       "ggplot2"     
## [26] "tidyverse"    "stats"        "graphics"     "grDevices"    "utils"       
## [31] "datasets"     "methods"      "base"

Task 1

Use data for Germany and the variable coding for model3 and model4 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

germany_data <- read_fst("germany_data.fst")

df <- germany_data

df <- df %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
  # Apply the reverse coding
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

# Now you can calculate 'schwartzauth' after the NA recoding
df$auth <- scales::rescale(df$behave + 
                      df$secure + 
                      df$safety + 
                      df$tradition + 
                      df$rules, to=c(0,100), na.rm=TRUE)


df <- df %>% filter(!is.na(auth))

df <- df %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )

model3 <- lm(auth ~ polID + religious, data = df, weights = weight)
model4 <- lm(auth ~ polID + religious + polID*religious, data = df, weights = weight)

modelsummary(
  list(model3, model4),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL)

	(1)	(2)
(Intercept)	57.5 (0.7)***	59.6 (0.9)***
polIDModerate	5.3 (0.7)***	2.6 (1.1)*
polIDRight	9.3 (1.1)***	3.4 (1.9)+
religious	0.3 (0.1)**	−0.3 (0.2)+
polIDModerate × religious		0.8 (0.2)**
polIDRight × religious		1.5 (0.4)***
Num.Obs.	2855	2855
R2	0.035	0.041
R2 Adj.	0.034	0.039
AIC	24680.0	24665.5
BIC	24709.7	24707.2
Log.Lik.	−12334.978	−12325.764
RMSE	17.13	17.13

INTERPRETATION:

polIDModerate (5.3): This coefficient indicates that when we compare individuals with the same religiousness, the average outcome for individuals idenitfying politically moderate is 5.3 units higher compared to the reference who identify on the left

polIDRight (9.3): This coefficient suggests that when comparing individuals with the same religiousness, individuals identifying as politically right are on average 9.3 units higher than the reference, which are people identifying ideologically on the left on the authoritarian scale

religious (0.3): for this predictor, a coefficient of 0.3 indicates that every one unit increase in religiousness, there is an expected increase of 0.3 units on authoritarian scales, assuming we are comparing individuals with the same ideological categorization, effectively holding ideological self-placement constant

PollDModerate and PollDRight have *** while religious has ** indicating a p < 0.001. This means that there is less than a 0.1% probability that the observed result is due to chance and we incorrectly reject the null, under frequentist assumptions.

As for the interpretation of the fit metrics models, model4 has a lower AIC value (24665.5) compared to model 3 (24680.0), which means that model4 is better for testing this dataset. Furthermore, BIC model4 has a lower value of 24707.2 than model3 which has 24709.7, meaning that model4 is better fit for testing. Additionally, the log.lik value of model4 (-12325.76) is less negative than of model3 (-12334.97), which means that model4 is better for testing. As for the RMSE, both models have the same value of 17.13, which makes it difficult to determine which one is better fit.

The adjusted R-squared value of model3 is 0.034 while model4 has 0.039, which mean that pollD and religious values explain about 3.4% for model3 and 3.9% for model 4. Both are statistically significant, but this suggests that they only explain a small portion of variability.

Task 2

Now generate the model4 interaction plot that we did in the tutorial, but again using the German data instead of the French. Interpret.

result <- predict_response(model4, c("religious", "polID"))
plot(result)

#Professor parker told me to do it like this by the way. That is why it is different from tutorial

INTERPRETATION: polID = Left, I found that the higher the religiousness of those who lean politically left, the weaker their authoritarian attitudes.

polID = Moderate, I found that the higher the religiousness of those who are politically moderate, there is a slow increase in authoritarian attitudes

Lastly, for polID = Right, I note that the higher the religiousness of those who lean politically right, the stronger their authoritarian attitudes. However, as those who lean politically right shows a much more steeper graph than those who are politically moderate, I can conclude with my findings that those who lean politically right show the greatest variance in authoritarian attitudes scale in accordance with the degree of religiousness.

Task 3

Use data for the Netherlands and the variable coding for model5 and model6 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

netherlands_data <- read_fst("netherlands_data.fst")

df <- netherlands_data

df <- df %>%
  mutate(religion = case_when(
    rlgblg == 2 ~ "No",
    rlgblg == 1 ~ "Yes",
    rlgblg %in% c(7, 8, 9) ~ NA_character_,
    TRUE ~ as.character(rlgblg)
  ))

table(df$religion)

## 
##    No   Yes 
## 11289  7008

df <- df %>%
  mutate(ID = case_when(
    lrscale >= 0 & lrscale <= 4 ~ "Left",
    lrscale >= 6 & lrscale <= 10 ~ "Right",
    lrscale > 10 ~ NA_character_,  # Set values above 10 as NA
    TRUE ~ NA_character_  # Ensure value 5 and any other unexpected values are set as NA
  ))
table(df$ID)

## 
##  Left Right 
##  5605  7247

df <- df %>%
  mutate(
    
cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),

  gen = case_when(
    yrbrn %in% 1900:1945 ~ "1",
    yrbrn %in% 1946:1964 ~ "2",
    yrbrn %in% 1965:1979 ~ "3",
    yrbrn %in% 1980:1996 ~ "4",
    TRUE ~ as.character(yrbrn)  
    ),

gen = factor(gen,
  levels = c("1", "2", "3", "4"),
  labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials")),
)

table(df$gen)

## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         3967         6359         4684         2803

df <- df %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%

  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

df$auth <- scales::rescale(df$behave + 
                      df$secure + 
                      df$safety + 
                      df$tradition + 
                      df$rules, to=c(0,100), na.rm=TRUE)

df <- df %>% filter(!is.na(auth))

df <- df %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )

model5 <- lm(auth ~ religion + ID + gen, data = df, weights = weight)
model6 <- lm(auth ~ religion + ID + gen + religion*gen, data = df, weights = weight)

modelsummary(
  list(model5, model6),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_omit = "Intercept")

	(1)	(2)
religionYes	8.2 (0.9)***	7.5 (2.1)***
IDRight	3.4 (0.9)***	3.4 (0.9)***
genBaby Boomers	−5.8 (1.3)***	−6.5 (1.7)***
genGen X	−7.3 (1.4)***	−7.6 (1.8)***
genMillennials	−8.6 (1.5)***	−8.6 (1.9)***
religionYes × genBaby Boomers		1.6 (2.6)
religionYes × genGen X		0.5 (2.8)
religionYes × genMillennials		−0.6 (3.2)
Num.Obs.	1147	1147
R2	0.126	0.127
R2 Adj.	0.123	0.121
AIC	9479.3	9484.5
BIC	9514.7	9535.0
Log.Lik.	−4732.669	−4732.251
RMSE	14.94	14.94

INTERPRETATION:

religionYes (8.2): This coefficient indicates that, when we compare individuals with the same political leaning, the average outcome for individuals with religions are 8.2 units higher compared to the reference category.

IDRight (3.4): This coefficient tells us that, when comparing individuals with the same religiousness, individuals identifying as politically right are expected to be, on average, 3.4 units higher than the reference category on the authoritarian values scale.

genBaby Boomers (-5.8): Compared to the reference category, Baby Boomers show scores that are, on average, 5.8 points lower on the authoritarian values scale of 0-100

genGen X (-7.3): Generation X shows an even bigger decrease, with scores 7.3 points lower than the reference category on the authoritarian values scale.

genMillennials (-8.6): Millennials exhibit the largest decrease in authoritarian values, with their scores being 8.6 points lower than the reference category. This marked difference is statistically significant, indicating that Millennials diverge most strongly from those who lean politically left in comparing their authoritarian values.

All the coefficients are significant with ***, representing a p < 0.001.This means that there is less than a 0.1% probability that the observed result is due to chance and we incorrectly reject the null, under the frequentist assumptions.

When comparing the fit metrics models, AIC model5 has a lower value of 9479.3 compared to model6, which means that model5 is better for testing this dataset compared to model6. A similar thing happens for BIC, as model5 shows a smaller BIC value of 9514.7 compared to model6, which has 9535. This means that model5 is better fit for testing this dataset. Additionally, the log.like value for model6 (−4732.25) is less negative than that of model5 (−4732.66), model6 is better fit for this dataset. Furthermore, Because both models have the same value of RMSE (14.94), I cannot determine which model is the better fit by considering the value of RMSE only.

As for the adjusted R-squared value of 0.123 (model5) and 0.121 (model6), this indicates that pollD and religious variables, and generation varaibles explain about 12.3% for model5 and 12.1% for model6 for the variance in authoritarian value scales for netherlands. Both mdoels explain a small portion of variabiluty in authoritarian values, showing a chance for outside differences.

Task 4

Produce the model7 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

model7 <- lm(auth ~ + cohort + polID + cohort*polID, data = df, weights = weight)
interact_plot(model7, pred = cohort, modx = polID, jnplot = TRUE)

INTERPRETATION:

polIDLeft: I note that the difference between the coded generations in authoritarian score is the smallest.

polIDModerate: I find that the higher the religiousness of those who are politically moderate,

Lastly, for polIDRight, I found that the difference between the coded generations in authoritarian score is the largest, with a steep slope.

Across all coded political standings, the trend is that the younger individuals are, the lower the authoritarian attitudes scale they possess.Those who lean politically right show the greatest variance in authoritarian attitudes scale and those who lean politically left show the least variance, both in accordance with a generation that an individual belongs to.

Task 5

Produce the model8 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

model8 <- lm(auth ~ religious + ID + religious*ID, data = df, weights = weight)
interact_plot(model8, pred = religious, modx = ID, jnplot = TRUE)

INTERPRETATION: both left and right individuals have a positive correlation between religiousity and authoratarianism, but the slope shows a steeper graph for those on the left. The general trend is that the higher the religiousness of individuals, the higher the authoritarian attitudes scales value.

Habib_Haris_Homework_5

2024-03-23

Task 1

Task 2

Task 3

Task 4

Task 5