Initial set-up

packages <- c("tidyverse", "infer", "fst", "modelsummary", "effects", "survey", "MASS", "aod", "interactions", "kableExtra", "flextable", "scales")

new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

invisible(lapply(packages, library, character.only = TRUE))
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Warning: package 'infer' was built under R version 4.3.3
## Warning: package 'effects' was built under R version 4.3.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.3.3
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
## Warning: package 'survey' was built under R version 4.3.3
## Loading required package: grid
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loading required package: survival
## 
## Attaching package: 'survey'
## 
## The following object is masked from 'package:graphics':
## 
##     dotchart
## 
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## Warning: package 'aod' was built under R version 4.3.3
## 
## Attaching package: 'aod'
## 
## The following object is masked from 'package:survival':
## 
##     rats
## Warning: package 'interactions' was built under R version 4.3.3
## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## Warning: package 'flextable' was built under R version 4.3.3
## 
## Attaching package: 'flextable'
## 
## The following objects are masked from 'package:kableExtra':
## 
##     as_image, footnote
## 
## The following object is masked from 'package:purrr':
## 
##     compose
## 
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

Task 1

Use data for Germany and the variable coding for model3 and model4 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

Loading data for Germany and recoding the variables

germany_data <- read_fst("germany_data.fst")
task1 <- germany_data %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

task1$auth <- scales::rescale(task1$behave + 
                      task1$secure + 
                      task1$safety + 
                      task1$tradition + 
                      task1$rules, to=c(0,100), na.rm=TRUE)

task1 <- task1 %>% filter(!is.na(auth))

task1 <- task1 %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )
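
As a quick sanity check of this recode, here is a short sketch (mirroring the table() checks used for the Netherlands data below), assuming task1 from above is in memory:

table(task1$polID, useNA = "ifany")  # Left / Moderate / Right plus any remaining NAs
summary(task1$religious)             # 0-10 religiousness; missing codes are now NA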

Doing a modelsummary table, displaying both model outputs

model3 <- lm(auth ~ polID + religious, data = task1, weights = weight)
model4 <- lm(auth ~ polID + religious + polID*religious, data = task1, weights = weight)

modelsummary(
  list(model3, model4),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL)
                               (1)              (2)
(Intercept)                    57.5 (0.7)***    59.6 (0.9)***
polIDModerate                   5.3 (0.7)***     2.6 (1.1)*
polIDRight                      9.3 (1.1)***     3.4 (1.9)+
religious                       0.3 (0.1)**     -0.3 (0.2)+
polIDModerate × religious                        0.8 (0.2)**
polIDRight × religious                           1.5 (0.4)***
Num.Obs.                     2855             2855
R2                              0.035            0.041
R2 Adj.                         0.034            0.039
AIC                         24680.0          24665.5
BIC                         24709.7          24707.2
Log.Lik.                   -12334.978       -12325.764
RMSE                           17.13            17.13

Interpreting the coefficients for the MLR without the interaction (the first model only)

polIDModerate (5.3): This coefficient suggests that, comparing individuals with the same religiousness (i.e., what some term “holding constant”), the average score on the authoritarian values scale for individuals identifying as politically moderate is 5.3 units higher than for the reference category (in our case, identifying ideologically as on the ‘left’).

polIDRight (9.3): This coefficient indicates that, comparing individuals with the same religiousness, individuals identifying as politically right are expected to be, on average, 9.3 units higher than the reference category (i.e., identifying ideologically as on the ‘left’) on the authoritarian values scale.

religious (0.3): For the ‘religious’ predictor, a coefficient of 0.3 means that with every one-unit increase in religiousness, there is an associated expected average increase of 0.3 units on the authoritarian values scale, assuming that we are comparing people with the same ideological categorization (i.e., holding ideological self-placement constant).
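
To make the “holding religiousness constant” reading concrete, here is a minimal sketch (assuming model3 and task1 from above are in memory): without an interaction term, the difference in fitted values between a ‘Right’ and a ‘Left’ respondent at any fixed level of religiousness equals the polIDRight coefficient. newdat is a hypothetical comparison data frame introduced here.

newdat <- data.frame(polID = c("Left", "Right"), religious = 5)  # hypothetical comparison rows
predict(model3, newdata = newdat)        # two fitted values at religiousness = 5
diff(predict(model3, newdata = newdat))  # the difference, about 9.3
coef(model3)["polIDRight"]               # the same 9.3 reported in the table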

The coefficients for polIDModerate and polIDRight are significant at p < 0.001 (***), and the coefficient for religious at p < 0.01 (**). In frequentist terms: under the null hypothesis of no effect, the probability of observing an estimate at least this extreme is below 0.1% (or 1% for religious), so we reject the null.

Interpreting model fit metrics for both models

AIC: Because model4 has a lower AIC (24665.5) than model3 (24680.0), model4 fits these data better by this criterion.

BIC: Likewise, model4 has a lower BIC (24707.2) than model3 (24709.7), again favoring model4.

Log.Lik.: The log-likelihood of model4 (-12325.764) is less negative (higher) than that of model3 (-12334.978), so model4 fits better by this criterion as well.

RMSE: Both models have the same RMSE (17.13) at this rounding, so RMSE alone does not distinguish between them.

Adjusted R-Squared: The adjusted R-squared values of 0.034 (model3) and 0.039 (model4) mean that political self-placement (Left, Moderate, Right) and religiousness explain roughly 3.4% and 3.9%, respectively, of the variance in the authoritarian values scale for Germany, after accounting for the number of predictors. Even though both models are statistically significant, they explain only a small portion of the variability in authoritarian values, which suggests that factors beyond ideology and religiousness play important roles in shaping these attitudes. Because model4 also explains slightly more variance, and taking the other fit metrics into consideration, model4 is the better-fitting model overall.
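
These quantities can also be pulled directly from the fitted objects; a short sketch, assuming model3 and model4 are in memory:

AIC(model3, model4)   # lower is better
BIC(model3, model4)   # lower is better; BIC penalizes extra parameters more heavily
logLik(model3)
logLik(model4)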

Task 2

Now generate the model4 interaction plot that we did in the tutorial, but again using the German data instead of the French. Interpret.

Generating the model4 interaction plot, using the German data

interaction_plot <- effect("polID*religious", model4, na.rm=TRUE)

plot(interaction_plot,
     main="Interaction effect",
     xlab="Religiousness",
     ylab="Authoritarian attitudes scale")

interaction_plot
## 
##  polID*religious effect
##           religious
## polID             0        2        5        8       10
##   Left     59.61010 58.93398 57.91980 56.90562 56.22950
##   Moderate 62.21299 63.14317 64.53843 65.93370 66.86388
##   Right    63.00456 65.26702 68.66072 72.05442 74.31689

Interpreting the model4 interaction plot

For polID = Left, I note that the higher the religiousness of those who lean politically left, the weaker their authoritarian attitudes.

For polID = Moderate, I note that the higher the religiousness of those who are politically moderate, the stronger their authoritarian attitudes, though the increase is more gradual than for those on the right.

Lastly, for polID = Right, I note that the higher the religiousness of those who lean politically right, the stronger their authoritarian attitudes. Because the line for those on the right is much steeper than the line for moderates, I can conclude that authoritarian attitudes vary most with religiousness (0-10) among those who lean politically right.
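
A small sketch (assuming model4 from above is in memory) recovers the three slopes behind the plot from the coefficients: each group's slope is the religious coefficient plus the relevant interaction term.

b <- coef(model4)
b["religious"]                                 # slope for Left (reference), about -0.3
b["religious"] + b["polIDModerate:religious"]  # slope for Moderates, about 0.5
b["religious"] + b["polIDRight:religious"]     # slope for Right, about 1.2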

Task 3

Use data for the Netherlands and the variable coding for model5 and model6 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.

Loading data for the Netherlands and recoding the variables

netherlands_data <- read_fst("netherlands_data.fst")

task3 <- netherlands_data  %>%
  mutate(religion = case_when(
    rlgblg == 2 ~ "No",
    rlgblg == 1 ~ "Yes",
    rlgblg %in% c(7, 8, 9) ~ NA_character_,
    TRUE ~ as.character(rlgblg)
  ))

table(task3$religion)
## 
##    No   Yes 
## 11289  7008
task3 <- task3 %>%
  mutate(ID = case_when(
    lrscale >= 0 & lrscale <= 4 ~ "Left",
    lrscale >= 6 & lrscale <= 10 ~ "Right",
    lrscale > 10 ~ NA_character_,
    TRUE ~ NA_character_
  ))

table(task3$ID)
## 
##  Left Right 
##  5605  7247
task3 <- task3 %>%
  mutate(
    cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),
    gen = case_when(
      yrbrn %in% 1900:1945 ~ "1",
      yrbrn %in% 1946:1964 ~ "2",
      yrbrn %in% 1965:1979 ~ "3",
      yrbrn %in% 1980:1996 ~ "4",
      TRUE ~ as.character(yrbrn)
    ),
    gen = factor(gen,
                 levels = c("1", "2", "3", "4"),
                 labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
  )

table(task3$gen)
## 
##     Interwar Baby Boomers        Gen X  Millennials 
##         3967         6359         4684         2803
task3 <- task3 %>%
  mutate(behave = ipbhprp,
         secure = impsafe,
         safety = ipstrgv,
         tradition = imptrad,
         rules = ipfrule) %>%
  mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
                ~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%

  mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))

task3$auth <- scales::rescale(task3$behave + 
                      task3$secure + 
                      task3$safety + 
                      task3$tradition + 
                      task3$rules, to=c(0,100), na.rm=TRUE)

task3 <- task3 %>% filter(!is.na(auth))

task3 <- task3 %>%
  mutate(
    polID = case_when(
      lrscale %in% 0:3 ~ "Left",                    
      lrscale %in% 7:10 ~ "Right",                     
      lrscale %in% 4:6 ~ "Moderate",                  
      lrscale %in% c(77, 88, 99) ~ NA_character_      
    ),
   religious = case_when(
      rlgdgr %in% c(77, 88, 99) ~ NA_real_,
      TRUE ~ rlgdgr
    )
  )
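
A quick sketch to confirm the rescaled outcome and the categories that feed the models in Tasks 4 and 5 (assuming task3 from above is in memory):

range(task3$auth)                    # the rescaled authoritarian values scale should run 0 to 100
table(task3$polID, useNA = "ifany")  # left-right categories used in Tasks 4 and 5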

Doing a modelsummary table, displaying both model outputs

model5 <- lm(auth ~ religion + ID + gen, data = task3, weights = weight)
model6 <- lm(auth ~ religion + ID + gen + religion*gen, data = task3, weights = weight)

modelsummary(
  list(model5, model6),
  fmt = 1,
  estimate  = c( "{estimate} ({std.error}){stars}",
                "{estimate} ({std.error}){stars}"),
  statistic = NULL,
  coef_omit = "Intercept")
                                (1)              (2)
religionYes                      8.2 (0.9)***     7.5 (2.1)***
IDRight                          3.4 (0.9)***     3.4 (0.9)***
genBaby Boomers                 -5.8 (1.3)***    -6.5 (1.7)***
genGen X                        -7.3 (1.4)***    -7.6 (1.8)***
genMillennials                  -8.6 (1.5)***    -8.6 (1.9)***
religionYes × genBaby Boomers                     1.6 (2.6)
religionYes × genGen X                            0.5 (2.8)
religionYes × genMillennials                     -0.6 (3.2)
Num.Obs.                      1147             1147
R2                               0.126            0.127
R2 Adj.                          0.123            0.121
AIC                           9479.3           9484.5
BIC                           9514.7           9535.0
Log.Lik.                     -4732.669        -4732.251
RMSE                            14.94            14.94

Interpreting the coefficients for the MLR without the interaction (the first model only)

religionYes (8.2): This coefficient suggests that, comparing individuals with the same political leaning and generation, the average score on the authoritarian values scale for individuals who belong to a religion is 8.2 units higher than for the reference category (those who do not belong to a religion).

IDRight (3.4): This coefficient indicates that, comparing individuals with the same religious belonging and generation, individuals identifying as politically right are expected to score, on average, 3.4 units higher than the reference category (identifying as on the ‘left’) on the authoritarian values scale.

genBaby Boomers (-5.8): Compared to the reference generation (the Interwar cohort), and holding religious belonging and political leaning constant, Baby Boomers score on average 5.8 points lower on the authoritarian values scale (0-100).

genGen X (-7.3): Generation X shows an even more pronounced decrease, with scores 7.3 points lower than the reference category on the authoritarian values scale.

genMillennials (-8.6): Millennials exhibit the largest decrease in authoritarian values, with scores 8.6 points lower than the reference category. This marked difference is statistically significant, indicating that Millennials diverge most strongly from the Interwar generation in their authoritarian values.
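
Because each of these readings hinges on the reference category, here is a quick sketch to confirm which level R treats as the baseline (assuming task3 from above; lm() coerces character predictors to factors, whose first level is the reference):

levels(factor(task3$religion))  # "No"  "Yes"       -> "No" is the reference
levels(factor(task3$ID))        # "Left" "Right"    -> "Left" is the reference
levels(task3$gen)               # "Interwar", ...   -> "Interwar" is the reference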

All the displayed coefficients in model5 are significant at p < 0.001 (***). In frequentist terms: under the null hypothesis of no effect, the probability of observing an estimate at least this extreme is below 0.1%, so we reject the null.

Interpreting model fit metrics for both models

AIC: Because model5 has a lower AIC (9479.3) than model6 (9484.5), model5 fits these data better by this criterion.

BIC: Likewise, model5 has a lower BIC (9514.7) than model6 (9535.0), again favoring model5.

Log.Lik.: The log-likelihood of model6 (-4732.251) is very slightly less negative (higher) than that of model5 (-4732.669), so model6 fits marginally better by this criterion alone, but the improvement is tiny relative to the three extra parameters it uses.

RMSE: Both models have the same RMSE (14.94) at this rounding, so RMSE alone does not distinguish between them.

Adjusted R-Squared: The adjusted R-squared values of 0.123 (model5) and 0.121 (model6) mean that religious belonging, left-right identification, and generation explain roughly 12.3% and 12.1%, respectively, of the variance in the authoritarian values scale for the Netherlands, after accounting for the number of predictors. Even though both models are statistically significant, they explain only a modest portion of the variability in authoritarian values, which suggests that other factors also play important roles in shaping these attitudes. Because model5 has the higher adjusted R-squared and the lower AIC and BIC, while the interaction terms in model6 add essentially nothing, model5 is the better-fitting model overall.
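
As a check on the reported adjusted R-squared, it can be reproduced by hand from its formula (a sketch, assuming model5 is in memory):

r2 <- summary(model5)$r.squared       # about 0.126
n  <- nobs(model5)                    # 1147
k  <- length(coef(model5)) - 1        # 5 slope coefficients, excluding the intercept
1 - (1 - r2) * (n - 1) / (n - k - 1)  # about 0.123, matching the table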

Task 4

Produce the model7 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

Producing the model7 interaction plot, using the Netherlands data

model7 <- lm(auth ~ cohort + polID + cohort*polID, data = task3, weights = weight)
interact_plot(model7, pred = cohort, modx = polID, jnplot = TRUE)

Interpreting the model7 interaction plot

Across all coded political ideologies, the clear trend is that the later individuals were born (i.e., the younger they are), the lower their scores on the authoritarian attitudes scale.

The lines for leftists, rightists, and moderates are not parallel, which suggests an interaction between left-right self-placement and year of birth; the sketch below tests this formally.
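
A sketch of that test (assuming task3 and model7 from above are in memory; model7_main is a hypothetical name introduced here for the model without the interaction):

model7_main <- lm(auth ~ cohort + polID, data = task3, weights = weight)  # hypothetical helper model
anova(model7_main, model7)  # F-test for the cohort x polID interaction terms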

Task 5

Produce the model8 interaction plot from the tutorial, but again using the Netherlands data. Interpret.

model8 <- lm(auth ~ religious + ID + religious*ID, data = task3, weights = weight)
interact_plot(model8, pred = religious, modx = ID, jnplot = TRUE)

Interpreting the model8 interaction plot

For both political identifications, the clear trend is that the higher the religiousness of individuals, the higher their score on the authoritarian attitudes scale.

However, the lines for leftists and rightists are roughly parallel, indicating little or no interaction between religiousness and left-right identification in predicting the authoritarian score; the sketch below checks the interaction term directly.
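
A minimal sketch of that check (assuming model8 from above is in memory): if the lines are essentially parallel, the religious × ID interaction coefficient should be small and non-significant.

summary(model8)$coefficients["religious:IDRight", ]  # estimate, std. error, t value, p value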

End.