##Task 1 Use data for Germany and the variable coding for model3 and model4 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.
Germany<-read_fst("/Users/jessie/germany_data.fst")
df<-Germany
df$weight <- df$dweight * df$pweight
survey_design <- svydesign(ids = ~1, data = df, weights = ~weight)
Setting up our populist scale
# # Setting values greater than 10 to NA
# df$trstplt <- ifelse(df$trstplt > 10, NA, df$trstplt)
# df$trstprl <- ifelse(df$trstprl > 10, NA, df$trstprl)
# df$trstprt <- ifelse(df$trstprt > 10, NA, df$trstprt)
#
# # Creating and rescaling the trust variable
# df$trust <- scales::rescale(df$trstplt + df$trstprl + df$trstprt, na.rm = TRUE, to = c(0, 100))
#
# # Here's how you would "flip it" because populist is conceived by N + I as 'anti-trust'
# # As Schafer explains there are some issues with that conceptualization, but N + I is also widely applied
# df$populist <- scales::rescale(df$trust, na.rm = TRUE, to=c(100,0))
Recode
# df <- df %>%
# mutate(
# cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),
# # Recoding generational cohorts based on the year of birth (yrbrn).
# # The year of birth is categorized into different generational cohorts.
# # Interwar (1900-1945), Baby Boomers (1946-1964), Gen X (1965-1979), Millennials (1980-1996).
# # The 'TRUE' line is a catch-all that keeps the original year of birth for those not in these ranges.
# gen = case_when(
# yrbrn %in% 1900:1945 ~ "1",
# yrbrn %in% 1946:1964 ~ "2",
# yrbrn %in% 1965:1979 ~ "3",
# yrbrn %in% 1980:1996 ~ "4",
# TRUE ~ as.character(yrbrn)
# ),
# # After recoding, the gen variable is converted into a factor with labels for clearer interpretation.
# # Factors are used in R to handle categorical variables.
# gen = factor(gen,
# levels = c("1", "2", "3", "4"),
# labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
# )
# table(df$gen)
# ##
# ## Interwar Baby Boomers Gen X Millennials
# ## 3999 6351 4821 3308
# # The dataframe is further updated to recode trust and political interest # note: you could have done it one, integrating above, but I broke it down
# df <- df %>%
# mutate(
# # Recoding the 'ppltrst' variable, which seems to measure trust on a scale from 0 to 10.
# # Special codes like 77, 88, 99 are probably used for missing data and are set to NA.
# # Trust levels are categorized as 'Low', 'Mid', and 'High'.
# Capital = case_when(
# ppltrst %in% c(77, 88, 99) ~ NA_character_,
# ppltrst >= 0 & ppltrst <= 3 ~ "Low",
# ppltrst >= 4 & ppltrst <= 6 ~ "Mid",
# ppltrst >= 7 & ppltrst <= 10 ~ "High",
# TRUE ~ as.character(ppltrst)
# ),
# # Recoding 'polintr' variable, which seems to measure political interest.
# # Values of 1 and 2 indicate 'Interested', 3 and 4 'Not Interested'.
# # Special codes like 7, 8, 9 (likely missing data) are set to NA.
# Politically = case_when(
# polintr %in% c(1, 2) ~ "Interested",
# polintr %in% c(3, 4) ~ "Not Interested",
# polintr %in% c(7, 8, 9) ~ NA_character_
# )
# )
#
df <- df %>%
mutate(behave = ipbhprp,
secure = impsafe,
safety = ipstrgv,
tradition = imptrad,
rules = ipfrule) %>%
mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
# Apply the reverse coding
mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))
# Now you can calculate 'schwartzauth' after the NA recoding
df$auth <- scales::rescale(df$behave +
df$secure +
df$safety +
df$tradition +
df$rules, to=c(0,100), na.rm=TRUE)
df <- df %>% filter(!is.na(auth))
df <- df %>%
mutate(
polID = case_when(
lrscale %in% 0:3 ~ "Left",
lrscale %in% 7:10 ~ "Right",
lrscale %in% 4:6 ~ "Moderate",
lrscale %in% c(77, 88, 99) ~ NA_character_
),
religious = case_when(
rlgdgr %in% c(77, 88, 99) ~ NA_real_,
TRUE ~ rlgdgr
)
)
model3 <- lm(auth ~ polID + religious, data = df, weights = weight)
model4 <- lm(auth ~ polID + religious + polID*religious, data = df, weights = weight)
modelsummary(
list(model3, model4),
fmt = 1,
estimate = c( "{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}"),
statistic = NULL)
| (1) | (2) | |
|---|---|---|
| (Intercept) | 55.3 (0.3)*** | 56.4 (0.3)*** |
| polIDModerate | 4.6 (0.3)*** | 3.4 (0.4)*** |
| polIDRight | 7.7 (0.4)*** | 4.2 (0.7)*** |
| religious | 0.9 (0.0)*** | 0.6 (0.1)*** |
| polIDModerate × religious | 0.3 (0.1)*** | |
| polIDRight × religious | 0.8 (0.1)*** | |
| Num.Obs. | 23373 | 23373 |
| R2 | 0.048 | 0.050 |
| R2 Adj. | 0.048 | 0.050 |
| AIC | 200594.1 | 200558.6 |
| BIC | 200634.4 | 200615.0 |
| Log.Lik. | −100292.049 | −100272.298 |
| RMSE | 17.27 | 17.24 |
Interpretation: polIDModerate (4.6): This coefficient indicates that when we control for religiousness, individuals identifying as politically moderate have an average outcome that is 4.6 units higher compared to those who identify ideologically as on the ‘left’, which serves as our reference category. It’s important to note that the “baseline group” is the category that’s omitted, and consequently, it’s interpreted as part of our intercept.
polIDRight (7.7): This coefficient suggests that when holding religiousness constant, individuals who identify as politically right are anticipated to be, on average, 7.7 units higher on the authoritarian values scale compared to the reference category, which in our case is those identifying ideologically as on the ‘left’. It’s worth noting that the reference category, often termed the “baseline group,” is omitted and thus forms a part of our intercept.
religious (0.9): With the predictor ‘religious’, a coefficient of 0.9 suggests that for every one-unit increase in religiousness, there is an associated expected average increase of 0.9 units on the authoritarian values scale, assuming we are comparing individuals with the same ideological categorization, effectively holding ideological self-placement constant.
All three coefficients have ***, indicating a p < 0.001. Meaning: There is less than a 0.1% probability that the observed result is due to chance and we incorrectly reject the null, under frequentist assumptions.
##Task 2 Now generate the model4 interaction plot that we did in the tutorial, but again using the German data instead of the French. Interpret.
interaction_plot <- effect("polID*religious", model4, na.rm=TRUE)
plot(interaction_plot,
main="Interaction effect",
xlab="Religiousness",
ylab="Authoritarian attitudes scale")
interaction_plot
##
## polID*religious effect
## religious
## polID 0 2 5 8 10
## Left 56.40706 57.56775 59.30879 61.04983 62.21053
## Moderate 59.85063 61.66176 64.37844 67.09513 68.90626
## Right 60.60237 63.35032 67.47224 71.59416 74.34211
##Task 3 Use data for the Netherlands and the variable coding for model5 and model6 from the tutorial (i.e., both for the cleaning & recode, as well as the linear model formulas). Do a modelsummary table displaying both model outputs. Interpret the coefficients for the MLR without the interaction (i.e., the first displayed model), and interpret model fit metrics for both models.
netherland<-read.fst("/Users/jessie/netherlands_data.fst")
df<-netherland
df <- df %>%
mutate(religion = case_when(
rlgblg == 2 ~ "No",
rlgblg == 1 ~ "Yes",
rlgblg %in% c(7, 8, 9) ~ NA_character_,
TRUE ~ as.character(rlgblg)
))
df <- df %>%
mutate(ID = case_when(
lrscale >= 0 & lrscale <= 4 ~ "Left",
lrscale >= 6 & lrscale <= 10 ~ "Right",
lrscale > 10 ~ NA_character_, # Set values above 10 as NA
TRUE ~ NA_character_ # Ensure value 5 and any other unexpected values are set as NA
))
df <- df %>%
mutate(
cohort = ifelse(yrbrn < 1930 | yrbrn > 2000, NA, yrbrn),
# Recoding generational cohorts based on the year of birth (yrbrn).
# The year of birth is categorized into different generational cohorts.
# Interwar (1900-1945), Baby Boomers (1946-1964), Gen X (1965-1979), Millennials (1980-1996).
# The 'TRUE' line is a catch-all that keeps the original year of birth for those not in these ranges.
gen = case_when(
yrbrn %in% 1900:1945 ~ "1",
yrbrn %in% 1946:1964 ~ "2",
yrbrn %in% 1965:1979 ~ "3",
yrbrn %in% 1980:1996 ~ "4",
TRUE ~ as.character(yrbrn)
),
# After recoding, the gen variable is converted into a factor with labels for clearer interpretation.
# Factors are used in R to handle categorical variables.
gen = factor(gen,
levels = c("1", "2", "3", "4"),
labels = c("Interwar", "Baby Boomers", "Gen X", "Millennials"))
)
df <- df %>%
mutate(behave = ipbhprp,
secure = impsafe,
safety = ipstrgv,
tradition = imptrad,
rules = ipfrule) %>%
mutate(across(c("behave", "secure", "safety", "tradition", "rules"),
~ na_if(.x, 7) %>% na_if(8) %>% na_if(9))) %>%
# Apply the reverse coding
mutate(across(c("behave", "secure", "safety", "tradition", "rules"), ~ 7 - .x ))
df$auth <- scales::rescale(df$behave +
df$secure +
df$safety +
df$tradition +
df$rules, to=c(0,100), na.rm=TRUE)
df <- df %>% filter(!is.na(auth))
model5 <- lm(auth ~ religion + ID + gen, data = df, weights = weight)
model6 <- lm(auth ~ religion + ID + gen + religion*gen, data = df, weights = weight)
modelsummary(
list(model5, model6),
fmt = 1,
estimate = c( "{estimate} ({std.error}){stars}",
"{estimate} ({std.error}){stars}"),
statistic = NULL,
coef_omit = "Intercept")
| (1) | (2) | |
|---|---|---|
| religionYes | 8.2 (0.9)*** | 7.5 (2.1)*** |
| IDRight | 3.4 (0.9)*** | 3.4 (0.9)*** |
| genBaby Boomers | −5.8 (1.3)*** | −6.5 (1.7)*** |
| genGen X | −7.3 (1.4)*** | −7.6 (1.8)*** |
| genMillennials | −8.6 (1.5)*** | −8.6 (1.9)*** |
| religionYes × genBaby Boomers | 1.6 (2.6) | |
| religionYes × genGen X | 0.5 (2.8) | |
| religionYes × genMillennials | −0.6 (3.2) | |
| Num.Obs. | 1147 | 1147 |
| R2 | 0.126 | 0.127 |
| R2 Adj. | 0.123 | 0.121 |
| AIC | 9479.3 | 9484.5 |
| BIC | 9514.7 | 9535.0 |
| Log.Lik. | −4732.669 | −4732.251 |
| RMSE | 14.94 | 14.94 |
Interpretation: What stands out here is that the R-squared and adjusted R-squared values are notably higher for the second model. In other words, the first model doesn’t account for much variance in the outcome. However, it’s important to note that the AIC and BIC are higher for the second model, indicating that the increase in explained variance isn’t justified. This suggests there might be greater complexity or underlying issues that the AIC and BIC are cautious about, comparatively speaking. In summary, it indicates potential issues with the second model.
religionYes (8.2): This coefficient indicates that, while controlling for religiousness, individuals who affirm religious affiliation are expected to have an average outcome 8.2 units higher compared to the reference category (in our case, those not affirming religious affiliation), on the measured variable. It’s essential to note that the “baseline group” is the category omitted from the analysis and is therefore interpreted as part of our intercept.
polIDRight (3.4): This coefficient suggests that, when controlling for ideological self-identification, individuals identifying as politically right are anticipated to be, on average, 3.4 units higher on the authoritarian values scale compared to the reference category, which consists of those identifying ideologically as on the ‘left’. It’s important to note that the reference category, often termed the “baseline group,” is omitted and thus forms a part of our intercept.
genBaby Boomers (-5.8): This coefficient suggests that, when considering generational cohort, individuals belonging to the Baby Boomers generation are expected to have an average outcome that is 5.8 units lower compared to the reference category, assuming all other variables are held constant.
genMillennials (-8.6): This coefficient suggests that, in comparison to the reference category (which could be another generational cohort), individuals belonging to the Millennial generation are anticipated to have an average outcome that is 8.6 units lower, assuming all other variables remain constant.
interaction_plot <- effect("religion*gen", model6, na.rm=TRUE)
plot(interaction_plot,
main="Interaction effect",
xlab="Generations",
ylab="Authoritarian attitudes scale")
cat_plot(model6, pred = gen, modx = religion, jnplot = TRUE)
##Task 4 Produce the model7 interaction plot from the tutorial, but again using the Netherlands data. Interpret.
df <- df %>%
mutate(
polID = case_when(
lrscale %in% 0:3 ~ "Left",
lrscale %in% 7:10 ~ "Right",
lrscale %in% 4:6 ~ "Moderate",
lrscale %in% c(77, 88, 99) ~ NA_character_
),
religious = case_when(
rlgdgr %in% c(77, 88, 99) ~ NA_real_,
TRUE ~ rlgdgr
)
)
model7 <- lm(auth ~ + cohort + polID + cohort*polID, data = df, weights = weight)
interact_plot(model7, pred = cohort, modx = polID, jnplot = TRUE)
Interpretation: the graph suggests a generational change in
authoritarian attitudes among those who identify as Moderate or Right,
with younger cohorts expressing less authoritarianism than older ones.
In contrast, Left-leaning individuals maintain consistent levels of
authoritarianism across different generations. This kind of trend
analysis is valuable in sociopolitical research to understand how
attitudes evolve over time within different political spectrums.
##Task 5 Produce the model8 interaction plot from the tutorial, but again using the Netherlands data. Interpret.
model8 <- lm(auth ~ religious + ID + religious*ID, data = df, weights = weight)
interact_plot(model8, pred = religious, modx = ID, jnplot = TRUE)
Interpretation: the plot suggests that in the Netherlands, both Left-
and Right-oriented individuals exhibit a positive correlation between
religiosity and authoritarianism, but the strength of this correlation
is significantly stronger on the Right. This kind of information could
be useful for sociological research, political analysis, or
policymaking, as it reflects how different societal groups may respond
to religious contexts within a framework of authoritarian attitudes.