Task 1 Using the Civil dataset, perform a simple linear regression with public_sector_corruption as the dependent variable and polyarchy as the independent variable. Visualize the relationship with a scatter plot and overlay the regression line. Use the sjPlot package to create regression tables and interpret the results.
load("Civil.RData")
Civil <- corruption
model <- lm(public_sector_corruption ~ polyarchy, data = Civil)
plot_corruption <- corruption %>%
mutate(highlight = polyarchy == min(polyarchy) | polyarchy == max(polyarchy))
ggplot(plot_corruption, aes(x = polyarchy, y = public_sector_corruption)) +
geom_point(aes(color = highlight)) +
stat_smooth(method = "lm", formula = y ~ x, linewidth = 1, color = "blue") +
geom_label_repel(data = filter(plot_corruption, highlight == TRUE),
aes(label = country_name), seed = 1234) +
scale_color_manual(values = c("grey30", "red"), guide = "none") +
labs(x = "Polyarchy Index", y = "Public Sector Corruption Index") +
theme_minimal()
tab_model(model)
| public_sector_corruption | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 89.44 | 81.64 – 97.25 | <0.001 |
| polyarchy | -0.83 | -0.96 – -0.69 | <0.001 |
| Observations | 168 | ||
| R2 / R2 adjusted | 0.472 / 0.469 | ||
Based on the regression table results, the intercept has an estimate of 89.44 with a p-value of less than 0.001, indicating that the baseline level of public sector corruption is significantly different from zero when polyarchy is zero. This suggests that in the absence of polyarchy, the expected level of public sector corruption is high, at approximately 89.44 units.
The variable polyarchy has an estimate of -0.83 with a p-value of less than 0.001, indicating that this coefficient is statistically significant. For each one-unit increase in the polyarchy score, public sector corruption decreases by 0.83 units. This negative relationship implies that as polyarchy (which often reflects democratic governance) increases, corruption in the public sector tends to decrease significantly.
The model’s R-squared value is 0.472, indicating that approximately 47.2% of the variance in public sector corruption is explained by the level of polyarchy. This suggests that polyarchy is an important predictor of corruption, though there is still a substantial amount of variance unexplained by this model. The adjusted R-squared value is 0.469, which accounts for the number of observations and indicates that approximately 46.9% of the variance in public sector corruption is explained by the model when taking into account the degrees of freedom.
Task 2 Extend the model from Task 1 by adding a quadratic term for polyarchy to capture potential non-linear relationships. Visualize the polynomial relationship using ggplot2. Calculate the marginal effects of polyarchy at different levels (30, 60, 90) using both manual calculations and the marginaleffects package. Interpret the results.
model_sq <- lm(public_sector_corruption ~ polyarchy + I(polyarchy^2), data = corruption)
ggplot(plot_corruption, aes(x = polyarchy, y = public_sector_corruption)) +
geom_point(aes(color = highlight)) +
stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 1, color = "blue") +
geom_label_repel(data = filter(plot_corruption, highlight == TRUE),
aes(label = country_name), seed = 1234) +
scale_color_manual(values = c("grey30", "red"), guide = "none") +
labs(x = "Polyarchy Index", y = "Public Sector Corruption Index") +
theme_minimal()
poly_lib1 <- coef(model_sq)["polyarchy"]
poly_lib2 <- coef(model_sq)["I(polyarchy^2)"]
poly_lib_slope <- function(x) poly_lib1 + (2 * poly_lib2 * x)
poly_lib_slope(c(30, 60, 90))
## [1] -0.06392759 -1.10250800 -2.14108840
model_sq %>%
slopes(newdata = datagrid(polyarchy = c(30, 60, 90)), eps = 0.001)
##
## Term polyarchy Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## polyarchy 30 -0.0639 0.141 -0.454 0.65 0.6 -0.34 0.212
## polyarchy 60 -1.1025 0.077 -14.325 <0.001 152.2 -1.25 -0.952
## polyarchy 90 -2.1411 0.227 -9.426 <0.001 67.7 -2.59 -1.696
##
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, polyarchy, predicted_lo, predicted_hi, predicted, public_sector_corruption
## Type: response
At a polyarchy value of 30, the estimate is -0.0639. This means that around this level of polyarchy, the relationship between polyarchy and the dependent variable is slightly negative, but not statistically significant.
At a polyarchy value of 60, the estimate is -1.1025. Here, polyarchy is associated with a significant decrease in the dependent variable, indicating a strong negative relationship.
At a polyarchy value of 90, the estimate is -2.1411, indicating an even stronger negative relationship at higher levels of polyarchy.
Task 3 Using the Civil dataset, fit a logistic regression model predicting the presence of campaign finance disclosure laws (disclose_donations) with public_sector_corruption and log_gdp_percapita as predictors. Use the sjPlot package to create regression tables and interpret the results.
model <- glm(disclose_donations ~ public_sector_corruption + log_gdp_percapita, data = Civil, family = binomial)
tab_model(model)
| disclose_donations | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.60 | 0.01 – 46.85 | 0.818 |
| public_sector_corruption | 0.94 | 0.92 – 0.96 | <0.001 |
|
GDP per capita (constant 2015 US$) |
1.28 | 0.84 – 1.98 | 0.253 |
| Observations | 168 | ||
| R2 Tjur | 0.454 | ||
The odds ratio of the intercept is 0.60, which is effectively very small and close to zero. This represents the baseline odds of having campaign finance disclosure laws when both public_sector_corruption and GDP per capita are at their reference levels (which is zero). However, since this scenario isn’t realistic (e.g., GDP per capita and corruption can’t realistically be zero), the intercept itself isn’t practically interpretable but serves as a baseline for the model.
The odds ratio for public_sector_corruption is 0.94. This means that for each one-unit increase in public sector corruption, the odds of having campaign finance disclosure laws decrease by 6%, holding GDP per capita constant. The negative relationship is statistically significant, indicating that higher levels of corruption are associated with a lower likelihood of having campaign finance disclosure laws.
The odds ratio for GDP per capita (constant 2015 US$) is 1.28. This indicates that for each one-unit increase in GDP per capita, the odds of having campaign finance disclosure laws increase by 28%, holding public sector corruption constant. However, this effect is not statistically significant, as the p-value is above 0.05.
The R² Tjur is 0.454, indicating that 45.4% of the variation in the presence of campaign finance disclosure laws is explained by this model.
Task 4 Calculate the marginal effects of public_sector_corruption from the logistic regression model in Task 3 at representative values (20, 50, 80).Use the marginaleffects and emmeans packages to compute these effects.Visualize the predicted probabilities of having campaign finance disclosure laws across a range of public_sector_corruption values using ggplot2.
#data_grid_me <- datagrid(model = model,
# public_sector_corruption = c(20, 50, 80))
marginal_effects <- model %>% slopes(newdata = datagrid(public_sector_corruption = c(20, 50, 80)), eps = 0.001, variables = "public_sector_corruption")
marginal_effects
##
## Term public_sector_corruption Estimate Std. Error z
## public_sector_corruption 20 -0.01422 0.002347 -6.06
## public_sector_corruption 50 -0.00973 0.001651 -5.90
## public_sector_corruption 80 -0.00237 0.000819 -2.89
## Pr(>|z|) S 2.5 % 97.5 %
## < 0.001 29.5 -0.01882 -0.009622
## < 0.001 28.0 -0.01297 -0.006500
## 0.00384 8.0 -0.00397 -0.000763
##
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, public_sector_corruption, predicted_lo, predicted_hi, predicted, log_gdp_percapita, disclose_donations
## Type: response
model %>%
emtrends(~ public_sector_corruption, var = "public_sector_corruption", at = list(public_sector_corruption = c(20, 50, 80)), delta.var = 0.001) %>%
test()
## public_sector_corruption public_sector_corruption.trend SE df z.ratio
## 20 -0.0596 0.0119 Inf -5.007
## 50 -0.0596 0.0119 Inf -5.007
## 80 -0.0596 0.0119 Inf -5.007
## p.value
## <.0001
## <.0001
## <.0001
clrs <- met.brewer("Johnson")
plot_corruption_logit <- Civil |>
mutate(highlight = public_sector_corruption == min(public_sector_corruption) |
public_sector_corruption == max(public_sector_corruption))
theme_mfx <- function() {
theme_minimal(base_family = "IBM Plex Sans Condensed") +
theme(panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
plot.title = element_text(face = "bold"),
axis.title = element_text(face = "bold"),
strip.text = element_text(face = "bold"),
strip.background = element_rect(fill = "grey80", color = NA),
legend.title = element_text(face = "bold"))
}
ggplot(plot_corruption_logit,
aes(x = public_sector_corruption, y = as.numeric(disclose_donations))) +
geom_point(aes(color = highlight)) +
geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit")),
color = clrs[2]) +
geom_label(data = slice(filter(plot_corruption_logit, highlight == TRUE), 1),
aes(label = country_name), nudge_y = 0.06, hjust = 1) +
geom_label(data = slice(filter(plot_corruption_logit, highlight == TRUE), 2),
aes(label = country_name), nudge_y = -0.06, hjust = 0) +
scale_color_manual(values = c("grey30", clrs[3]), guide = "none") +
labs(x = "Public sector corruption",
y = "Presence or absence of\ncampaign finance disclosure laws") + theme_mfx()
Task 5 Explore the interaction effect between public_sector_corruption and region in the logistic regression model from Task 3.
Use the datagrid() function from the marginaleffects package to create a dataset with representative values for regions.
Fit the logistic regression model with the interaction term and visualize the interaction effects using ggplot2.
Interpret the results and discuss the implications of the interaction effect.
model <- glm(disclose_donations ~ public_sector_corruption * region + log_gdp_percapita, data = Civil, family = binomial)
regions_to_use <- c("Western Europe and North America",
"Latin America and the Caribbean",
"Middle East and North Africa")
data_grid <- datagrid(public_sector_corruption = c(20, 80),
region = regions_to_use,
model = model)
print(data_grid)
## log_gdp_percapita public_sector_corruption region
## 1 8.567353 20 Western Europe and North America
## 2 8.567353 20 Latin America and the Caribbean
## 3 8.567353 20 Middle East and North Africa
## 4 8.567353 80 Western Europe and North America
## 5 8.567353 80 Latin America and the Caribbean
## 6 8.567353 80 Middle East and North Africa
## rowid
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
logit_predictions <- model |>
emmeans(~ public_sector_corruption + region,
at = list(public_sector_corruption = seq(0, 90, 1)),
type = "response") |>
as_tibble()
logit_slopes <- model |>
emtrends(~ public_sector_corruption + region, var = "public_sector_corruption",
at = list(public_sector_corruption = c(20, 80),
region = regions_to_use),
regrid = "response", delta.var = 0.001) |>
as_tibble() |>
mutate(panel = glue("Corruption set to {public_sector_corruption}"))
logit_slopes
## # A tibble: 6 × 8
## public_sector_corruption region public_sector_corrup…¹ SE df asymp.LCL
## <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 20 Weste… -0.0250 2.14e-2 Inf -0.0670
## 2 80 Weste… -0.000295 1.37e-3 Inf -0.00298
## 3 20 Latin… -0.00956 5.21e-3 Inf -0.0198
## 4 80 Latin… -0.00345 1.45e-3 Inf -0.00629
## 5 20 Middl… -0.0286 1.36e-2 Inf -0.0552
## 6 80 Middl… -0.000141 4.05e-4 Inf -0.000935
## # ℹ abbreviated name: ¹public_sector_corruption.trend
## # ℹ 2 more variables: asymp.UCL <dbl>, panel <glue>
find_intercept <- function(x1, y1, slope) {
intercept <- slope * (-x1) + y1
return(intercept)
}
slopes_to_plot <- logit_predictions |>
filter(public_sector_corruption %in% c(20, 80),
region %in% regions_to_use) |>
left_join(dplyr::select(logit_slopes, public_sector_corruption, region, public_sector_corruption.trend, panel),
by = c("public_sector_corruption", "region")) |>
mutate(intercept = find_intercept(public_sector_corruption, prob, public_sector_corruption.trend)) |>
mutate(round_slope = scales::label_number(accuracy = 0.001, style_negative = "minus")(public_sector_corruption.trend * 100),
nice_slope = glue::glue("Slope:{round_slope} pct pts"))
slopes_to_plot
## # A tibble: 6 × 12
## public_sector_corruption region prob SE df asymp.LCL asymp.UCL
## <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 20 Latin Amer… 0.526 0.158 Inf 2.44e-1 0.793
## 2 80 Latin Amer… 0.100 0.0921 Inf 1.48e-2 0.452
## 3 20 Middle Eas… 0.584 0.265 Inf 1.42e-1 0.922
## 4 80 Middle Eas… 0.00120 0.00407 Inf 1.55e-6 0.482
## 5 20 Western Eu… 0.566 0.282 Inf 1.21e-1 0.925
## 6 80 Western Eu… 0.00292 0.0157 Inf 7.31e-8 0.992
## # ℹ 5 more variables: public_sector_corruption.trend <dbl>, panel <glue>,
## # intercept <dbl>, round_slope <chr>, nice_slope <glue>
ggplot(logit_predictions, aes(x = public_sector_corruption, y = prob, color = region)) +
geom_line(linewidth = 1) +
geom_point(data = slopes_to_plot, size = 2, show.legend = FALSE) +
geom_abline(data = slopes_to_plot,
aes(slope = public_sector_corruption.trend, intercept = intercept, color = region),
linewidth = 0.5, linetype = "21", show.legend = FALSE) +
geom_label_repel(data = slopes_to_plot, aes(label = nice_slope),
fontface = "bold", seed = 123, show.legend = FALSE,
size = 3, direction = "y",nudge_y = 0.5,
nudge_x = 5) +
labs(x = "Public sector corruption",
y = "Predicted probability of having\na campaign finance disclosure law",
color = NULL) +
theme_minimal() +
theme(legend.position = "bottom")
In both the Middle East and Western Europe/North America, an increase in
public sector corruption in countries with low levels of corruption (20)
is associated with a −2.860 and −2.497 percentage point decrease in the
probability of seeing a disclosure law, while in low-corruption
countries in Latin America, an increase in public sector corruption do
less to the probability (it decreases it by -0.956 percentage points) In
countries with high levels of corruption (80), on the other hand, a
small increase in corruption doesn’t do much to the probability of
having a disclosure law in the Middle East (−0.014 percentage point
decrease) or Western Europe (−0.030 percentage point decrease). In Latin
America, though, a small increase in corruption is associated with a
−0.345 percentage point decrease in the probability of having a
disclosure law.
ggplot(logit_slopes, aes(x = public_sector_corruption.trend * 100, y = region, color = region)) +
geom_vline(xintercept = 0, linewidth = 0.5, linetype = "24", color = clrs[5]) +
geom_pointrange(aes(xmin = asymp.LCL * 100, xmax = asymp.UCL * 100)) +
scale_color_manual(values = c(clrs[4], clrs[1], clrs[2]), guide = "none") +
labs(x = "Marginal effect (percentage points)", y = NULL) +
facet_wrap(vars(panel), ncol = 1) +
theme_minimal()
The graph shows the marginal effects of public sector corruption on the probability of having a campaign finance disclosure law across three regions: Middle East and North Africa, Latin America and the Caribbean, and Western Europe and North America. The marginal effects are calculated for two levels of public sector corruption: 20 (low) and 80 (high). The horizontal axis represents the marginal effect (in percentage points), and the vertical axis lists the regions.
Corruption Set to 20: Middle East and North Africa: The marginal effect is negative, and the confidence interval does not include zero, indicating that the effect is statistically significant. This suggests that an increase in public sector corruption is associated with a significant decrease in the probability of having a disclosure law.
Latin America and the Caribbean: The marginal effect is negative, but the confidence interval includes zero, indicating that this effect is not statistically significant. This suggests that the relationship between increased corruption and the probability of having a disclosure law is not strongly supported by the data.
Western Europe and North America: The marginal effect is negative, but the confidence interval includes zero, indicating no significant effect. This implies that the data does not strongly support a relationship between increased corruption and the probability of having a disclosure law.
Corruption Set to 80: Middle East and North Africa: The marginal effect is negative, but the confidence interval includes zero, indicating that this effect is not statistically significant. This suggests that at higher levels of corruption, further increases in corruption do not significantly affect the probability of having a disclosure law.
Latin America and the Caribbean: The marginal effect is negative, and the confidence interval does not include zero, indicating that the effect is statistically significant. This suggests that an increase in public sector corruption is associated with a significant decrease in the probability of having a disclosure law.
Western Europe and North America: The marginal effect is negative, but the confidence interval includes zero, indicating no significant effect. This implies that at higher levels of corruption, further increases in corruption do not significantly affect the probability of having a disclosure law.
The results show that the significance and direction of the marginal effects of public sector corruption on the probability of having a campaign finance disclosure law vary by region and corruption level. In particular, only the Middle East and North Africa at low corruption levels and Latin America and the Caribbean at high corruption levels show statistically significant negative effects, where increased corruption is associated with a lower probability of having a disclosure law. In other regions or at different corruption levels, the effects are not statistically significa