Problem Set 6

Instructions

Our final problem set

Task 1

Using the Civil dataset, perform a simple linear regression with public_sector_corruption as the dependent variable and polyarchy as the independent variable.
Visualize the relationship with a scatter plot and overlay the regression line. Use the sjPlot package to create regression tables and interpret the results.

Task 2

Extend the model from Task 1 by adding a quadratic term for polyarchy to capture potential non-linear relationships.
Visualize the polynomial relationship using ggplot2.
Calculate the marginal effects of polyarchy at different levels (30, 60, 90) using both manual calculations and the marginaleffects package. Interpret the results.

Task 3

Using the Civil dataset, fit a logistic regression model predicting the presence of campaign finance disclosure laws (disclose_donations) with public_sector_corruption and log_gdp_percapita as predictors.
Use the sjPlot package to create regression tables and interpret the results.

Task 4

Calculate the marginal effects of public_sector_corruption from the logistic regression model in Task 3 at representative values (20, 50, 80). Use the marginaleffects and emmeans packages to compute these effects.
Visualize the predicted probabilities of having campaign finance disclosure laws across a range of public_sector_corruption values using ggplot2.

Task 5

Explore the interaction effect between public_sector_corruption and region in the logistic regression model from Task 3. Use the datagrid() function from the marginaleffects package to create a dataset with representative values for regions.
Fit the logistic regression model with the interaction term and visualize the interaction effects using ggplot2. Interpret the results and discuss the implications of the interaction effect.

Setup

We’ll start, as always, by loading our required packages and the “Civil” dataset that we will be working with, which the code below refers to as corruption. This dataset “contains information about civil liberties, public sector corruption, and other political and economic factors across different countries”.

Task 1

First, we perform a simple linear regression. Our dependent variable will be “public_sector_corruption” and our independent variable will be “polyarchy”. We’ll visualize this model using a scatter plot with a regression line and we’ll also construct a regression table with sjPlot so we can interpret results.

# First we craft our simple linear regression model
slrmodel <- lm(public_sector_corruption ~ polyarchy, data = corruption)

# Now to visualize the model and regression line
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
  geom_point() +
  geom_smooth(method = "lm", col = "deepskyblue4") +
  labs(title = "Scatter Plot of Polyarchy VS Public Sector Corruption\nIncluding Regression Line",
       x = "Polyarchy",
       y = "Public Sector Corruption Index") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

## `geom_smooth()` using formula = 'y ~ x'

# Finally, the regression table with sjPlot's tab_model() function
tab_model(slrmodel, title = "SLR Model: Polyarchy vs Public Sector Corruption")

SLR Model: Polyarchy vs Public Sector Corruption
	public_sector_corruption
Predictors	Estimates	CI	p
(Intercept)	89.44	81.64 – 97.25	<0.001
polyarchy	-0.83	-0.96 – -0.69	<0.001
Observations	168
R² / R² adjusted	0.472 / 0.469

Interpretation of Regression Table

Our model’s intercept of 89.44 is the predicted Public Sector Corruption index for a hypothetical country with a polyarchy score of 0. The relationship between polyarchy and public sector corruption is negative: each 1-unit increase in a country’s polyarchy is associated with a 0.83-unit decrease in its predicted public sector corruption (confidence interval: -0.96 to -0.69). This association is statistically significant (p < 0.001), and the R-squared of 0.472 indicates that polyarchy accounts for 47.2% of the variation in public sector corruption across the 168 observations.
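
To make the coefficients concrete, the short sketch below plugs a few polyarchy scores into the fitted line. It uses the rounded estimates copied from the regression table rather than the model object, so the results are approximate, and the three polyarchy scores are an assumption chosen only for illustration.

```r
# Rounded estimates copied from the regression table (not the exact model object)
intercept <- 89.44
slope <- -0.83

# Illustrative polyarchy scores (an assumption, chosen only for this example)
polyarchy_values <- c(0, 50, 100)

# Predicted public sector corruption = intercept + slope * polyarchy
predicted <- intercept + slope * polyarchy_values
predicted
```

With the fitted model itself, predict(slrmodel, newdata = data.frame(polyarchy = c(0, 50, 100))) would give the unrounded equivalents.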

Task 2

We need to add a quadratic term for polyarchy to our previous model, visualize the result, calculate the marginal effects of polyarchy at levels 30, 60, and 90, and finally interpret the results.

# Adding a quadratic term
slrmodelQ <- lm(public_sector_corruption ~ polyarchy + I(polyarchy^2), data = corruption)

# Visualizing 
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
  geom_point(color = "grey30") +  
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 1, color = "darkorchid1") +
  labs(x = "Polyarchy", y = "Public Sector Corruption Index") +
  theme_minimal()

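# Now to calculate marginal effects. With a quadratic term, the marginal effect of polyarchy is no longer
# a single coefficient; it is the derivative of the fitted curve: poly1 + 2 * poly2 * polyarchy.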
poly1 <- coef(slrmodelQ)["polyarchy"]
poly2 <- coef(slrmodelQ)["I(polyarchy^2)"]

poly_slope <- function(x) poly1 + (2 * poly2 * x)

poly_slope(c(30, 60, 90))

## [1] -0.06392759 -1.10250800 -2.14108840

# We can also calculate using the marginaleffects package
slrmodelQ %>% 
  slopes(newdata = datagrid(polyarchy = c(30, 60, 90)), eps = 0.001)

## 
##       Term polyarchy Estimate Std. Error       z Pr(>|z|)     S 2.5 % 97.5 %
##  polyarchy        30  -0.0639      0.141  -0.454     0.65   0.6 -0.34  0.212
##  polyarchy        60  -1.1025      0.077 -14.325   <0.001 152.2 -1.25 -0.952
##  polyarchy        90  -2.1411      0.227  -9.426   <0.001  67.7 -2.59 -1.696
## 
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, polyarchy, predicted_lo, predicted_hi, predicted, public_sector_corruption 
## Type:  response

Interpretation

polyarchy of 30: At this level, the marginal effect is -0.064. At a polyarchy value of 30, a one-unit increase in polyarchy is associated with only a very small decrease in public sector corruption, and this slope is not statistically distinguishable from zero (p = 0.65; the confidence interval of -0.34 to 0.212 includes 0).

polyarchy of 60: A marginal effect of -1.103 suggests that, at a polyarchy value of 60, increasing polyarchy is associated with a greater decrease in public sector corruption than at a value of 30. This slope is statistically significant (p < 0.001).

polyarchy of 90: A marginal effect of -2.141 suggests that, at a polyarchy value of 90, we have the strongest negative relationship between polyarchy and public sector corruption, and it is again statistically significant (p < 0.001). Because the marginal effects grow more negative as polyarchy rises, the negative relationship between the two variables is stronger at higher levels of polyarchy.

marginaleffects package: For each specified level (30, 60, 90), this package reports the same slopes as the manual calculation, along with the standard errors, z-statistics, p-values, and confidence intervals.
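
The agreement between the manual and package-based slopes can be illustrated with a small base-R sketch. It compares the analytic derivative with a finite-difference slope, which is the general idea behind nudging the predictor by a small eps. The data are simulated (every number in the block is made up), and the sketch does not claim to reproduce the exact numerical scheme that marginaleffects uses internally.

```r
# Simulated data standing in for the real dataset (all values here are made up)
set.seed(123)
x <- runif(200, min = 0, max = 100)
y <- 60 + 1 * x - 0.017 * x^2 + rnorm(200, sd = 10)

# Quadratic model, same structure as slrmodelQ
fit <- lm(y ~ x + I(x^2))
b1 <- coef(fit)[["x"]]
b2 <- coef(fit)[["I(x^2)"]]

# Analytic slope: derivative of b0 + b1 * x + b2 * x^2 with respect to x
analytic_slope <- function(v) b1 + 2 * b2 * v

# Numerical slope: change in the prediction after nudging x by a small eps
numeric_slope <- function(v, eps = 0.001) {
  hi <- predict(fit, newdata = data.frame(x = v + eps))
  lo <- predict(fit, newdata = data.frame(x = v))
  unname((hi - lo) / eps)
}

at <- c(30, 60, 90)
cbind(at, analytic = analytic_slope(at), numeric = numeric_slope(at))
```

The two columns should agree to several decimal places, which is why the manual poly_slope() results and the slopes() estimates line up.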

Task 3

Here we must fit a logistic regression model with the dependent variable being “disclose_donations” and the independent variables being “public_sector_corruption” and “log_gdp_percapita”. After that we’ll make a regression table to interpret results.

# First to fit our model

lrmodel <- glm(disclose_donations ~ public_sector_corruption + log_gdp_percapita, family = binomial(link = "logit"), data = corruption)

# Now our regression table with sjPlot

tab_model(lrmodel,
          dv.labels = "Presence of Campaign Finance Disclosure Laws")

	Presence of Campaign Finance Disclosure Laws
Predictors	Odds Ratios	CI	p
(Intercept)	0.60	0.01 – 46.85	0.818
public_sector_corruption	0.94	0.92 – 0.96	<0.001
GDP per capita (constant 2015 US$)	1.28	0.84 – 1.98	0.253
Observations	168
R² Tjur	0.454

Interpretation

Intercept: The value of 0.60 is the baseline odds of having campaign finance disclosure laws when public sector corruption and log GDP per capita are both hypothetically set to 0, which corresponds to a probability of about 0.375. It is not statistically significant (p = 0.818) and mainly serves as a baseline for our model.

Public Sector Corruption: For each 1-unit increase in public sector corruption, the odds of having campaign finance disclosure laws are multiplied by 0.94, meaning they decrease by 6%, holding log GDP per capita constant. This effect is statistically significant (p < 0.001).

GDP per Capita: The predictor is the log of GDP per capita, so for each 1-unit increase in log GDP per capita, the odds of campaign finance disclosure laws being present are multiplied by 1.28, meaning they increase by 28%, holding public sector corruption constant. This effect is not statistically significant (p = 0.253), and its confidence interval of 0.84 to 1.98 includes 1.

Tjur’s R-Squared: The value of this is 0.454. Tjur’s R-squared is the difference between the average predicted probability for countries that have disclosure laws and the average predicted probability for countries that do not, so the model separates the two groups by about 45 percentage points. It is not a proportion of variance explained in the way the R-squared of a linear model is.

Takeaway: We cannot confidently say that countries with a higher GDP per capita are more likely to have laws enforcing the disclosure of campaign finances, because that effect is not statistically significant. Our regression table does show, with statistical significance, that each additional unit of public sector corruption is associated with 6% lower odds of having these laws in place. A Tjur’s R-squared of 0.454 indicates that these 2 variables discriminate reasonably well between countries with and without disclosure laws, but additional variables could be explored in the future so that we may better understand what influences a country to require disclosure of campaign finances.
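
The conversions behind these statements can be sketched in a few lines of base R. The inputs are the rounded odds ratios copied from the table above, so the outputs are approximate, and the helper names odds_to_prob and pct_change are hypothetical, introduced only for this example.

```r
# Odds ratios copied from the regression table (rounded)
intercept_odds <- 0.60
or_corruption <- 0.94
or_log_gdp <- 1.28

# Odds -> probability: p = odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)
baseline_prob <- odds_to_prob(intercept_odds)

# Odds ratio -> percent change in the odds per one-unit increase
pct_change <- function(or) (or - 1) * 100
pct_corruption <- pct_change(or_corruption)
pct_log_gdp <- pct_change(or_log_gdp)

# Odds ratios compound multiplicatively: a 10-unit rise in corruption
or_corruption_10 <- or_corruption^10

c(baseline_prob = baseline_prob, pct_corruption = pct_corruption,
  pct_log_gdp = pct_log_gdp, or_corruption_10 = or_corruption_10)
```

Because odds ratios multiply rather than add, a 10-unit rise in corruption multiplies the odds by roughly 0.54 and does not lower them by 10 × 6% = 60%.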

Task 4

Using the previous model, we will calculate the marginal effects of public_sector_corruption at representative values 20, 50, 80. We’ll use marginaleffects and emmeans packages for this. Then, we will visualize the predicted probabilities of campaign finance disclosure laws presence across a range of public_sector_corruption values using ggplot2.

# Using marginaleffects package to calculate marginal effects
lrmodel %>%
  slopes(newdata = datagrid(public_sector_corruption = c(20, 50, 80)), eps = 0.001)

## 
##                      Term public_sector_corruption Estimate Std. Error      z
##  log_gdp_percapita                              20  0.05939   0.054017  1.100
##  log_gdp_percapita                              50  0.04066   0.037102  1.096
##  log_gdp_percapita                              80  0.00989   0.011759  0.841
##  public_sector_corruption                       20 -0.01422   0.002347 -6.061
##  public_sector_corruption                       50 -0.00973   0.001651 -5.898
##  public_sector_corruption                       80 -0.00237   0.000819 -2.891
##  Pr(>|z|)    S    2.5 %    97.5 %
##   0.27154  1.9 -0.04648  0.165265
##   0.27316  1.9 -0.03206  0.113375
##   0.40037  1.3 -0.01316  0.032935
##   < 0.001 29.5 -0.01882 -0.009622
##   < 0.001 28.0 -0.01297 -0.006500
##   0.00384  8.0 -0.00397 -0.000763
## 
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, public_sector_corruption, predicted_lo, predicted_hi, predicted, log_gdp_percapita, disclose_donations 
## Type:  response

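# Using emmeans package. Note: called this way (without regrid = "response"), emtrends() reports the slope
# on the log-odds (link) scale, which is why the trend below is the same -0.0596 at every value.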
lrmodel %>%
  emtrends(~ public_sector_corruption, var = "public_sector_corruption", at = list(public_sector_corruption = c(20, 50, 80)), delta.var = 0.001) %>%
  test()

##  public_sector_corruption public_sector_corruption.trend     SE  df z.ratio
##                        20                        -0.0596 0.0119 Inf  -5.007
##                        50                        -0.0596 0.0119 Inf  -5.007
##                        80                        -0.0596 0.0119 Inf  -5.007
##  p.value
##   <.0001
##   <.0001
##   <.0001

# We can visualize how these marginal effects arise by plotting predicted probabilities of campaign finance
# disclosure laws across a range of public sector corruption. First let's generate predictions for that range.

logit_predictions <- lrmodel |> 
  emmeans(~ public_sector_corruption, var = "public_sector_corruption",
          at = list(public_sector_corruption = seq(0, 90, 1)),
          regrid = "response") |> 
  as_tibble()

# Now to plot predicted probabilities

ggplot(logit_predictions, aes(x = public_sector_corruption, y = prob)) +
  geom_line(linewidth = 1) +
  labs(x = "Public sector corruption", y = "Predicted probability of having\na campaign finance disclosure law", color = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom")
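
The slopes from slopes() change with the level of corruption while the emtrends() trend is a constant -0.0596, because the two are reported on different scales. As a worked derivation (the standard result for a logit model, not something stated in the assignment), write x for public_sector_corruption, z for log_gdp_percapita, and p for the probability of having a disclosure law; the symbols are introduced here only for illustration.

```latex
\begin{aligned}
\log\frac{p}{1-p} &= \beta_0 + \beta_1 x + \beta_2 z
  && \text{(link scale: the slope in } x \text{ is the constant } \beta_1\text{)} \\
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \beta_2 z)}} \\
\frac{\partial p}{\partial x} &= \beta_1 \, p \, (1 - p)
  && \text{(response scale: the slope depends on } p\text{)}
\end{aligned}
```

The constant -0.0596 is the log-odds coefficient, and exp(-0.0596) is about 0.94, the odds ratio in the Task 3 table. On the probability scale, p(1 - p) shrinks as p moves toward 0, which matches the slopes() estimates shrinking from -0.01422 at a corruption value of 20 to -0.00237 at 80 as the predicted probability falls.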

Task 5

We will explore the interaction effect between public_sector_corruption and region in the same model by using the marginaleffects package to create a dataset with representative values for regions. We will fit the logistic regression model with the interaction term and visualize the interaction effects with ggplot2. Finally, we’ll interpret the results and discuss implications.

# Before anything we fit the logistic regression model with interaction term
lrmodel2 <- glm(disclose_donations ~ log_gdp_percapita + public_sector_corruption * region,
                data = corruption, 
                family = binomial(link = "logit"))

# Now, defining representative values for the regions
regions_to_use <- c("Western Europe and North America", 
                    "Latin America and the Caribbean",
                    "Middle East and North Africa")

#  We now make a dataset with datagrid()
new_data <- datagrid(model = lrmodel2,
                     public_sector_corruption = seq(20, 80, by = 5),  
                     region = regions_to_use,
                     log_gdp_percapita = mean(corruption$log_gdp_percapita))

# Calculate predicted probabilities to later plot
new_data$predicted_prob <- predict(lrmodel2, newdata = new_data, type = "response")

# Visualizing our new and improved model with ggplot2
ggplot(new_data, aes(x = public_sector_corruption, y = predicted_prob, color = region)) +
  geom_line(linewidth = 1) +
  labs(x = "Public Sector Corruption Index", 
       y = "Predicted Probability of Campaign Finance Disclosure Laws", 
       title = "Interaction Effect\nof Public Sector Corruption and Region\non Campaign Finance Disclosure Laws") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

When plotting the predicted probabilities from the model with the interaction between public sector corruption and region, we notice a general trend and an interesting interaction between the 2 variables.

First and foremost, the general trend is that all regions are predicted to have a lower probability of campaign finance disclosure laws at higher public sector corruption indexes. We can say there is a negative relationship between the public sector corruption index and the predicted probability of having campaign finance disclosure laws for all three regions plotted.

This trend is more pronounced in some regions than in others. While the “Middle East and North Africa” and “Western Europe and North America” regions fall to a predicted probability of around 0.0 at a public sector corruption index of 80, the “Latin America and the Caribbean” region only falls to about 0.1 at that same value. Its flatter curve suggests the decrease in predicted probability of these laws is more gradual as public sector corruption increases, whereas in the other regions, higher corruption has a much stronger association with the absence of these laws.

The interaction between public sector corruption and region has interesting implications for how corruption relates to disclosure laws across different regions. The association between corruption and a lower probability of campaign finance disclosure laws is much more pronounced in some regions than in others. Our visualization might imply that, when countries in the “Latin America and the Caribbean” region lack campaign finance disclosure laws, factors other than public sector corruption might better explain that absence, whereas corruption does more to explain it in the other regions.

Thank you for reading my code

Best regards: Mateja Dokic