Problem Set 6

Task 1

Using the Civil dataset, perform a simple linear regression with public_sector_corruption as the dependent variable and polyarchy as the independent variable. Visualize the relationship with a scatter plot and overlay the regression line. Use the sjPlot package to create regression tables and interpret the results.

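library(ggplot2)          # plotting
library(sjPlot)           # tab_model()
library(marginaleffects)  # slopes(), datagrid()
library(emmeans)          # emmeans()

ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +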
  geom_point() +  
  geom_smooth(method = "lm", col = "blue") +  
  labs(title = "Relationship between Polyarchy and Public Sector Corruption",
       x = "Polyarchy",
       y = "Public Sector Corruption")

model <- lm(public_sector_corruption ~ polyarchy, data = corruption)

tab_model(model, show.ci = TRUE, show.se = TRUE, show.p = TRUE)

	public sector corruption
Predictors	Estimates	std. Error	CI	p
(Intercept)	89.44	3.95	-Inf – Inf	<0.001
polyarchy	-0.83	0.07	-Inf – Inf	<0.001
Observations	168
R² / R² adjusted	0.472 / 0.469

Interpretation

The scatter plot shows that public sector corruption tends to decrease as polyarchy increases, indicating a negative relationship between the two. The data points are spread around the regression line, showing some variation, but they follow a clear downward trend. In the regression table, the coefficient for polyarchy is -0.83 (p < 0.001): each one-unit increase in polyarchy is associated with a 0.83-point decrease in public sector corruption on average. The intercept of 89.44 is the predicted corruption score when polyarchy is zero, and it also differs significantly from zero (p < 0.001). With an R² of 0.472, about 47.2% of the variation in public sector corruption is explained by polyarchy alone, which suggests a reasonably good fit. The adjusted R² of 0.469, which penalizes for the number of predictors, is almost identical to the unadjusted value, as expected for a model with a single predictor.
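The CI column in the table above shows no finite bounds, so the short sketch below recovers an approximate 95% interval for the slope from the reported estimate and standard error, and also computes fitted corruption scores at a few polyarchy levels. It uses only the rounded values printed in the table, so the results are approximate; the variable names are illustrative and not part of the original analysis.

```r
# Coefficients and standard error as reported (rounded) in the table above
intercept <- 89.44
slope     <- -0.83
slope_se  <- 0.07
n_obs     <- 168

# Fitted corruption score at a few polyarchy levels: intercept + slope * x
polyarchy_levels <- c(30, 60, 90)
fitted_corruption <- intercept + slope * polyarchy_levels
names(fitted_corruption) <- polyarchy_levels
fitted_corruption

# Approximate 95% confidence interval for the slope: estimate +/- t * SE,
# with n - 2 residual degrees of freedom for a simple regression
t_crit <- qt(0.975, df = n_obs - 2)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se
round(slope_ci, 2)
```

On the fitted model itself, `confint(model)` would give the same kind of interval without rounding error.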

Task 2

Extend the model from Task 1 by adding a quadratic term for polyarchy to capture potential non-linear relationships. Visualize the polynomial relationship using ggplot2. Calculate the marginal effects of polyarchy at different levels (30, 60, 90) using both manual calculations and the marginaleffects package. Interpret the results.

model_quad <- lm(public_sector_corruption ~ polyarchy + I(polyarchy^2), data = corruption)

ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
  geom_point() +  # Scatter plot
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), col = "blue") +  # Polynomial regression line
  labs(title = "Polynomial Relationship between Polyarchy and Public Sector Corruption",
       x = "Polyarchy",
       y = "Public Sector Corruption")

coef_linear <- coef(model_quad)["polyarchy"]
coef_quadratic <- coef(model_quad)["I(polyarchy^2)"]

marginal_effect_30 <- coef_linear + 2 * coef_quadratic * 30
marginal_effect_60 <- coef_linear + 2 * coef_quadratic * 60
marginal_effect_90 <- coef_linear + 2 * coef_quadratic * 90

marginal_effect_30

##   polyarchy 
## -0.06392759

marginal_effect_60

## polyarchy 
## -1.102508

marginal_effect_90

## polyarchy 
## -2.141088
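The manual values above come from differentiating the fitted quadratic with respect to polyarchy. Writing x for polyarchy and using β-notation for the coefficients (the symbols are chosen here for illustration; the code calls them coef_linear and coef_quadratic), the slope at any level x is:

```latex
\[
\widehat{y} = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2
\quad\Longrightarrow\quad
\frac{\partial \widehat{y}}{\partial x} = \hat\beta_1 + 2\,\hat\beta_2\, x
\]
```

Evaluating this expression at x = 30, 60, and 90 gives the three marginal effects printed above.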

marginal_effects <- slopes(model_quad, newdata = data.frame(polyarchy = c(30, 60, 90)))

summary(marginal_effects)

##      rowid         term              estimate          std.error      
##  Min.   :1.0   Length:3           Min.   :-2.14109   Min.   :0.07682  
##  1st Qu.:1.5   Class :character   1st Qu.:-1.62180   1st Qu.:0.10880  
##  Median :2.0   Mode  :character   Median :-1.10251   Median :0.14078  
##  Mean   :2.0                      Mean   :-1.10251   Mean   :0.14815  
##  3rd Qu.:2.5                      3rd Qu.:-0.58322   3rd Qu.:0.18381  
##  Max.   :3.0                      Max.   :-0.06393   Max.   :0.22683  
##    statistic           p.value          s.value           conf.low      
##  Min.   :-14.3510   Min.   :0.0000   Min.   :  0.622   Min.   :-2.5857  
##  1st Qu.:-11.8950   1st Qu.:0.0000   1st Qu.: 34.236   1st Qu.:-1.9194  
##  Median : -9.4391   Median :0.0000   Median : 67.849   Median :-1.2531  
##  Mean   : -8.0814   Mean   :0.2166   Mean   : 73.736   Mean   :-1.3929  
##  3rd Qu.: -4.9466   3rd Qu.:0.3249   3rd Qu.:110.293   3rd Qu.:-0.7965  
##  Max.   : -0.4541   Max.   :0.6498   Max.   :152.738   Max.   :-0.3398  
##    conf.high        predicted_lo      predicted_hi       predicted      
##  Min.   :-1.6965   Min.   : 0.6361   Min.   : 0.6169   Min.   : 0.6265  
##  1st Qu.:-1.3242   1st Qu.:24.9607   1st Qu.:24.9462   1st Qu.:24.9535  
##  Median :-0.9519   Median :49.2854   Median :49.2755   Median :49.2805  
##  Mean   :-0.8121   Mean   :38.8996   Mean   :38.8897   Mean   :38.8946  
##  3rd Qu.:-0.3700   3rd Qu.:58.0313   3rd Qu.:58.0261   3rd Qu.:58.0287  
##  Max.   : 0.2120   Max.   :66.7773   Max.   :66.7767   Max.   :66.7770  
##    polyarchy  public_sector_corruption
##  Min.   :30   Min.   :48.8            
##  1st Qu.:45   1st Qu.:48.8            
##  Median :60   Median :48.8            
##  Mean   :60   Mean   :48.8            
##  3rd Qu.:75   3rd Qu.:48.8            
##  Max.   :90   Max.   :48.8

Interpretation

The quadratic regression reveals a non-linear link between polyarchy and public sector corruption. It suggests that corruption rises slightly with polyarchy at low levels, up to a turning point, after which further increases in polyarchy are associated with lower corruption. The marginal effects show that this negative association strengthens at higher levels of polyarchy: the slope is about -0.06 at polyarchy = 30 (not statistically distinguishable from zero, p ≈ 0.65), -1.10 at 60, and -2.14 at 90. This implies that gains in polyarchy are associated with larger reductions in public sector corruption the higher the level of polyarchy already is.
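The turning point mentioned in the interpretation can be located from the printed marginal effects alone. The sketch below back-solves the two slope coefficients from the effects at 30 and 60 (the quadratic's slope is linear in polyarchy, so two points determine it), checks the result against the printed effect at 90, and then finds where the slope is zero. The variable names are illustrative, and the inputs are the rounded values printed above.

```r
# Marginal effects printed above (slope of the fitted quadratic at x = 30 and x = 60)
me_30 <- -0.06392759
me_60 <- -1.102508

# slope(x) = b1 + 2 * b2 * x, so two slopes pin down b1 and b2
b2 <- (me_60 - me_30) / (2 * (60 - 30))
b1 <- me_30 - 2 * b2 * 30

# Consistency check against the printed marginal effect at x = 90
me_90 <- b1 + 2 * b2 * 90
me_90

# The fitted curve peaks where the slope is zero: x* = -b1 / (2 * b2)
turning_point <- -b1 / (2 * b2)
turning_point
```

Under these inputs the turning point falls just below polyarchy = 30, which is consistent with the near-zero slope reported at 30.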

Task 3

Using the Civil dataset, fit a logistic regression model predicting the presence of campaign finance disclosure laws (disclose_donations) with public_sector_corruption and log_gdp_percapita as predictors. Use the sjPlot package to create regression tables and interpret the results.

model_logistic <- glm(disclose_donations ~ public_sector_corruption + log_gdp_percapita, 
                      data = corruption, 
                      family = binomial)

tab_model(model_logistic, show.ci = TRUE, show.se = TRUE, show.p = TRUE)

	disclose donations
Predictors	Odds Ratios	std. Error	CI	p
(Intercept)	0.60	1.32	0.00 – Inf	0.818
public sector corruption	0.94	0.01	0.00 – Inf	<0.001
GDP per capita(constant 2015 US$)	1.28	0.28	0.00 – Inf	0.253
Observations	168
R² Tjur	0.454

Interpretation

The intercept's odds ratio of 0.60 is not statistically significant (p = 0.818); it gives the odds of having disclosure laws when both predictors equal zero and has little substantive meaning here. The odds ratio for public sector corruption is 0.94 and statistically significant (p < 0.001): each one-unit increase in public sector corruption is associated with a 6% decrease in the odds of having campaign finance disclosure laws, holding log GDP per capita constant. The odds ratio for log GDP per capita is 1.28 but is not statistically significant (p = 0.253), so there is no clear evidence in this model that it predicts the presence of disclosure laws. Tjur's R² of 0.454 is the difference between the average predicted probability for countries with disclosure laws and for those without; it indicates that the model discriminates moderately well between the two groups.
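A small worked example may help separate odds from probabilities when reading the odds ratio of 0.94. The sketch below uses the rounded value from the table; the baseline probabilities are hypothetical and chosen only to show that the same odds ratio implies different changes in probability depending on the starting point.

```r
# Odds ratio for public_sector_corruption as reported (rounded) in the table above
or_corruption <- 0.94

# Percent change in the odds for a one-unit increase in corruption
pct_change_1 <- (or_corruption - 1) * 100
pct_change_1

# Odds ratios compound multiplicatively, so a 10-unit increase scales the odds by OR^10
or_10 <- or_corruption^10
pct_change_10 <- (or_10 - 1) * 100
pct_change_10

# The same odds ratio implies different probability changes at different baselines
baseline_prob <- c(0.2, 0.5, 0.8)              # hypothetical baseline probabilities
baseline_odds <- baseline_prob / (1 - baseline_prob)
new_odds <- baseline_odds * or_corruption
new_prob <- new_odds / (1 + new_odds)
round(new_prob - baseline_prob, 4)
```

This is why the coefficient is best described as a change in odds, while changes in probability are reported separately at specific values of the predictors.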

Task 4

Calculate the marginal effects of public_sector_corruption from the logistic regression model in Task 3 at representative values (20, 50, 80). Use the marginaleffects and emmeans packages to compute these effects. Visualize the predicted probabilities of having campaign finance disclosure laws across a range of public_sector_corruption values using ggplot2.

newdata_for_slopes <- data.frame(public_sector_corruption = c(20, 50, 80),
                                 log_gdp_percapita = mean(corruption$log_gdp_percapita))

marginal_effects_slopes <- slopes(model_logistic, newdata = newdata_for_slopes)
summary(marginal_effects_slopes)

##      rowid          term              estimate           std.error        
##  Min.   :1.00   Length:6           Min.   :-0.014221   Min.   :0.0008194  
##  1st Qu.:1.25   Class :character   1st Qu.:-0.007893   1st Qu.:0.0018283  
##  Median :2.00   Mode  :character   Median : 0.003760   Median :0.0070528  
##  Mean   :2.00                      Mean   : 0.013936   Mean   :0.0179471  
##  3rd Qu.:2.75                      3rd Qu.: 0.032965   3rd Qu.:0.0308017  
##  Max.   :3.00                      Max.   : 0.059394   Max.   :0.0539530  
##    statistic         p.value             s.value          conf.low        
##  Min.   :-6.059   Min.   :0.0000000   Min.   : 1.321   Min.   :-0.046352  
##  1st Qu.:-5.133   1st Qu.:0.0009644   1st Qu.: 1.873   1st Qu.:-0.028822  
##  Median :-1.024   Median :0.1374125   Median : 4.951   Median :-0.015989  
##  Mean   :-1.966   Mean   :0.1581606   Mean   :11.734   Mean   :-0.021240  
##  3rd Qu.: 1.031   3rd Qu.:0.2730758   3rd Qu.:22.905   3rd Qu.:-0.013024  
##  Max.   : 1.101   Max.   :0.4003598   Max.   :29.443   Max.   :-0.003974  
##    conf.high          predicted_lo      predicted_hi       predicted      
##  Min.   :-0.009621   Min.   :0.04141   Min.   :0.04141   Min.   :0.04142  
##  1st Qu.:-0.005058   1st Qu.:0.08243   1st Qu.:0.08241   1st Qu.:0.08242  
##  Median : 0.016086   Median :0.20546   Median :0.20542   Median :0.20544  
##  Mean   : 0.049112   Mean   :0.28477   Mean   :0.28474   Mean   :0.28476  
##  3rd Qu.: 0.093335   3rd Qu.:0.50692   3rd Qu.:0.50687   3rd Qu.:0.50692  
##  Max.   : 0.165139   Max.   :0.60749   Max.   :0.60743   Max.   :0.60742  
##  public_sector_corruption log_gdp_percapita disclose_donations
##  Min.   :20.0             Min.   :8.567     Mode:logical      
##  1st Qu.:27.5             1st Qu.:8.567     TRUE:6            
##  Median :50.0             Median :8.567                       
##  Mean   :50.0             Mean   :8.567                       
##  3rd Qu.:72.5             3rd Qu.:8.567                       
##  Max.   :80.0             Max.   :8.567
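The slope estimates summarized above are on the probability scale, which is why they vary across the three corruption values even though the model is linear in the log-odds. For a logistic model without interaction terms, the derivative of the predicted probability with respect to a predictor has a simple closed form. In the notation assumed here, x is public_sector_corruption, z is log_gdp_percapita, and p(x) is the predicted probability:

```latex
\[
p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \beta_2 z)}}
\quad\Longrightarrow\quad
\frac{\partial p}{\partial x} = \beta_1 \, p(x)\,\bigl(1 - p(x)\bigr)
\]
```

The effect is largest in magnitude where p(x) is near 0.5 and shrinks toward zero as the predicted probability approaches 0 or 1.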

em_marginal_effects <- emmeans(model_logistic, ~ public_sector_corruption, at = list(public_sector_corruption = c(20, 50, 80)))

summary(em_marginal_effects)

##  public_sector_corruption emmean    SE  df asymp.LCL asymp.UCL
##                        20  0.436 0.300 Inf    -0.152     1.024
##                        50 -1.353 0.279 Inf    -1.899    -0.806
##                        80 -3.142 0.567 Inf    -4.252    -2.031
## 
## Results are given on the logit (not the response) scale. 
## Confidence level used: 0.95
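As the note in the output says, the emmeans estimates are log-odds, not probabilities or marginal effects. The sketch below converts the three printed values to probabilities with the inverse-logit function and then approximates the probability-scale slope at each point as beta * p * (1 - p). The inputs are the rounded numbers printed above, and the variable names are illustrative.

```r
# Estimated log-odds at corruption = 20, 50, 80, copied from the emmeans output above
logit_hat <- setNames(c(0.436, -1.353, -3.142), c("20", "50", "80"))

# Inverse-logit transform: probability = 1 / (1 + exp(-logit))
prob_hat <- plogis(logit_hat)
round(prob_hat, 3)

# Log-odds slope per unit of corruption, recovered from two grid points
# (the model is linear on the logit scale, so any pair gives the same value)
beta_corruption <- unname((logit_hat["80"] - logit_hat["20"]) / (80 - 20))

# Slope on the probability scale at each grid point: beta * p * (1 - p)
slope_prob <- beta_corruption * prob_hat * (1 - prob_hat)
round(slope_prob, 4)
```

The resulting probabilities line up with the `predicted` column in the slopes() summary (about 0.61, 0.21, and 0.04). Passing `type = "response"` to emmeans() is another way to obtain estimates on the probability scale directly.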

public_sector_corruption_seq <- data.frame(public_sector_corruption = seq(0, 100, by = 1),
                                           log_gdp_percapita = mean(corruption$log_gdp_percapita))

predicted_probs <- predict(model_logistic, newdata = public_sector_corruption_seq, type = "response")

prediction_data <- cbind(public_sector_corruption_seq, predicted_probs)

library(ggplot2)
ggplot(prediction_data, aes(x = public_sector_corruption, y = predicted_probs)) +
  geom_line(color = "green") +
  labs(title = "Predicted Probabilities of Campaign Finance Disclosure Laws",
       x = "Public Sector Corruption",
       y = "Predicted Probability") +
  theme_minimal()
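The curve plotted above shows point predictions only. One common way to add uncertainty is to predict on the logit scale with standard errors and transform the bounds back to probabilities, which keeps the band inside [0, 1]. The sketch below illustrates this on simulated stand-in data, since the assignment's dataset is not reproduced here; the data, the coefficients used to simulate it, and all object names are hypothetical.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not the assignment's dataset)
n <- 200
sim <- data.frame(x = runif(n, 0, 100))
sim$y <- rbinom(n, 1, plogis(1.5 - 0.05 * sim$x))

fit <- glm(y ~ x, data = sim, family = binomial)

# Prediction grid over the range of the predictor
grid <- data.frame(x = seq(0, 100, by = 1))

# Predict on the link (logit) scale with standard errors, then back-transform
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
grid$prob  <- plogis(pred$fit)
grid$lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$upper <- plogis(pred$fit + 1.96 * pred$se.fit)
head(grid)

# With ggplot2, the band could be drawn by adding
# geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) to a geom_line() plot
```

The same steps would apply to model_logistic and public_sector_corruption_seq from the code above.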

Task 5

Explore the interaction effect between public_sector_corruption and region in the logistic regression model from Task 3. Use the datagrid() function from the marginaleffects package to create a dataset with representative values for regions. Fit the logistic regression model with the interaction term and visualize the interaction effects using ggplot2. Interpret the results and discuss the implications of the interaction effect.

representative_data <- datagrid(model = model_logistic, 
                                region = unique(corruption$region),
                                public_sector_corruption = seq(0, 100, by = 10))

head(representative_data)

##   log_gdp_percapita                          region public_sector_corruption
## 1          8.567353 Latin America and the Caribbean                        0
## 2          8.567353 Latin America and the Caribbean                       10
## 3          8.567353 Latin America and the Caribbean                       20
## 4          8.567353 Latin America and the Caribbean                       30
## 5          8.567353 Latin America and the Caribbean                       40
## 6          8.567353 Latin America and the Caribbean                       50
##   rowid
## 1     1
## 2     2
## 3     3
## 4     4
## 5     5
## 6     6

model_interaction <- glm(disclose_donations ~ public_sector_corruption * region + log_gdp_percapita,
                         data = corruption, 
                         family = binomial)

tab_model(model_interaction, show.ci = TRUE, show.se = TRUE, show.p = TRUE)

	disclose donations
Predictors	Odds Ratios	std. Error	CI	p
(Intercept)	24.94	88.01	0.00 – Inf	0.362
public sector corruption	0.94	0.02	0.00 – Inf	0.006
region: Latin America and the Caribbean	0.08	0.13	0.00 – Inf	0.106
region: Middle East and North Africa	0.52	1.23	0.00 – Inf	0.782
region: Sub-Saharan Africa	0.20	0.36	0.00 – Inf	0.368
region: Western Europe and North America	0.35	0.55	0.00 – Inf	0.504
region: Asia and Pacific	0.40	0.74	0.00 – Inf	0.619
GDP per capita(constant 2015 US$)	1.02	0.35	0.00 – Inf	0.964
public_sector_corruption:regionLatin America and the Caribbean	1.03	0.03	0.00 – Inf	0.412
public_sector_corruption:regionMiddle East and North Africa	0.95	0.06	0.00 – Inf	0.419
public_sector_corruption:regionSub-Saharan Africa	0.98	0.04	0.00 – Inf	0.567
public_sector_corruption:regionWestern Europe and North America	0.96	0.08	0.00 – Inf	0.627
public_sector_corruption:regionAsia and Pacific	0.97	0.05	0.00 – Inf	0.498
Observations	168
R² Tjur	0.521

predicted_probs_interaction <- predict(model_interaction, newdata = representative_data, type = "response")

interaction_data <- cbind(representative_data, predicted_probs_interaction)

ggplot(interaction_data, aes(x = public_sector_corruption, y = predicted_probs_interaction, color = region)) +
  geom_line() +
  labs(title = "Interaction Effect of Public Sector Corruption and Region on Campaign Finance Disclosure Laws",
       x = "Public Sector Corruption",
       y = "Predicted Probability",
       color = "Region") +
  theme_minimal()

Interpretation

The intercept's odds ratio is not statistically significant (p = 0.362) and has little substantive meaning, since it refers to a case in which all predictors equal zero. The odds ratio for public sector corruption is 0.94 and statistically significant (p = 0.006), indicating that higher corruption is linked to lower odds of having campaign finance disclosure laws. Because the model includes an interaction, this coefficient describes the effect in the reference region, which is the region omitted from the table (Eastern Europe and Central Asia). Tjur's R² of 0.521 means that the average predicted probability is about 0.52 higher for countries with disclosure laws than for those without, indicating that the model discriminates fairly well between the two groups.
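Tjur's R² is simple to compute by hand, which can make its meaning clearer. The sketch below fits a logistic regression to simulated stand-in data (hypothetical, not the assignment's dataset) and calculates the statistic as the difference in mean predicted probability between the two outcome groups.

```r
set.seed(42)

# Simulated stand-in data (hypothetical; not the assignment's dataset)
n <- 300
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * sim$x))

fit <- glm(y ~ x, data = sim, family = binomial)

# Predicted probabilities for the observed cases
p_hat <- fitted(fit)

# Tjur's R2: mean predicted probability among the 1s minus that among the 0s
tjur_r2 <- mean(p_hat[sim$y == 1]) - mean(p_hat[sim$y == 0])
tjur_r2
```

A value near 0 means the model assigns similar probabilities to both groups, while a value near 1 means it separates them almost perfectly.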

The interaction plot suggests that the relationship between public sector corruption and the presence of campaign finance disclosure laws varies by region, although none of the interaction terms is statistically significant (all p > 0.4), so these differences should be read with caution:

- Eastern Europe and Central Asia: a relatively high predicted probability of having disclosure laws at low corruption levels, which falls sharply as corruption rises.
- Latin America and the Caribbean: a moderate probability of having disclosure laws, which decreases gradually with increasing corruption.
- Middle East and North Africa, Sub-Saharan Africa, Western Europe and North America, and Asia and Pacific: similar trends, with the probability of having disclosure laws decreasing as corruption increases.
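In a model with an interaction, the corruption odds ratio for a given region is the reference-region odds ratio multiplied by that region's interaction odds ratio. The sketch below does this arithmetic with the rounded values from the table above. It assumes that Eastern Europe and Central Asia is the reference category, which is inferred from its absence from the table, and the object names are illustrative.

```r
# Odds ratios as reported (rounded) in the interaction table above
or_reference <- 0.94   # corruption effect in the (assumed) reference region

or_interaction <- c(
  "Latin America and the Caribbean"  = 1.03,
  "Middle East and North Africa"     = 0.95,
  "Sub-Saharan Africa"               = 0.98,
  "Western Europe and North America" = 0.96,
  "Asia and Pacific"                 = 0.97
)

# Region-specific odds ratio for a one-unit increase in corruption
or_by_region <- c(
  "Eastern Europe and Central Asia (reference)" = or_reference,
  or_reference * or_interaction
)
round(or_by_region, 3)
```

Every region-specific odds ratio is below 1, so the estimated direction of the corruption effect is the same everywhere; only its steepness differs, and those differences are not statistically distinguishable in this model.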