Using the Civil dataset, perform a simple linear regression with public_sector_corruption as the dependent variable and polyarchy as the independent variable. Visualize the relationship with a scatter plot and overlay the regression line. Use the sjPlot package to create regression tables and interpret the results.
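library(ggplot2)          # plotting
library(sjPlot)           # tab_model()
library(marginaleffects)  # slopes(), datagrid()
library(emmeans)          # emmeans()
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +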
geom_point() +
geom_smooth(method = "lm", col = "blue") +
labs(title = "Relationship between Polyarchy and Public Sector Corruption",
x = "Polyarchy",
y = "Public Sector Corruption")
model <- lm(public_sector_corruption ~ polyarchy, data = corruption)
tab_model(model, show.ci = TRUE, show.se = TRUE, show.p = TRUE)
| public sector corruption | | | | |
|---|---|---|---|---|
| Predictors | Estimates | std. Error | CI | p |
| (Intercept) | 89.44 | 3.95 | -Inf – Inf | <0.001 |
| polyarchy | -0.83 | 0.07 | -Inf – Inf | <0.001 |
| Observations | 168 | | | |
| R² / R² adjusted | 0.472 / 0.469 | | | |
The scatter plot shows that public sector corruption tends to decrease as polyarchy increases, indicating a negative relationship between the two. The data points are spread around the regression line, showing some variation around a clear downward trend. In the regression table, the polyarchy coefficient is -0.83 (SE = 0.07, p < 0.001): each one-unit increase in polyarchy is associated with a 0.83-point lower predicted corruption score. The intercept (89.44, p < 0.001) is the predicted corruption score when polyarchy is zero. With an R² of 0.472, about 47.2% of the variation in public sector corruption is explained by polyarchy alone. The adjusted R², which penalizes for the number of predictors, is 0.469; it is nearly identical to R² because the model has only one predictor. Together these values indicate a moderately good fit.
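The fit statistics can be cross-checked by hand from the regression table. The sketch below is a minimal illustration in base R that uses only the rounded values printed in the table, so the results are approximate; the variable names are illustrative and not part of the original analysis. It recomputes the adjusted R² and, because the table prints the confidence intervals as -Inf – Inf, an approximate 95% interval for the polyarchy slope.

```r
# Values copied from the regression table (rounded, so results are approximate)
n        <- 168     # observations
k        <- 1       # number of predictors (polyarchy)
r2       <- 0.472   # R-squared
estimate <- -0.83   # polyarchy coefficient
se       <- 0.07    # its standard error

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Approximate 95% confidence interval from the t distribution
df_resid <- n - k - 1
t_crit   <- qt(0.975, df = df_resid)
ci       <- estimate + c(-1, 1) * t_crit * se

round(adj_r2, 3)   # close to the 0.469 reported in the table
round(ci, 2)
```

With one predictor and 168 observations the penalty is tiny, which is why the two R² values nearly coincide. The interval for the slope lies entirely below zero, in line with the reported p-value.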
Extend the model from Task 1 by adding a quadratic term for polyarchy to capture potential non-linear relationships. Visualize the polynomial relationship using ggplot2. Calculate the marginal effects of polyarchy at different levels (30, 60, 90) using both manual calculations and the marginaleffects package. Interpret the results.
model_quad <- lm(public_sector_corruption ~ polyarchy + I(polyarchy^2), data = corruption)
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
geom_point() + # Scatter plot
stat_smooth(method = "lm", formula = y ~ poly(x, 2), col = "blue") + # Polynomial regression line
labs(title = "Polynomial Relationship between Polyarchy and Public Sector Corruption",
x = "Polyarchy",
y = "Public Sector Corruption")
coef_linear <- coef(model_quad)["polyarchy"]
coef_quadratic <- coef(model_quad)["I(polyarchy^2)"]
marginal_effect_30 <- coef_linear + 2 * coef_quadratic * 30
marginal_effect_60 <- coef_linear + 2 * coef_quadratic * 60
marginal_effect_90 <- coef_linear + 2 * coef_quadratic * 90
marginal_effect_30
## polyarchy
## -0.06392759
marginal_effect_60
## polyarchy
## -1.102508
marginal_effect_90
## polyarchy
## -2.141088
marginal_effects <- slopes(model_quad, newdata = data.frame(polyarchy = c(30, 60, 90)))
summary(marginal_effects)
## rowid term estimate std.error
## Min. :1.0 Length:3 Min. :-2.14109 Min. :0.07682
## 1st Qu.:1.5 Class :character 1st Qu.:-1.62180 1st Qu.:0.10880
## Median :2.0 Mode :character Median :-1.10251 Median :0.14078
## Mean :2.0 Mean :-1.10251 Mean :0.14815
## 3rd Qu.:2.5 3rd Qu.:-0.58322 3rd Qu.:0.18381
## Max. :3.0 Max. :-0.06393 Max. :0.22683
## statistic p.value s.value conf.low
## Min. :-14.3510 Min. :0.0000 Min. : 0.622 Min. :-2.5857
## 1st Qu.:-11.8950 1st Qu.:0.0000 1st Qu.: 34.236 1st Qu.:-1.9194
## Median : -9.4391 Median :0.0000 Median : 67.849 Median :-1.2531
## Mean : -8.0814 Mean :0.2166 Mean : 73.736 Mean :-1.3929
## 3rd Qu.: -4.9466 3rd Qu.:0.3249 3rd Qu.:110.293 3rd Qu.:-0.7965
## Max. : -0.4541 Max. :0.6498 Max. :152.738 Max. :-0.3398
## conf.high predicted_lo predicted_hi predicted
## Min. :-1.6965 Min. : 0.6361 Min. : 0.6169 Min. : 0.6265
## 1st Qu.:-1.3242 1st Qu.:24.9607 1st Qu.:24.9462 1st Qu.:24.9535
## Median :-0.9519 Median :49.2854 Median :49.2755 Median :49.2805
## Mean :-0.8121 Mean :38.8996 Mean :38.8897 Mean :38.8946
## 3rd Qu.:-0.3700 3rd Qu.:58.0313 3rd Qu.:58.0261 3rd Qu.:58.0287
## Max. : 0.2120 Max. :66.7773 Max. :66.7767 Max. :66.7770
## polyarchy public_sector_corruption
## Min. :30 Min. :48.8
## 1st Qu.:45 1st Qu.:48.8
## Median :60 Median :48.8
## Mean :60 Mean :48.8
## 3rd Qu.:75 3rd Qu.:48.8
## Max. :90 Max. :48.8
The quadratic regression reveals a non-linear relationship between polyarchy and public sector corruption. The fitted curve rises slightly at low levels of polyarchy, reaches a turning point, and then falls, so that further increases in polyarchy are associated with lower corruption. The marginal effects make this concrete, and the manual calculations agree with the estimates from slopes(): at polyarchy = 30 the slope is close to zero (-0.06), at 60 it is -1.10, and at 90 it is -2.14. The negative association between polyarchy and corruption is therefore much stronger at high levels of polyarchy than at low levels. This implies that efforts to strengthen polyarchy are associated with larger reductions in public sector corruption where polyarchy is already high.
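For a quadratic model `y = b0 + b1 * x + b2 * x^2`, the marginal effect of `x` is the derivative `b1 + 2 * b2 * x`, which is what the manual calculation computes. As a rough check, the sketch below recovers the two coefficients from the reported marginal effects at 30 and 60 and uses them to locate the turning point `-b1 / (2 * b2)`. The variable and function names are illustrative, and the inputs are the printed values, so the results are approximate.

```r
# Marginal effects printed above for polyarchy = 30 and 60
me_30 <- -0.06392759
me_60 <- -1.102508

# slope(x) = b1 + 2 * b2 * x, so two slopes identify both coefficients
b2 <- (me_60 - me_30) / (2 * (60 - 30))
b1 <- me_30 - 2 * b2 * 30

# Helper (hypothetical name): slope of the fitted curve at a given polyarchy level
marginal_effect <- function(x) b1 + 2 * b2 * x

# Turning point: the polyarchy level where the slope is zero
turning_point <- -b1 / (2 * b2)

marginal_effect(90)   # should reproduce the reported value at 90
turning_point
```

Under these inputs the slope changes sign at a polyarchy score of roughly 28, which is consistent with the slight initial rise and the near-zero slope at 30.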
Using the Civil dataset, fit a logistic regression model predicting the presence of campaign finance disclosure laws (disclose_donations) with public_sector_corruption and log_gdp_percapita as predictors. Use the sjPlot package to create regression tables and interpret the results.
model_logistic <- glm(disclose_donations ~ public_sector_corruption + log_gdp_percapita,
data = corruption,
family = binomial)
tab_model(model_logistic, show.ci = TRUE, show.se = TRUE, show.p = TRUE)
| disclose donations | | | | |
|---|---|---|---|---|
| Predictors | Odds Ratios | std. Error | CI | p |
| (Intercept) | 0.60 | 1.32 | 0.00 – Inf | 0.818 |
| public sector corruption | 0.94 | 0.01 | 0.00 – Inf | <0.001 |
| GDP per capita (constant 2015 US$) | 1.28 | 0.28 | 0.00 – Inf | 0.253 |
| Observations | 168 | | | |
| R² Tjur | 0.454 | | | |
The odds ratio for the intercept (0.60) is not statistically significant (p = 0.818); it represents the odds of having disclosure laws when both predictors are zero and is of little substantive interest. The odds ratio for public sector corruption is 0.94 and is statistically significant (p < 0.001). This implies that each one-unit increase in public sector corruption is associated with a 6% decrease in the odds of having campaign finance disclosure laws, holding GDP per capita constant, indicating a negative relationship between corruption and the presence of such laws. On the other hand, the odds ratio for log GDP per capita is 1.28 but is not statistically significant (p = 0.253), so GDP per capita does not significantly predict the presence of campaign finance disclosure laws in this model. Tjur's R² of 0.454, the difference between the mean predicted probabilities for countries with and without disclosure laws, suggests a moderate fit.
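Odds ratios act multiplicatively on the odds, which matters when reading the effect of changes larger than one unit. The following base R sketch uses only the rounded odds ratio from the table (variable names are illustrative) to show the percent change in the odds for a one-unit and a ten-unit increase in corruption, and the logit coefficient the odds ratio corresponds to.

```r
# Odds ratio for public sector corruption, copied from the table (rounded)
or_corruption <- 0.94

# Percent change in the odds for a one-unit increase
pct_change_1 <- (or_corruption - 1) * 100

# Odds ratios compound: a 10-unit increase multiplies the odds by OR^10
or_10         <- or_corruption^10
pct_change_10 <- (or_10 - 1) * 100

# The underlying logit coefficient is the log of the odds ratio
beta <- log(or_corruption)

round(c(pct_change_1, pct_change_10), 1)
round(beta, 3)
```

A 10-point rise in corruption therefore corresponds to roughly 46% lower odds rather than 60%, because the effect multiplies rather than adds. These are changes in odds, not in probabilities.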
Calculate the marginal effects of public_sector_corruption from the logistic regression model in Task 3 at representative values (20, 50, 80). Use the marginaleffects and emmeans packages to compute these effects. Visualize the predicted probabilities of having campaign finance disclosure laws across a range of public_sector_corruption values using ggplot2.
newdata_for_slopes <- data.frame(public_sector_corruption = c(20, 50, 80),
log_gdp_percapita = mean(corruption$log_gdp_percapita))
marginal_effects_slopes <- slopes(model_logistic, newdata = newdata_for_slopes)
summary(marginal_effects_slopes)
## rowid term estimate std.error
## Min. :1.00 Length:6 Min. :-0.014221 Min. :0.0008194
## 1st Qu.:1.25 Class :character 1st Qu.:-0.007893 1st Qu.:0.0018283
## Median :2.00 Mode :character Median : 0.003760 Median :0.0070528
## Mean :2.00 Mean : 0.013936 Mean :0.0179471
## 3rd Qu.:2.75 3rd Qu.: 0.032965 3rd Qu.:0.0308017
## Max. :3.00 Max. : 0.059394 Max. :0.0539530
## statistic p.value s.value conf.low
## Min. :-6.059 Min. :0.0000000 Min. : 1.321 Min. :-0.046352
## 1st Qu.:-5.133 1st Qu.:0.0009644 1st Qu.: 1.873 1st Qu.:-0.028822
## Median :-1.024 Median :0.1374125 Median : 4.951 Median :-0.015989
## Mean :-1.966 Mean :0.1581606 Mean :11.734 Mean :-0.021240
## 3rd Qu.: 1.031 3rd Qu.:0.2730758 3rd Qu.:22.905 3rd Qu.:-0.013024
## Max. : 1.101 Max. :0.4003598 Max. :29.443 Max. :-0.003974
## conf.high predicted_lo predicted_hi predicted
## Min. :-0.009621 Min. :0.04141 Min. :0.04141 Min. :0.04142
## 1st Qu.:-0.005058 1st Qu.:0.08243 1st Qu.:0.08241 1st Qu.:0.08242
## Median : 0.016086 Median :0.20546 Median :0.20542 Median :0.20544
## Mean : 0.049112 Mean :0.28477 Mean :0.28474 Mean :0.28476
## 3rd Qu.: 0.093335 3rd Qu.:0.50692 3rd Qu.:0.50687 3rd Qu.:0.50692
## Max. : 0.165139 Max. :0.60749 Max. :0.60743 Max. :0.60742
## public_sector_corruption log_gdp_percapita disclose_donations
## Min. :20.0 Min. :8.567 Mode:logical
## 1st Qu.:27.5 1st Qu.:8.567 TRUE:6
## Median :50.0 Median :8.567
## Mean :50.0 Mean :8.567
## 3rd Qu.:72.5 3rd Qu.:8.567
## Max. :80.0 Max. :8.567
em_marginal_effects <- emmeans(model_logistic, ~ public_sector_corruption, at = list(public_sector_corruption = c(20, 50, 80)))
summary(em_marginal_effects)
## public_sector_corruption emmean SE df asymp.LCL asymp.UCL
## 20 0.436 0.300 Inf -0.152 1.024
## 50 -1.353 0.279 Inf -1.899 -0.806
## 80 -3.142 0.567 Inf -4.252 -2.031
##
## Results are given on the logit (not the response) scale.
## Confidence level used: 0.95
public_sector_corruption_seq <- data.frame(public_sector_corruption = seq(0, 100, by = 1),
log_gdp_percapita = mean(corruption$log_gdp_percapita))
predicted_probs <- predict(model_logistic, newdata = public_sector_corruption_seq, type = "response")
prediction_data <- cbind(public_sector_corruption_seq, predicted_probs)
library(ggplot2)
ggplot(prediction_data, aes(x = public_sector_corruption, y = predicted_probs)) +
geom_line(color = "green") +
labs(title = "Predicted Probabilities of Campaign Finance Disclosure Laws",
x = "Public Sector Corruption",
y = "Predicted Probability") +
theme_minimal()
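The emmeans output above is reported on the logit scale, while the slopes() estimates and the plot are on the probability scale. The sketch below is a minimal base R illustration of how the two connect, using only the rounded values printed in the emmeans table; the variable names are illustrative. It converts the logits to probabilities with the inverse logit and approximates the slope on the probability scale with the identity `dP/dx = beta * p * (1 - p)`.

```r
# Estimated marginal means from the emmeans output above (logit scale)
corruption_levels <- c(20, 50, 80)
logit_means       <- c(0.436, -1.353, -3.142)

# Inverse logit: probability = exp(eta) / (1 + exp(eta))
probs <- plogis(logit_means)

# The logit is linear in corruption, so the coefficient is the change per unit
beta <- (logit_means[2] - logit_means[1]) / (50 - 20)

# Slope on the probability scale: dP/dx = beta * p * (1 - p)
slopes_prob <- beta * probs * (1 - probs)

data.frame(corruption_levels,
           prob  = round(probs, 3),
           slope = round(slopes_prob, 4))
```

The resulting probabilities fall in the same range as the `predicted` column of the slopes() summary (about 0.04 to 0.61). The slope is steepest at a corruption score of 20, where the predicted probability is closest to 0.5, and flattens as the probability approaches zero.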
Explore the interaction effect between public_sector_corruption and region in the logistic regression model from Task 3. Use the datagrid() function from the marginaleffects package to create a dataset with representative values for regions. Fit the logistic regression model with the interaction term and visualize the interaction effects using ggplot2. Interpret the results and discuss the implications of the interaction effect.
representative_data <- datagrid(model = model_logistic,
region = unique(corruption$region),
public_sector_corruption = seq(0, 100, by = 10))
head(representative_data)
## log_gdp_percapita region public_sector_corruption
## 1 8.567353 Latin America and the Caribbean 0
## 2 8.567353 Latin America and the Caribbean 10
## 3 8.567353 Latin America and the Caribbean 20
## 4 8.567353 Latin America and the Caribbean 30
## 5 8.567353 Latin America and the Caribbean 40
## 6 8.567353 Latin America and the Caribbean 50
## rowid
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
model_interaction <- glm(disclose_donations ~ public_sector_corruption * region + log_gdp_percapita,
data = corruption,
family = binomial)
tab_model(model_interaction, show.ci = TRUE, show.se = TRUE, show.p = TRUE)
| disclose donations | | | | |
|---|---|---|---|---|
| Predictors | Odds Ratios | std. Error | CI | p |
| (Intercept) | 24.94 | 88.01 | 0.00 – Inf | 0.362 |
| public sector corruption | 0.94 | 0.02 | 0.00 – Inf | 0.006 |
| region: Latin America and the Caribbean | 0.08 | 0.13 | 0.00 – Inf | 0.106 |
| region: Middle East and North Africa | 0.52 | 1.23 | 0.00 – Inf | 0.782 |
| region: Sub-Saharan Africa | 0.20 | 0.36 | 0.00 – Inf | 0.368 |
| region: Western Europe and North America | 0.35 | 0.55 | 0.00 – Inf | 0.504 |
| region: Asia and Pacific | 0.40 | 0.74 | 0.00 – Inf | 0.619 |
| GDP per capita (constant 2015 US$) | 1.02 | 0.35 | 0.00 – Inf | 0.964 |
| public_sector_corruption:regionLatin America and the Caribbean | 1.03 | 0.03 | 0.00 – Inf | 0.412 |
| public_sector_corruption:regionMiddle East and North Africa | 0.95 | 0.06 | 0.00 – Inf | 0.419 |
| public_sector_corruption:regionSub-Saharan Africa | 0.98 | 0.04 | 0.00 – Inf | 0.567 |
| public_sector_corruption:regionWestern Europe and North America | 0.96 | 0.08 | 0.00 – Inf | 0.627 |
| public_sector_corruption:regionAsia and Pacific | 0.97 | 0.05 | 0.00 – Inf | 0.498 |
| Observations | 168 | | | |
| R² Tjur | 0.521 | | | |
predicted_probs_interaction <- predict(model_interaction, newdata = representative_data, type = "response")
interaction_data <- cbind(representative_data, predicted_probs_interaction)
ggplot(interaction_data, aes(x = public_sector_corruption, y = predicted_probs_interaction, color = region)) +
geom_line() +
labs(title = "Interaction Effect of Public Sector Corruption and Region on Campaign Finance Disclosure Laws",
x = "Public Sector Corruption",
y = "Predicted Probability",
color = "Region") +
theme_minimal()
The odds ratio for the intercept (24.94, p = 0.362) is not statistically significant. The odds ratio for public sector corruption is 0.94 and is statistically significant (p = 0.006). Because the model includes an interaction with region, this estimate applies to the reference region, Eastern Europe and Central Asia, which has no row of its own in the table: there, each one-unit increase in corruption is associated with about 6% lower odds of having campaign finance disclosure laws. None of the region main effects is statistically significant, and neither is GDP per capita. Tjur's R² of 0.521 is somewhat higher than in the model without region (0.454), indicating a moderate to high ability to separate countries with and without disclosure laws.
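The interaction effects describe how the relationship between public sector corruption and the presence of campaign finance disclosure laws varies by region:
- Eastern Europe and Central Asia: this region exhibits a high predicted likelihood of having disclosure laws at lower corruption levels, but this likelihood decreases sharply as corruption levels rise.
- Latin America and the Caribbean: here there is a moderate probability of having disclosure laws, which decreases gradually with increasing corruption.
- Middle East and North Africa, Sub-Saharan Africa, Western Europe and North America, and Asia and Pacific: these regions show similar trends, where the likelihood of having disclosure laws decreases as corruption levels increase.

None of the interaction terms is statistically significant (p-values range from 0.412 to 0.627), so these regional differences in slope should be read as suggestive rather than conclusive. The implication is that higher corruption is associated with a lower probability of disclosure laws in every region, with no strong evidence that the strength of this association differs between regions.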
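In a logistic model with an interaction, the corruption effect for a given region is the main-effect coefficient plus that region's interaction coefficient on the log-odds scale, which means the odds ratios multiply. The sketch below is a minimal base R illustration using only the rounded odds ratios from the interaction table; the variable names are illustrative, and the reference region is assumed to be Eastern Europe and Central Asia, the only region discussed that has no row in the table.

```r
# Odds ratios copied from the interaction model table (rounded)
or_reference <- 0.94   # corruption effect in the reference region (assumed)

# Interaction odds ratios: how each region's corruption effect differs
or_interaction <- c(
  "Latin America and the Caribbean"  = 1.03,
  "Middle East and North Africa"     = 0.95,
  "Sub-Saharan Africa"               = 0.98,
  "Western Europe and North America" = 0.96,
  "Asia and Pacific"                 = 0.97
)

# Log-odds coefficients add, so odds ratios multiply
or_by_region <- or_reference * or_interaction

round(or_by_region, 3)
```

Every region-specific odds ratio stays below 1 under these rounded values, which matches the downward-sloping lines in the plot. The differences between regions are small relative to the reported standard errors.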