“Task 1 Using the Civil dataset, perform a simple linear regression with public_sector_corruption as the dependent variable and polyarchy as the independent variable. Visualize the relationship with a scatter plot and overlay the regression line. Use the sjPlot package to create regression tables and interpret the results.”
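# Packages used below: ggplot2, dplyr (%>%) and tibble via tidyverse, plus sjPlot, marginaleffects, emmeans
library(tidyverse)
library(sjPlot)
library(marginaleffects)
library(emmeans)
# Loading the file creates the `corruption` data frame used in every model below
load("C:/Users/tophe/Downloads/Civil.Rdata")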
model1 = lm(public_sector_corruption ~ polyarchy, data=corruption)
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
  geom_point() +
  # Overlay the same straight-line OLS fit as model1, with its confidence band
  stat_smooth(method = "lm", formula = y ~ x, linewidth = 1, color = "green4") +
  labs(x = "Polyarchy Index", y = "Public Sector Corruption Index")
tab_model(model1)
| public_sector_corruption | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 89.44 | 81.64 – 97.25 | <0.001 |
| polyarchy | -0.83 | -0.96 – -0.69 | <0.001 |
| Observations | 168 | ||
| R² / R² adjusted | 0.472 / 0.469 | ||
“According to the model’s y-intercept of 89.44, a country with a polyarchy index score of zero would be expected to have a public sector corruption index score of 89.44. For every additional point on the polyarchy index, the public sector corruption score is expected to be 0.83 points lower (95% CI: -0.96 to -0.69). The R² of 0.472 means that about 47 percent of the variation in public sector corruption is explained by polyarchy. Both the intercept and the slope have p-values below 0.001, so the relationship is statistically significant.”
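As a quick check on this interpretation, the fitted line can be evaluated by hand. This sketch uses only the rounded estimates from the table (89.44 and -0.83), so the results are approximate; `predict(model1, newdata = data.frame(polyarchy = ...))` would give the unrounded values. The variable names are illustrative.

```r
# Rounded coefficients copied from the tab_model(model1) output
intercept_m1 <- 89.44   # expected corruption score when polyarchy = 0
slope_m1 <- -0.83       # change in corruption per one-point increase in polyarchy

# Predicted public sector corruption at a few polyarchy scores
polyarchy_values <- c(0, 50, 100)
predicted_corruption <- intercept_m1 + slope_m1 * polyarchy_values
print(predicted_corruption)
```

Each 50-point step in polyarchy lowers the prediction by 41.5 points (50 × 0.83).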
“Task 2 Extend the model from Task 1 by adding a quadratic term for polyarchy to capture potential non-linear relationships. Visualize the polynomial relationship using ggplot2. Calculate the marginal effects of polyarchy at different levels (30, 60, 90) using both manual calculations and the marginaleffects package. Interpret the results.”
model2 = lm(public_sector_corruption ~ polyarchy + I(polyarchy^2), data = corruption)
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
  geom_point() +
  # Overlay the same quadratic OLS fit as model2, with its confidence band
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 1, color = "green4") +
  labs(x = "Polyarchy Index", y = "Public Sector Corruption Index")
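# Coefficients of the quadratic model
b_linear = coef(model2)["polyarchy"]            # linear term
b_quadratic = coef(model2)["I(polyarchy^2)"]    # quadratic term
# Marginal effect: the derivative of b0 + b1*x + b2*x^2 with respect to x is b1 + 2*b2*x
polyarchy_slope = function(x) b_linear + (2 * b_quadratic * x)
polyarchy_slope(c(30, 60, 90))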
## [1] -0.06392759 -1.10250800 -2.14108840
model2 %>%
slopes(newdata = datagrid(polyarchy = c(30, 60, 90)), eps = 0.001)
##
## Term polyarchy Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## polyarchy 30 -0.0639 0.141 -0.454 0.65 0.6 -0.34 0.212
## polyarchy 60 -1.1025 0.077 -14.325 <0.001 152.2 -1.25 -0.952
## polyarchy 90 -2.1411 0.227 -9.426 <0.001 67.7 -2.59 -1.696
##
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, polyarchy, predicted_lo, predicted_hi, predicted, public_sector_corruption
## Type: response
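“At polyarchy = 30, the slope is -0.06: each additional polyarchy point is associated with a public sector corruption index about 0.06 points lower, but this effect is not statistically significant (p = 0.65). At polyarchy = 60 the slope is -1.10, and at polyarchy = 90 it is -2.14, both significant at p < 0.001. The manual calculation and the marginaleffects package give identical slopes. The effect grows stronger as polyarchy increases: in highly democratic countries, each additional point of polyarchy is associated with a much larger drop in corruption than in less democratic ones.”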
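Because the model is quadratic, the marginal effect is itself a straight line in polyarchy, dy/dx = b1 + 2·b2·polyarchy, so two of the printed slopes are enough to recover both coefficients and to locate the point where the fitted curve is flat. This is a sketch that works only from the rounded slopes printed by `polyarchy_slope()`; the names `b1_hat`, `b2_hat` and `turning_point` are illustrative, and `coef(model2)` gives the exact coefficients.

```r
# Marginal effects printed by polyarchy_slope(c(30, 60, 90))
slope_30 <- -0.06392759
slope_60 <- -1.10250800

# For y = b0 + b1*x + b2*x^2 the slope is b1 + 2*b2*x, so two slopes pin down b1 and b2
b2_hat <- (slope_60 - slope_30) / (2 * (60 - 30))
b1_hat <- slope_30 - 2 * b2_hat * 30

# The fitted curve is flat (slope = 0) where b1 + 2*b2*x = 0
turning_point <- -b1_hat / (2 * b2_hat)

# Check: the implied slope at 90 should reproduce the third printed value
slope_90 <- b1_hat + 2 * b2_hat * 90
round(c(b1 = b1_hat, b2 = b2_hat, turning_point = turning_point, slope_90 = slope_90), 4)
```

With these numbers the fitted curve peaks at a polyarchy score of roughly 28 and slopes downward beyond it, which fits the near-zero, non-significant slope at polyarchy = 30.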
“Task 3 Using the Civil dataset, fit a logistic regression model predicting the presence of campaign finance disclosure laws (disclose_donations) with public_sector_corruption and log_gdp_percapita as predictors. Use the sjPlot package to create regression tables and interpret the results.”
model3 = glm(disclose_donations ~ log_gdp_percapita + public_sector_corruption,
             data = corruption, family = binomial(link = "logit"))
tab_model(model3)
| disclose_donations | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.60 | 0.01 – 46.85 | 0.818 |
| GDP per capita (constant 2015 US$) | 1.28 | 0.84 – 1.98 | 0.253 |
| public_sector_corruption | 0.94 | 0.92 – 0.96 | <0.001 |
| Observations | 168 | ||
| R² Tjur | 0.454 | ||
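“According to the logistic regression output, each one-unit increase in public sector corruption multiplies the odds of a country having donation disclosure laws by 0.94, i.e. about 6 percent lower odds (p < 0.001). The intercept of 0.60 is the expected odds (not the probability) of having disclosure laws for a country with a public sector corruption score of zero and a log GDP per capita of zero (a GDP per capita of $1); this corresponds to a probability of about 0.375 and is an extrapolation far outside the observed data. A one-unit increase in log GDP per capita (roughly a 2.7-fold increase in GDP per capita) is associated with 28 percent higher odds of disclosing donations, but this effect is not statistically significant (p = 0.253), so only public sector corruption is a significant predictor. Tjur’s R² of 0.454 means the average predicted probability is 0.454 higher for countries that have disclosure laws than for those that do not.”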
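The odds ratios in the table act on odds, not on probabilities, so it helps to convert between the two. The sketch below turns the intercept odds into a probability and shows that the same odds ratio (0.94) moves the probability by different amounts depending on where a country starts. The helper `odds_to_prob` and the baseline probabilities are illustrative choices, not part of the original analysis.

```r
# Convert odds to probability: p = odds / (1 + odds)
odds_to_prob <- function(odds) odds / (1 + odds)

# The intercept from tab_model(model3) is an odds of 0.60, not a 60 percent probability
intercept_odds <- 0.60
odds_to_prob(intercept_odds)

# An odds ratio multiplies the odds; its effect on the probability depends on the baseline
or_corruption <- 0.94
baseline_probs <- c(0.2, 0.5, 0.8)
baseline_odds <- baseline_probs / (1 - baseline_probs)
new_probs <- odds_to_prob(baseline_odds * or_corruption)
round(new_probs - baseline_probs, 4)
```

The change in probability from one extra corruption point is largest near a 50 percent baseline and smaller toward either extreme.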
“Task 4 Calculate the marginal effects of public_sector_corruption from the logistic regression model in Task 3 at representative values (20, 50, 80). Use the marginaleffects and emmeans packages to compute these effects. Visualize the predicted probabilities of having campaign finance disclosure laws across a range of public_sector_corruption values using ggplot2.”
model3 %>%
slopes(newdata = datagrid(public_sector_corruption = c(20, 50, 80)), eps = 0.001)
##
##                      Term public_sector_corruption Estimate Std. Error      z Pr(>|z|)    S    2.5 %    97.5 %
##         log_gdp_percapita                       20  0.05939   0.054024  1.099  0.27159  1.9 -0.04649  0.165278
##         log_gdp_percapita                       50  0.04066   0.037107  1.096  0.27322  1.9 -0.03207  0.113384
##         log_gdp_percapita                       80  0.00989   0.011757  0.841  0.40031  1.3 -0.01316  0.032932
##  public_sector_corruption                       20 -0.01422   0.002330 -6.103  < 0.001 29.8 -0.01879 -0.009654
##  public_sector_corruption                       50 -0.00973   0.001646 -5.916  < 0.001 28.2 -0.01296 -0.006510
##  public_sector_corruption                       80 -0.00237   0.000819 -2.891  0.00384  8.0 -0.00397 -0.000762
##
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, public_sector_corruption, predicted_lo, predicted_hi, predicted, log_gdp_percapita, disclose_donations
## Type: response
model3 %>%
emtrends(~ public_sector_corruption, var = "public_sector_corruption", at = list(public_sector_corruption = c(20, 50, 80)), delta.var = 0.001) %>%
test()
## public_sector_corruption public_sector_corruption.trend SE df z.ratio
## 20 -0.0596 0.0119 Inf -5.007
## 50 -0.0596 0.0119 Inf -5.007
## 80 -0.0596 0.0119 Inf -5.007
## p.value
## <.0001
## <.0001
## <.0001
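The emtrends() result is the same (-0.0596) at every corruption level because, by default, it reports the trend on the log-odds (link) scale, where a logistic model is linear. The slopes() output, by contrast, is on the probability scale, where the slope is the log-odds slope times p(1 - p). Passing `regrid = "response"` to emtrends() should also put the trends on the probability scale (worth confirming in the emmeans documentation for your installed version). The sketch below shows the conversion; the helper name and the grid of probabilities are illustrative.

```r
# Derivative of the logistic curve with respect to x: dp/dx = b * p * (1 - p)
logit_slope_to_prob <- function(b, p) b * p * (1 - p)

b_logit <- -0.0596   # constant trend reported by emtrends() on the log-odds scale
p_grid <- c(0.1, 0.3, 0.5, 0.7, 0.9)
round(logit_slope_to_prob(b_logit, p_grid), 5)
```

The probability-scale slope is largest in magnitude at p = 0.5 (-0.0596 × 0.25 = -0.0149), and the slopes() estimates above (-0.01422, -0.00973, -0.00237) all stay below that bound, shrinking as predicted probabilities move away from 0.5.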
ggplot(corruption,
       aes(x = public_sector_corruption, y = as.numeric(disclose_donations))) +
  geom_point() +
  # Logistic curve fitted by ggplot2 itself, with corruption as the only predictor
  geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit"))) +
  # One label per country, nudged above its point
  geom_label(aes(label = country_name), nudge_y = 0.06, hjust = 1) +
  labs(x = "Public Sector Corruption Index",
       y = "Does the Country Have Campaign Finance Disclosure Laws?")
## `geom_smooth()` using formula = 'y ~ x'
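geom_smooth() fits its own one-predictor logistic curve, so unlike model3 it does not hold log GDP per capita fixed. A rough model-based curve can be sketched from the Task 3 table by turning the odds ratios back into log-odds coefficients and holding log GDP per capita at its mean. This assumes that the mean of 8.567353 printed by datagrid() in Task 5 applies here and uses odds ratios rounded to two decimals, so the probabilities are only approximate; `predict(model3, newdata, type = "response")` would give the exact values.

```r
# Rounded odds ratios from tab_model(model3), converted back to log-odds coefficients
log_or_intercept <- log(0.60)
log_or_gdp <- log(1.28)    # log_gdp_percapita
log_or_corr <- log(0.94)   # public_sector_corruption

# Assumed: mean of log_gdp_percapita as printed by datagrid() in Task 5
mean_log_gdp <- 8.567353

# Predicted probability of disclosure laws across corruption values, GDP held at its mean
corruption_grid <- seq(0, 90, by = 10)
pred_prob <- plogis(log_or_intercept + log_or_gdp * mean_log_gdp + log_or_corr * corruption_grid)
data.frame(public_sector_corruption = corruption_grid, prob = round(pred_prob, 3))
```

Plotting `prob` against `public_sector_corruption` (for example with geom_line()) gives a curve comparable to the geom_smooth() line but conditional on average income.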
“Task 5 Explore the interaction effect between public_sector_corruption and region in the logistic regression model from Task 3. Use the datagrid() function from the marginaleffects package to create a dataset with representative values for regions. Fit the logistic regression model with the interaction term and visualize the interaction effects using ggplot2. Interpret the results and discuss the implications of the interaction effect.”
regions1 = c("Western Europe and North America",
"Latin America and the Caribbean",
"Middle East and North Africa",
"Asia and Pacific",
"Sub-Saharan Africa",
"Eastern Europe and Central Asia")
model4 = glm(
disclose_donations ~ public_sector_corruption + I(public_sector_corruption^2) +
log_gdp_percapita + public_sector_corruption * region,
family = binomial(link = "logit"),
data = corruption)
datagrid1 = datagrid(model = model4,
public_sector_corruption = c(20, 80),
region = regions1)
print(datagrid1)
##    log_gdp_percapita public_sector_corruption                           region rowid
## 1           8.567353                       20 Western Europe and North America     1
## 2           8.567353                       20  Latin America and the Caribbean     2
## 3           8.567353                       20     Middle East and North Africa     3
## 4           8.567353                       20                 Asia and Pacific     4
## 5           8.567353                       20               Sub-Saharan Africa     5
## 6           8.567353                       20  Eastern Europe and Central Asia     6
## 7           8.567353                       80 Western Europe and North America     7
## 8           8.567353                       80  Latin America and the Caribbean     8
## 9           8.567353                       80     Middle East and North Africa     9
## 10          8.567353                       80                 Asia and Pacific    10
## 11          8.567353                       80               Sub-Saharan Africa    11
## 12          8.567353                       80  Eastern Europe and Central Asia    12
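datagrid() crosses the values you list (two corruption levels by six regions) and fills every other predictor with a typical value, by default the mean for numeric variables, which is why log_gdp_percapita is 8.567353 in every row. The same grid can be built in base R with expand.grid(); this is a sketch that hard-codes the printed mean rather than computing it from the data, and the object names are illustrative.

```r
regions <- c("Western Europe and North America",
             "Latin America and the Caribbean",
             "Middle East and North Africa",
             "Asia and Pacific",
             "Sub-Saharan Africa",
             "Eastern Europe and Central Asia")

# expand.grid() varies its first argument fastest, so regions cycle within each corruption level
grid <- expand.grid(region = regions,
                    public_sector_corruption = c(20, 80),
                    stringsAsFactors = FALSE)

# Hold log GDP per capita at the mean reported by datagrid()
grid$log_gdp_percapita <- 8.567353
grid <- grid[, c("log_gdp_percapita", "public_sector_corruption", "region")]
print(grid)
```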
prediction1 = model4 |>
  # Predicted probabilities (regrid = "response") for each region across corruption 0 to 90,
  # with log_gdp_percapita held at its mean
  emmeans(~ public_sector_corruption + region,
          at = list(public_sector_corruption = seq(0, 90, 1)),
          regrid = "response") |>
  as_tibble()
ggplot(prediction1, aes(x = public_sector_corruption, y = prob, color = region)) +
  geom_line(linewidth = 1) +
  labs(x = "Public sector corruption", y = "Predicted probability of having donation disclosure", color = NULL) +
  theme(legend.position = "right")
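“The results suggest that region strongly affects the outcome. With the interaction term, the predicted-probability curves differ markedly between regions, even with log GDP per capita held at its mean. This means the relationship between public sector corruption and the probability of having donation disclosure laws is not the same everywhere: how strongly corruption lowers that probability depends on the region a country is in. Whether these regional differences are statistically significant should be checked with the p-values of the interaction coefficients (for example with tab_model(model4)).”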
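With `public_sector_corruption * region`, each region gets its own log-odds slope: the main corruption effect plus that region's interaction coefficient (zero for the reference region), plus 2·b2·x because model4 also includes `I(public_sector_corruption^2)`. The sketch below assumes R's default coefficient names for a factor interaction (`public_sector_corruption:region<level>`); which region is the reference depends on how the levels of `region` are set. The helper name and the coefficient values are made-up toy numbers, not estimates from model4.

```r
# Log-odds slope of public_sector_corruption for one region at corruption level x,
# for a model with psc + I(psc^2) + psc:region terms
region_logit_slope <- function(b, region, x) {
  main <- b[["public_sector_corruption"]]
  quad <- b[["I(public_sector_corruption^2)"]]
  interaction_name <- paste0("public_sector_corruption:region", region)
  # The reference region has no interaction coefficient, so its adjustment is 0
  interaction <- if (interaction_name %in% names(b)) b[[interaction_name]] else 0
  main + 2 * quad * x + interaction
}

# Toy coefficients for illustration only, NOT estimates from model4
toy_b <- c("public_sector_corruption" = -0.05,
           "I(public_sector_corruption^2)" = 0.0002,
           "public_sector_corruption:regionSub-Saharan Africa" = 0.01)

region_logit_slope(toy_b, "Sub-Saharan Africa", x = 50)
region_logit_slope(toy_b, "Asia and Pacific", x = 50)
```

With the fitted model, `region_logit_slope(coef(model4), "Latin America and the Caribbean", x = 50)` would return that region's slope on the log-odds scale, assuming the coefficient names match.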