“Task 1 Using the Civil dataset, perform a simple linear regression with public_sector_corruption as the dependent variable and polyarchy as the independent variable. Visualize the relationship with a scatter plot and overlay the regression line. Use the sjPlot package to create regression tables and interpret the results.”

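library(ggplot2)
library(dplyr)            # provides %>% and as_tibble()
library(sjPlot)           # tab_model()
library(marginaleffects)  # slopes(), datagrid()
library(emmeans)          # emtrends(), emmeans()

# Assumes this file provides the data frame `corruption` used below
load("C:/Users/tophe/Downloads/Civil.Rdata")
model1 = lm(public_sector_corruption ~ polyarchy, data = corruption)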
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, linewidth = 1, color = "green4") +
  labs(x = "Polyarchy Index", y = "Public Sector Corruption Index")

tab_model(model1)
                     public_sector_corruption
Predictors           Estimates   CI               p
(Intercept)          89.44       81.64 – 97.25    <0.001
polyarchy            -0.83       -0.96 – -0.69    <0.001
Observations         168
R2 / R2 adjusted     0.472 / 0.469

“The intercept of 89.44 means that a country with a polyarchy index of zero would be expected to have a public sector corruption index of 89.44. For each additional point on the polyarchy index, the expected public sector corruption index is 0.83 points lower (95% CI: -0.96 to -0.69). The R squared of 0.472 means that about 47 percent of the variation in public sector corruption is explained by the model. Both coefficients have p-values below 0.001, so they are statistically significant at the 0.05 level.”
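
As a quick check on this reading of the coefficients, the fitted line can be evaluated by hand. The sketch below hard-codes the rounded estimates from the table (89.44 and -0.83) rather than refitting the model; the helper name predict_corruption and the example polyarchy values are only illustrative.

```r
# Rounded estimates copied from the tab_model(model1) table
b0 <- 89.44   # intercept
b1 <- -0.83   # slope for polyarchy

# Hypothetical helper: expected corruption index at a given polyarchy score
predict_corruption <- function(polyarchy) b0 + b1 * polyarchy

# Expected corruption at low, middle, and high polyarchy
predict_corruption(c(0, 50, 100))

# Expected difference between two countries 10 polyarchy points apart
predict_corruption(60) - predict_corruption(50)
```

With the model fitted, predict(model1, newdata = data.frame(polyarchy = c(0, 50, 100))) should give nearly the same numbers, differing only because the table values are rounded.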

“Task 2 Extend the model from Task 1 by adding a quadratic term for polyarchy to capture potential non-linear relationships. Visualize the polynomial relationship using ggplot2. Calculate the marginal effects of polyarchy at different levels (30, 60, 90) using both manual calculations and the marginaleffects package. Interpret the results.”

model2 = lm(public_sector_corruption ~ polyarchy + I(polyarchy^2), data = corruption)
ggplot(corruption, aes(x = polyarchy, y = public_sector_corruption)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 1, color = "green4") +
  labs(x = "Polyarchy Index", y = "Public Sector Corruption Index")

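# Manual marginal effect: the derivative of b1*x + b2*x^2 is b1 + 2*b2*x
polyarchy2 = coef(model2)["polyarchy"]        # b1, linear term
polyarchy3 = coef(model2)["I(polyarchy^2)"]   # b2, squared term
polyarchy_slope = function(x) polyarchy2 + (2 * polyarchy3 * x)
polyarchy_slope(c(30, 60, 90))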
## [1] -0.06392759 -1.10250800 -2.14108840
model2 %>%
  slopes(newdata = datagrid(polyarchy = c(30, 60, 90)), eps = 0.001)
## 
##       Term polyarchy Estimate Std. Error       z Pr(>|z|)     S 2.5 % 97.5 %
##  polyarchy        30  -0.0639      0.141  -0.454     0.65   0.6 -0.34  0.212
##  polyarchy        60  -1.1025      0.077 -14.325   <0.001 152.2 -1.25 -0.952
##  polyarchy        90  -2.1411      0.227  -9.426   <0.001  67.7 -2.59 -1.696
## 
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, polyarchy, predicted_lo, predicted_hi, predicted, public_sector_corruption 
## Type:  response

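“The marginal effect of polyarchy is the slope of the fitted curve at a given value. At polyarchy = 30, one extra point of polyarchy is associated with a decrease of about 0.06 in the public sector corruption index, and this slope is not statistically distinguishable from zero (p = 0.65). At polyarchy = 60 the slope is -1.10, and at polyarchy = 90 it is -2.14, both with p < 0.001. The negative relationship between polyarchy and corruption therefore gets steeper as polyarchy increases, and the manual calculation and slopes() give the same estimates.”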
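
The manual slopes come from differentiating the quadratic fit: for corruption = b0 + b1*polyarchy + b2*polyarchy^2, the slope is b1 + 2*b2*polyarchy. The sketch below is not part of the original analysis. It backs the two coefficients out of the printed slopes at 30 and 60 (so b1 and b2 here are implied values, not output copied from model2), checks them against the printed slope at 90, and locates where the fitted curve is flat.

```r
# Slopes printed above by polyarchy_slope(c(30, 60, 90))
s30 <- -0.06392759
s60 <- -1.10250800
s90 <- -2.14108840

# slope(x) = b1 + 2 * b2 * x, so two slopes pin down b1 and b2
b2 <- (s60 - s30) / (2 * (60 - 30))
b1 <- s30 - 2 * b2 * 30

slope <- function(x) b1 + 2 * b2 * x
slope(90)                # should reproduce s90 up to rounding

# Polyarchy value where the fitted curve is flat (slope = 0)
turning_point <- -b1 / (2 * b2)
turning_point
```

Under these implied coefficients the curve is flat at a polyarchy score of roughly 28, which is consistent with the near-zero slope at 30 and the increasingly negative slopes at 60 and 90.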

“Task 3 Using the Civil dataset, fit a logistic regression model predicting the presence of campaign finance disclosure laws (disclose_donations) with public_sector_corruption and log_gdp_percapita as predictors. Use the sjPlot package to create regression tables and interpret the results.”

model3 = glm(disclose_donations ~ log_gdp_percapita + public_sector_corruption,
             data = corruption, family = binomial(link = "logit"))
tab_model(model3)
                                      disclose_donations
Predictors                            Odds Ratios   CI              p
(Intercept)                           0.60          0.01 – 46.85    0.818
GDP per capita (constant 2015 US$)    1.28          0.84 – 1.98     0.253
public_sector_corruption              0.94          0.92 – 0.96     <0.001
Observations                          168
R2 Tjur                               0.454

“According to the logistic regression output, each one-unit increase in public sector corruption multiplies the odds of having disclosure laws by 0.94, so the odds are about 6 percent lower, holding log GDP per capita constant. The intercept of 0.60 is the odds of disclosure (a probability of about 0.375, since 0.60 / 1.60 = 0.375) for a country with zero on both predictors, but it is not statistically significant (p = 0.818) and its confidence interval is very wide. A one-unit increase in log GDP per capita is associated with 28 percent higher odds of disclosure, but this estimate is not statistically significant (p = 0.253, CI 0.84 to 1.98). Only public sector corruption has a p-value below 0.05.”
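
Because odds ratios are easy to misread as probabilities, a short conversion sketch may help. It uses only the rounded odds ratios from the table above; the helper odds_to_prob is a hypothetical name introduced here.

```r
# Rounded odds ratios copied from the tab_model(model3) table
or_intercept  <- 0.60
or_corruption <- 0.94

# Hypothetical helper: convert odds to a probability
odds_to_prob <- function(odds) odds / (1 + odds)

odds_to_prob(or_intercept)        # probability implied by odds of 0.60
(or_corruption - 1) * 100         # percent change in the odds per unit of corruption
log(or_corruption)                # the same effect on the log-odds (logit) scale

# Odds after a 10-unit rise in corruption, starting from the intercept odds
or_intercept * or_corruption^10
```

Odds of 0.60 correspond to a probability of 0.375, and an odds ratio of 0.94 is a 6 percent drop in the odds, which is not the same as a 6 percentage point drop in probability.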

“Task 4 Calculate the marginal effects of public_sector_corruption from the logistic regression model in Task 3 at representative values (20, 50, 80). Use the marginaleffects and emmeans packages to compute these effects. Visualize the predicted probabilities of having campaign finance disclosure laws across a range of public_sector_corruption values using ggplot2.”

model3 %>%
  slopes(newdata = datagrid(public_sector_corruption = c(20, 50, 80)), eps = 0.001)
## 
##                      Term public_sector_corruption Estimate Std. Error      z
##  log_gdp_percapita                              20  0.05939   0.054024  1.099
##  log_gdp_percapita                              50  0.04066   0.037107  1.096
##  log_gdp_percapita                              80  0.00989   0.011757  0.841
##  public_sector_corruption                       20 -0.01422   0.002330 -6.103
##  public_sector_corruption                       50 -0.00973   0.001646 -5.916
##  public_sector_corruption                       80 -0.00237   0.000819 -2.891
##  Pr(>|z|)    S    2.5 %    97.5 %
##   0.27159  1.9 -0.04649  0.165278
##   0.27322  1.9 -0.03207  0.113384
##   0.40031  1.3 -0.01316  0.032932
##   < 0.001 29.8 -0.01879 -0.009654
##   < 0.001 28.2 -0.01296 -0.006510
##   0.00384  8.0 -0.00397 -0.000762
## 
## Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, public_sector_corruption, predicted_lo, predicted_hi, predicted, log_gdp_percapita, disclose_donations 
## Type:  response
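# Without regrid = "response", emtrends() reports the slope on the logit (link) scale,
# so the trend is identical at all three values; add regrid = "response" to match slopes().
model3 %>%
  emtrends(~ public_sector_corruption, var = "public_sector_corruption",
           at = list(public_sector_corruption = c(20, 50, 80)), delta.var = 0.001) %>%
  test()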
##  public_sector_corruption public_sector_corruption.trend     SE  df z.ratio
##                        20                        -0.0596 0.0119 Inf  -5.007
##                        50                        -0.0596 0.0119 Inf  -5.007
##                        80                        -0.0596 0.0119 Inf  -5.007
##  p.value
##   <.0001
##   <.0001
##   <.0001
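
The slopes() and emtrends() results above are on different scales. For a logit model, the slope of the predicted probability with respect to a predictor is beta * p * (1 - p), where beta is the logit-scale coefficient and p is the predicted probability, so the slope shrinks as p moves away from 0.5. The sketch below illustrates that identity using the logit-scale trend of -0.0596 from the emtrends() output; the intercept a and the helper names are made up for illustration and are not taken from model3.

```r
beta <- -0.0596   # logit-scale slope reported by emtrends() above
a    <- 1.5       # hypothetical intercept, for illustration only

# Predicted probability and its analytic slope at corruption value x
p_hat      <- function(x) plogis(a + beta * x)
prob_slope <- function(x) beta * p_hat(x) * (1 - p_hat(x))

x <- c(20, 50, 80)
analytic <- prob_slope(x)

# Finite-difference check, mirroring the eps = 0.001 used with slopes()
eps <- 0.001
finite_diff <- (p_hat(x + eps) - p_hat(x)) / eps

cbind(x, analytic, finite_diff)
```

The logit-scale slope is the same everywhere, while the probability-scale slope depends on where the prediction sits on the curve, which is why slopes() reports three different estimates.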
ggplot(corruption, 
       aes(x = public_sector_corruption, y = as.numeric(disclose_donations))) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = binomial(link = "logit"))) +
  geom_label(data = corruption, 
             aes(label = country_name), nudge_y = 0.06, hjust = 1) +
  geom_label(data = corruption, 
             aes(label = country_name), nudge_y = -0.06, hjust = 0) +
  labs(x = "Public Sector Corruption Index", 
       y = "Does the Country have Campaign Finance Disclosure Laws")
## `geom_smooth()` using formula = 'y ~ x'

“Task 5 Explore the interaction effect between public_sector_corruption and region in the logistic regression model from Task 3. Use the datagrid() function from the marginaleffects package to create a dataset with representative values for regions. Fit the logistic regression model with the interaction term and visualize the interaction effects using ggplot2. Interpret the results and discuss the implications of the interaction effect.”

regions1 = c("Western Europe and North America", 
                    "Latin America and the Caribbean",
                    "Middle East and North Africa",
                    "Asia and Pacific",
                    "Sub-Saharan Africa",
                    "Eastern Europe and Central Asia")

model4 = glm(
  disclose_donations ~ public_sector_corruption + I(public_sector_corruption^2) + 
  log_gdp_percapita + public_sector_corruption * region,
  family = binomial(link = "logit"),
  data = corruption)

datagrid1 = datagrid(model = model4,
                         public_sector_corruption = c(20, 80),
                         region = regions1)
print(datagrid1)
##    log_gdp_percapita public_sector_corruption                           region
## 1           8.567353                       20 Western Europe and North America
## 2           8.567353                       20  Latin America and the Caribbean
## 3           8.567353                       20     Middle East and North Africa
## 4           8.567353                       20                 Asia and Pacific
## 5           8.567353                       20               Sub-Saharan Africa
## 6           8.567353                       20  Eastern Europe and Central Asia
## 7           8.567353                       80 Western Europe and North America
## 8           8.567353                       80  Latin America and the Caribbean
## 9           8.567353                       80     Middle East and North Africa
## 10          8.567353                       80                 Asia and Pacific
## 11          8.567353                       80               Sub-Saharan Africa
## 12          8.567353                       80  Eastern Europe and Central Asia
##    rowid
## 1      1
## 2      2
## 3      3
## 4      4
## 5      5
## 6      6
## 7      7
## 8      8
## 9      9
## 10    10
## 11    11
## 12    12
prediction1 = model4 |> 
  emmeans(~ public_sector_corruption + region, var = "public_sector_corruption",
          at = list(public_sector_corruption = seq(0, 90, 1)),
          regrid = "response") |>
  as_tibble()

ggplot(prediction1, aes(x = public_sector_corruption, y = prob, color = region)) +
  geom_line(linewidth = 1) +
  labs(x = "Public sector corruption", y = "Predicted probability of having donation disclosure", color = NULL) +
  theme(legend.position = "right")

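“The results suggest that the relationship between public sector corruption and donation disclosure depends on region. With log GDP per capita held at its mean, the predicted probability of having disclosure laws at the same level of public sector corruption differs from region to region, and the fitted curves differ across regions. This implies that a single pooled slope for corruption, as in Task 3, can hide regional differences.”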
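
To make the idea of an interaction concrete, the sketch below fits a logit model with an interaction on simulated data. Everything in it is hypothetical: the two regions "A" and "B", the variable names, and the true slopes are invented, and none of it comes from the Civil dataset or from model4.

```r
set.seed(42)

# Simulated data, NOT the Civil data: two hypothetical regions "A" and "B"
n      <- 500
x      <- runif(n, 0, 100)                               # stand-in for a corruption index
region <- factor(sample(c("A", "B"), n, replace = TRUE))

# True logit slopes differ by region: -0.06 in A, -0.01 in B
eta <- ifelse(region == "A", 2 - 0.06 * x, 0 - 0.01 * x)
y   <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ x * region, family = binomial(link = "logit"))
b   <- coef(fit)

# Region-specific logit slopes: the interaction term shifts the slope for B
slope_A <- unname(b["x"])
slope_B <- unname(b["x"] + b["x:regionB"])
c(A = slope_A, B = slope_B)
```

The coefficient on x is the slope for the reference region, and each interaction coefficient is the difference between that slope and another region's slope. The same reading applies to the public_sector_corruption:region terms in model4.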