library(haven)
library(ggplot2)
library(stargazer)
library(MASS)
library(tidyverse)
library(skimr)
library(survey)
library(descr)
library(DHARMa)
library(jtools)
library(ggeffects)
library(nnet)
library(brant)
library(patchwork)
set.seed(500)
Partisan_strength - People with a higher partisan strength are less likely to lie about voting.
Education - People with a higher education level are less likely to lie about voting.
Political interest - People with more interest in politics are less likely to lie about voting.
Data cleaning process - Checking for missing data - complete there were many Na values inside the variables - Checking for any bad data in scale - complete all variables were in the right scale format with the lowest and highest values being the correct format - Checking for intuitive names - complete the names were already intuitive so they were left alone - Checking for scale flips - complete the scale appeared in a good format as the highest value for all of them were the positive numbers or larger number which looked good.
Applied the weighted scale and then separated the data out with my DV, IV, and control variables.
anes <- read_dta("anes_2020_vvoted_hw3.dta")
# data of people who did not vote
no_vote_dt <- anes %>%
filter(voted_20 == 0)
anes_comp <- no_vote_dt[complete.cases(no_vote_dt[, c("V200010b")]), ]
anes_weighted <- svydesign(ids = ~V200010c,
weights =~V200010b,
strata=~V200010d,
nest=TRUE, #Only True with clustered data like we have here
data = anes_comp) #PSU weight = 'ids', 'strata' = strata weight as specified in code book
# Just looking at the over all and we can see there is some missing data in most columns
# Looking like Na values however, other format seems fine.
# regression will omit the NA values so those are okay
# the variable names are already pretty intuitive so I am going to to leave the names alone exccept for the DV. Renaming this as vote response.
anes_data <- no_vote_dt %>%
dplyr::select(V202109x, educ, partisan_strength, polint, poc, age_group, ideo3, voted_20)
colnames(anes_data)[colnames(anes_data) == 'V202109x'] <- 'vote_response'
# Roughly even in terms of people who said they voted or didnt
freq(anes_data$vote_response)
## PRE-POST: SUMMARY: Voter turnout in 2020
## Frequency Percent Valid Percent
## 0 712 44.36 50.89
## 1 687 42.80 49.11
## NA's 206 12.83
## Total 1605 100.00 100.00
# looking like the lowest and highest values are in the correct format
# looking at representation of the numbers they appear to be on the right scale as well with no need to flip any values.
skim(anes_data)
Name | anes_data |
Number of rows | 1605 |
Number of columns | 8 |
_______________________ | |
Column type frequency: | |
numeric | 8 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
vote_response | 206 | 0.87 | 0.49 | 0.50 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▇ |
educ | 27 | 0.98 | 2.08 | 1.00 | 1 | 1 | 2 | 3 | 4 | ▇▇▁▃▂ |
partisan_strength | 11 | 0.99 | 1.69 | 1.13 | 0 | 1 | 2 | 3 | 3 | ▅▅▁▆▇ |
polint | 231 | 0.86 | 2.61 | 0.93 | 1 | 2 | 3 | 3 | 4 | ▃▅▁▇▃ |
poc | 24 | 0.99 | 0.38 | 0.48 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▅ |
age_group | 72 | 0.96 | 2.92 | 1.40 | 1 | 2 | 3 | 4 | 5 | ▇▇▅▇▆ |
ideo3 | 46 | 0.97 | 2.08 | 0.85 | 1 | 1 | 2 | 3 | 3 | ▆▁▅▁▇ |
voted_20 | 0 | 1.00 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | ▁▁▇▁▁ |
# here we see that more people in the data said they didnt vote
anes_data %>%
count(vote_response)
## # A tibble: 3 x 2
## vote_response n
## <dbl+lbl> <int>
## 1 0 [0. Did not vote] 712
## 2 1 [1. Voted] 687
## 3 NA 206
Going to run 3 models here one logit, one probit, and one weighted then I will compare the results and pick the best model to move forward with.
When looking at the comparison of the three models I notice that the logit model appears to be score the best when it comes to AIC, followed by the probit. However, the weighted logit actually performs the best here in terms of AIC and log likelihood. The coefficents remained pretty consistent between logit and probit with logit being about 2 times bigger consistently. There are three points where the significance goes away when comparing models and it is leaning partisan, liberal identification, and Age group. These three points show the impact of the weights on the significance of the Iv. Both move away from significance so I will use the weight logit model as I move forward. Very surprising to see age group get dispersed as a significant factor. Here we can start looking at some of those hypothesis questions and return some answers!
1 - Partisan_strength - People with a higher partisan strength are less likely to lie about voting. So here we notice that as partisan strength increases from weak to strong the coefficient grows large in the positive direction. Meaning people are less likely to lie about voting as partisan strength increases supporting the hypothesis. 2 - Education -People with a higher education level are less likely to lie about voting. Once more we notice as education increases so does the coefficient in the positive direction however, once we reach bachelors and advanced education it starts to level out. People with a bachelors and up are less likely to lie about voting compared to the highschool education. This supports the hypothesis above 3 Political interest - -People with more interest in politics are less likely to lie about voting. Political interest is a positive coefficient once more showing that as it increases people are less likely to lie about voting. This supports the hypothesis above.
For the models below in ideo I set the reference to 2 for moderate instead of it being liberal to make it easier to compare.
Final model to be used is the weighted logit model as it scores the best AIC and log likelihood.
logit <- glm(vote_response ~ as.factor(partisan_strength) + as.factor(educ) + polint + poc + relevel(as.factor(ideo3), ref = "2") + as.factor(age_group), data = anes_data, family = binomial(link = "logit"))
probit <- glm(vote_response ~ as.factor(partisan_strength) + as.factor(educ) + polint + poc + relevel(as.factor(ideo3), ref = "2") + as.factor(age_group), data = anes_data, family = binomial(link = "probit"))
logit_weight <- svyglm(V202109x ~ as.factor(partisan_strength) + as.factor(educ) + polint + poc + relevel(as.factor(ideo3), ref = "2") + as.factor(age_group), design = anes_weighted, family = binomial(link = "logit"))
stargazer(logit, probit, logit_weight, type = "text",
dep.var.labels = "Voting Response",
covariate.labels = c("Leaning Partisan", "Weak Partisan", "Strong Partisan", "Some College", "Bachelor", "Advanced school", "Political Interest", "People of Color", "Liberal Identification", "Conservative Identification", "Age(30-39)", "Age(40-49)", "Age(50-64)", "Age(65+)"), digits = 2)
##
## =============================================================
## Dependent variable:
## ---------------------------------
## Voting Response V202109x
## logistic probit survey-weighted
## logistic
## (1) (2) (3)
## -------------------------------------------------------------
## Leaning Partisan 0.46** 0.27** 0.33
## (0.20) (0.12) (0.27)
##
## Weak Partisan 0.43** 0.25** 0.71**
## (0.19) (0.11) (0.27)
##
## Strong Partisan 1.14*** 0.69*** 1.15***
## (0.19) (0.11) (0.25)
##
## Some College 0.48*** 0.28*** 0.46**
## (0.15) (0.09) (0.20)
##
## Bachelor 1.05*** 0.64*** 0.84***
## (0.19) (0.11) (0.25)
##
## Advanced school 1.03*** 0.63*** 0.87***
## (0.22) (0.13) (0.26)
##
## Political Interest 0.51*** 0.31*** 0.49***
## (0.07) (0.04) (0.11)
##
## People of Color -0.09 -0.05 -0.07
## (0.13) (0.08) (0.19)
##
## Liberal Identification 0.34** 0.21** 0.23
## (0.16) (0.10) (0.24)
##
## Conservative Identification -0.18 -0.11 -0.18
## (0.16) (0.09) (0.19)
##
## Age(30-39) -0.23 -0.14 -0.23
## (0.19) (0.11) (0.28)
##
## Age(40-49) -0.22 -0.13 -0.24
## (0.21) (0.13) (0.32)
##
## Age(50-64) 0.26 0.16 0.25
## (0.19) (0.11) (0.25)
##
## Age(65+) 0.69*** 0.42*** 0.44
## (0.22) (0.13) (0.27)
##
## Constant -2.54*** -1.53*** -2.66***
## (0.29) (0.17) (0.40)
##
## -------------------------------------------------------------
## Observations 1,267 1,267 1,267
## Log Likelihood -752.77 -752.90 -722.71
## Akaike Inf. Crit. 1,535.54 1,535.81 1,475.43
## =============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Looking at the graph below we can see as education levels rise so does the change of lying about voting until it reaches bachelor and advanced where it levels out.
# Education (Here is where the issue is at)
new_data_logit <- expand.grid(educ = seq(1,4,1), #Education Graph
partisan_strength = "3",
polint = mean(anes_data$polint, na.rm = TRUE),
poc = mean(anes_data$poc, na.rm = TRUE),
age_group = "5",
ideo3 = "2")
logit_n <-predict(logit, newdata = new_data_logit, type = "response", se.fit = TRUE) #Saves predicted probabilities
logit_n$ci_lower <- logit_n$fit - 1.96 * logit_n$se #Calculates lower bound
logit_n$ci_upper <- logit_n$fit + 1.96 * logit_n$se #Calculates upper bound
logit_n$x <- 1:4 #Sets the x variable
logit_n<-as.data.frame(logit_n) #Creates a data frame for graphing
ggplot(logit_n, aes(x = x, y = fit)) +
geom_line(color = "black", linewidth=.5) +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.5, fill =
"grey") + # CI ribbon
labs(x = "Education", y = "Predicted Probability", title = "Predicted Probability
with 95% CI") +
theme_minimal() + scale_x_continuous(breaks = 1:4, labels = paste0(c("Highschool", "Some College", "Bachelor", "Advanced School"))) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "red")
Looking at the graph below we see as partisan strength increases from 1 to 2 a major jump in probability however, between 2 and 3 there is not much and then once again a spike between 3 and 4. I believe this is due to the middle group not having much of a difference between 2 and 3 (leaning vs weak)
# partisan strength
new_data_logit_ps <- expand.grid(partisan_strength = c("0", "1", "2", "3"), #Partisan Strength Graph
educ = "2",
polint = mean(anes_data$polint, na.rm = TRUE),
poc = mean(anes_data$poc, na.rm = TRUE),
age_group = "5",
ideo3 = "2")
logit_n <-predict(logit, newdata = new_data_logit_ps, type = "response", se.fit = TRUE) #Saves predicted probabilities
logit_n$ci_lower <- logit_n$fit - 1.96 * logit_n$se #Calculates lower bound
logit_n$ci_upper <- logit_n$fit + 1.96 * logit_n$se #Calculates upper bound
logit_n$x <- 0:3 #Sets the x variable
logit_n<-as.data.frame(logit_n) #Creates a data frame for graphing
ggplot(logit_n, aes(x = x, y = fit)) +
geom_line(color = "black", linewidth=.5) +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.5, fill =
"grey") + # CI ribbon
labs(x = "Partisan Strength", y = "Predicted Probability", title = "Predicted Probability
with 95% CI") +
theme_minimal() + scale_x_continuous(breaks = 0:3, labels = paste0(c("Independent", "Leaning Partisan", "Weak Partisan", "Strong Partisan"))) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "red")
Political interests almost looks linear with how consistent it increases the probability of lying about voting. This is actually really interesting to see that the more interested a person is in it then the more likely they are to lie about their voting. I wonder what is going on here and if there is some underlying reason.
# Political interest
new_data_logit_pi <- expand.grid(polint = seq(1,4,1), #Political Interest Graph
educ = "2",
partisan_strength = ("3"),
poc = mean(anes_data$poc, na.rm = TRUE),
age_group = "5",
ideo3 = "2")
logit_n <-predict(logit, newdata = new_data_logit_pi, type = "response", se.fit = TRUE) #Saves predicted probabilities
logit_n$ci_lower <- logit_n$fit - 1.96 * logit_n$se #Calculates lower bound
logit_n$ci_upper <- logit_n$fit + 1.96 * logit_n$se #Calculates upper bound
logit_n$x <- 1:4 #Sets the x variable
logit_n<-as.data.frame(logit_n) #Creates a data frame for graphing
ggplot(logit_n, aes(x = x, y = fit)) +
geom_line(color = "black", linewidth=.5) +
geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.5, fill =
"grey") + # CI ribbon
labs(x = "Political Interest", y = "Predicted Probability", title = "Predicted Probability
with 95% CI") +
theme_minimal() + scale_x_continuous(breaks = 1:4, labels = paste0(c("Not interested", "Not very interested", "Somewat interested", "Very interested"))) +
geom_hline(yintercept = 0.5, linetype = "dashed", color = "red")
When looking at partisan strength we notice right away it is consistently significant across all models except for on the leaning partisan on the weighted model. Strong partisan increased probability of lying about voting by 1.15 log odds at its steepest slope (~28.75% in methods of 4). Other factors slipped behind at about half the size of impact. The reason for this would be the stronger you feel about a particular party the more likely you want to be involved. Political interest was one that threw me off a little as it increases so do the odds maybe the more it matters the more political you get with your responses. This increased probability of lying about voting by .49(~12.25% method of 4) log odds at steepest slope. Education had a big impact which followed a pattern that the more educated you were the more likely you are to lie about voting. The odds go from .46 in highschool to .87 log odds at the steepest slope in advanced school. The more educated a person the more they understand societal perception and thus, may lie regarding voting. another reason may be that it is connected with age groups and as we see in the plot below that age increases in likely hood of lying about voting as it increases.
The actual number of yes is 49.41% which means if I said 1 for all variables I would obtain that accuracy level. However, with the model that I made it improved the accuracy to 68.35% which is a solid improvement. This is an improvement of 18.94%! It was interesting to see it improve this much. However, there is still room for improvement but overall, I am satisfied with these numbers. This means we have increased the accuracy of guessing whether or not someone is lying about voting.
###Classification Accuracy
anes_data <- anes_data %>%
filter(complete.cases(.))
anes_data$predicted_prob<-predict(logit_weight, type="response")
anes_data$predicted_prob<-as.numeric(anes_data$predicted_prob)
#Create new vector with 0|1 predictions from the model
predicted_class <- ifelse(anes_data$predicted_prob >= 0.5, 1, 0)
# Compare the predicted binary outcomes to the actual y
actual_class <- anes_data$vote_response
# Calculate the classification accuracy
accuracy <- mean(predicted_class == actual_class)
print(accuracy)
## [1] 0.6835043
# Calculate the classification accuracy improvement
accuracy_improve <- accuracy-mean(anes_data$vote_response)
print(accuracy_improve)
## [1] 0.1894238
plot_summs(logit_weight)
Write a hypothesis for each of the three IVs you select
# pulling out whats needed for study
anes_gm <- data.frame(anes$V201416, anes$libcon, anes$V201435, anes$V201601, anes$educ, anes$age_group)
# renaming
gm_names <- c("gay_marriage", "libcon", "relig_ident", "sex_orient", "educ", "age_group")
colnames(anes_gm) <- gm_names
# changed scale to show more support for gay marriage with higher numbers.
anes_gm <- anes_gm %>%
mutate(gay_marriage = case_when(
gay_marriage == 1 ~ 3,
gay_marriage == 2 ~ 2,
gay_marriage == 3 ~ 1,
gay_marriage >3 ~ NA_real_,
gay_marriage <1 ~ NA_real_
))
# changed into to the char representation for the graphing
anes_gm <- anes_gm %>%
mutate(libcon = case_when(
libcon == 1 ~ "Extreme L",
libcon == 2 ~ "Somewhat L",
libcon == 3 ~ "Slightly L",
libcon == 4 ~ "Moderate",
libcon == 5 ~ "Slightly C",
libcon == 6 ~ "Somewhat C",
libcon == 7 ~ "Extreme C"
))
anes_gm <- anes_gm %>%
mutate(relig_ident = case_when(
relig_ident == 1 ~ "Monotheist",
relig_ident == 2 ~ "Monotheist",
relig_ident == 3 ~ "Monotheist",
relig_ident == 4 ~ "Monotheist",
relig_ident == 5 ~ "Monotheist",
relig_ident == 6 ~ "Monotheist",
relig_ident == 7 ~ "Polytheist",
relig_ident == 8 ~ "Polytheist",
relig_ident == 9 ~ "Non-Religous",
relig_ident == 10 ~ "Non-Religous",
relig_ident == 12 ~ "Non-Religous",
relig_ident == 11 ~ "Other",
))
# remove NA values
anes_gm <- anes_gm %>%
filter(complete.cases(.))
# looking here we notice that a majority of people support gay marriages (68.66%).
freq(anes_gm$gay_marriage)
## anes_gm$gay_marriage
## Frequency Percent
## 1 981 12.91
## 2 1400 18.43
## 3 5217 68.66
## Total 7598 100.00
# checking for any skewed or missing data
skim(anes_gm)
Name | anes_gm |
Number of rows | 7598 |
Number of columns | 6 |
_______________________ | |
Column type frequency: | |
character | 2 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
libcon | 0 | 1 | 8 | 10 | 0 | 7 | 0 |
relig_ident | 0 | 1 | 5 | 12 | 0 | 4 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
gay_marriage | 0 | 1 | 2.56 | 0.71 | 1 | 2 | 3 | 3 | 3 | ▂▁▂▁▇ |
sex_orient | 0 | 1 | 1.12 | 0.48 | 1 | 1 | 1 | 1 | 4 | ▇▁▁▁▁ |
educ | 0 | 1 | 2.43 | 1.02 | 1 | 2 | 2 | 3 | 4 | ▅▇▁▆▅ |
age_group | 0 | 1 | 3.39 | 1.38 | 1 | 2 | 4 | 5 | 5 | ▃▅▅▇▇ |
# changing vars to factor before model
anes_gm$gay_marriage <- factor(anes_gm$gay_marriage, ordered = TRUE)
# logistic model
gm_l <- polr(gay_marriage ~ libcon + relig_ident + sex_orient + educ + age_group, data = anes_gm, na.action = na.exclude, method = "logistic")
# probit model
gm_p <- polr(gay_marriage ~ libcon + relig_ident + sex_orient + educ + age_group, data = anes_gm, na.action = na.exclude, method = "probit")
anes_gm$educ_f <- factor(anes_gm$educ)
anes_gm$sex_orient_f <- factor(anes_gm$sex_orient)
anes_gm$age_f <- factor(anes_gm$age_group)
anes_gm$libcon_f <- relevel(factor(anes_gm$libcon), ref = "Moderate")
anes_gm$relig_ident_f <- relevel(factor(anes_gm$relig_ident), ref = "Non-Religous")
gm_p_f <- polr(gay_marriage ~ libcon_f + relig_ident_f + sex_orient + educ_f + age_f, data = anes_gm, na.action = na.exclude, method = "probit")
gm_l_f <- polr(gay_marriage ~ libcon_f + relig_ident_f + sex_orient + educ_f + age_f, data = anes_gm, na.action = na.exclude, method = "logistic")
stargazer(gm_l, gm_p, type = "text", digits = 3)
##
## ====================================================
## Dependent variable:
## ----------------------------
## gay_marriage
## ordered ordered
## logistic probit
## (1) (2)
## ----------------------------------------------------
## libconExtreme L 3.100*** 1.785***
## (0.230) (0.118)
##
## libconModerate 2.067*** 1.226***
## (0.111) (0.066)
##
## libconSlightly C 1.472*** 0.891***
## (0.110) (0.066)
##
## libconSlightly L 2.407*** 1.424***
## (0.121) (0.071)
##
## libconSomewhat C 0.803*** 0.492***
## (0.106) (0.065)
##
## libconSomewhat L 3.561*** 2.014***
## (0.162) (0.085)
##
## relig_identNon-Religous 0.809*** 0.442***
## (0.078) (0.043)
##
## relig_identOther -0.266*** -0.176***
## (0.069) (0.040)
##
## relig_identPolytheist 0.271 0.163
## (0.270) (0.150)
##
## sex_orient 0.121 0.055
## (0.075) (0.039)
##
## educ 0.195*** 0.130***
## (0.027) (0.016)
##
## age_group -0.171*** -0.094***
## (0.021) (0.012)
##
## ----------------------------------------------------
## Observations 7,598 7,598
## ====================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(gm_l_f, gm_p_f, type = "text", digits = 3)
##
## ====================================================
## Dependent variable:
## ----------------------------
## gay_marriage
## ordered ordered
## logistic probit
## (1) (2)
## ----------------------------------------------------
## libcon_fExtreme C -2.074*** -1.230***
## (0.111) (0.066)
##
## libcon_fExtreme L 1.043*** 0.564***
## (0.216) (0.108)
##
## libcon_fSlightly C -0.595*** -0.335***
## (0.080) (0.046)
##
## libcon_fSlightly L 0.341*** 0.198***
## (0.094) (0.052)
##
## libcon_fSomewhat C -1.279*** -0.742***
## (0.078) (0.046)
##
## libcon_fSomewhat L 1.500*** 0.792***
## (0.142) (0.070)
##
## relig_ident_fMonotheist -0.811*** -0.444***
## (0.078) (0.043)
##
## relig_ident_fOther -1.073*** -0.616***
## (0.089) (0.049)
##
## relig_ident_fPolytheist -0.517* -0.267*
## (0.277) (0.153)
##
## sex_orient 0.123 0.056
## (0.075) (0.039)
##
## educ_f2 0.338*** 0.205***
## (0.070) (0.041)
##
## educ_f3 0.533*** 0.345***
## (0.079) (0.046)
##
## educ_f4 0.563*** 0.370***
## (0.088) (0.051)
##
## age_f2 -0.108 -0.037
## (0.115) (0.063)
##
## age_f3 -0.428*** -0.231***
## (0.114) (0.063)
##
## age_f4 -0.519*** -0.278***
## (0.103) (0.057)
##
## age_f5 -0.659*** -0.350***
## (0.103) (0.057)
##
## ----------------------------------------------------
## Observations 7,598 7,598
## ====================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(gm_l, gm_p, type = "text",
dep.var.labels = "Gay Marriage",
covariate.labels = c("Extreme Lib Identification", "Moderate Identification", "Slightly conserv Identification", "Slightly Lib Identification", "Somewhat Conserv Identification", "Somewhat Lib Identification", "Non-Religous", "Other religous belief", "Polytheist", "Sexual Orientation", "Education", "Age Group"), digits = 2)
##
## ============================================================
## Dependent variable:
## ----------------------------
## Gay Marriage
## ordered ordered
## logistic probit
## (1) (2)
## ------------------------------------------------------------
## Extreme Lib Identification 3.10*** 1.79***
## (0.23) (0.12)
##
## Moderate Identification 2.07*** 1.23***
## (0.11) (0.07)
##
## Slightly conserv Identification 1.47*** 0.89***
## (0.11) (0.07)
##
## Slightly Lib Identification 2.41*** 1.42***
## (0.12) (0.07)
##
## Somewhat Conserv Identification 0.80*** 0.49***
## (0.11) (0.06)
##
## Somewhat Lib Identification 3.56*** 2.01***
## (0.16) (0.08)
##
## Non-Religous 0.81*** 0.44***
## (0.08) (0.04)
##
## Other religous belief -0.27*** -0.18***
## (0.07) (0.04)
##
## Polytheist 0.27 0.16
## (0.27) (0.15)
##
## Sexual Orientation 0.12 0.06
## (0.08) (0.04)
##
## Education 0.20*** 0.13***
## (0.03) (0.02)
##
## Age Group -0.17*** -0.09***
## (0.02) (0.01)
##
## ------------------------------------------------------------
## Observations 7,598 7,598
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
stargazer(gm_l_f, gm_p_f, type = "text",
dep.var.labels = "Gay Marriage (factors)",
covariate.labels = c("Extreme conserv Identification", "Extreme Lib Identification", "Slightly conserv Identification", "Slightly Lib Identification", "Somewhat Conserv Identification", "Somewhat Lib Identification", "Monotheist", "Other religous belief", "Polytheist", "Sexual Orientation", "Some college", "Bachelor", "Advanced School", "Age(30-39)", "Age(40-49)", "Age(50-64)", "Age(65+)"), digits = 2)
##
## ============================================================
## Dependent variable:
## ----------------------------
## Gay Marriage (factors)
## ordered ordered
## logistic probit
## (1) (2)
## ------------------------------------------------------------
## Extreme conserv Identification -2.07*** -1.23***
## (0.11) (0.07)
##
## Extreme Lib Identification 1.04*** 0.56***
## (0.22) (0.11)
##
## Slightly conserv Identification -0.59*** -0.33***
## (0.08) (0.05)
##
## Slightly Lib Identification 0.34*** 0.20***
## (0.09) (0.05)
##
## Somewhat Conserv Identification -1.28*** -0.74***
## (0.08) (0.05)
##
## Somewhat Lib Identification 1.50*** 0.79***
## (0.14) (0.07)
##
## Monotheist -0.81*** -0.44***
## (0.08) (0.04)
##
## Other religous belief -1.07*** -0.62***
## (0.09) (0.05)
##
## Polytheist -0.52* -0.27*
## (0.28) (0.15)
##
## Sexual Orientation 0.12 0.06
## (0.08) (0.04)
##
## Some college 0.34*** 0.21***
## (0.07) (0.04)
##
## Bachelor 0.53*** 0.35***
## (0.08) (0.05)
##
## Advanced School 0.56*** 0.37***
## (0.09) (0.05)
##
## Age(30-39) -0.11 -0.04
## (0.11) (0.06)
##
## Age(40-49) -0.43*** -0.23***
## (0.11) (0.06)
##
## Age(50-64) -0.52*** -0.28***
## (0.10) (0.06)
##
## Age(65+) -0.66*** -0.35***
## (0.10) (0.06)
##
## ------------------------------------------------------------
## Observations 7,598 7,598
## ============================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Brant test
brant(gm_l)
## ------------------------------------------------------------
## Test for X2 df probability
## ------------------------------------------------------------
## Omnibus 116.07 12 0
## libconExtreme L 9.74 1 0
## libconModerate 4.09 1 0.04
## libconSlightly C 4.42 1 0.04
## libconSlightly L 5.87 1 0.02
## libconSomewhat C 0.09 1 0.76
## libconSomewhat L 7.69 1 0.01
## relig_identNon-Religous 7.58 1 0.01
## relig_identOther 19.5 1 0
## relig_identPolytheist 0.75 1 0.39
## sex_orient 0.38 1 0.54
## educ 19.46 1 0
## age_group 9.54 1 0
## ------------------------------------------------------------
##
## H0: Parallel Regression Assumption holds
brant(gm_p)
## ------------------------------------------------------------
## Test for X2 df probability
## ------------------------------------------------------------
## Omnibus 116.07 12 0
## libconExtreme L 9.74 1 0
## libconModerate 4.09 1 0.04
## libconSlightly C 4.42 1 0.04
## libconSlightly L 5.87 1 0.02
## libconSomewhat C 0.09 1 0.76
## libconSomewhat L 7.69 1 0.01
## relig_identNon-Religous 7.58 1 0.01
## relig_identOther 19.5 1 0
## relig_identPolytheist 0.75 1 0.39
## sex_orient 0.38 1 0.54
## educ 19.46 1 0
## age_group 9.54 1 0
## ------------------------------------------------------------
##
## H0: Parallel Regression Assumption holds
brant(gm_l_f)
## ------------------------------------------------------------
## Test for X2 df probability
## ------------------------------------------------------------
## Omnibus 122.15 17 0
## libcon_fExtreme C 3.9 1 0.05
## libcon_fExtreme L 4.71 1 0.03
## libcon_fSlightly C 0 1 0.95
## libcon_fSlightly L 0.58 1 0.45
## libcon_fSomewhat C 5.76 1 0.02
## libcon_fSomewhat L 2.57 1 0.11
## relig_ident_fMonotheist 7.42 1 0.01
## relig_ident_fOther 0.5 1 0.48
## relig_ident_fPolytheist 1.96 1 0.16
## sex_orient 0.39 1 0.53
## educ_f2 1.07 1 0.3
## educ_f3 8.46 1 0
## educ_f4 12.4 1 0
## age_f2 4.42 1 0.04
## age_f3 0.16 1 0.69
## age_f4 4.41 1 0.04
## age_f5 10.89 1 0
## ------------------------------------------------------------
##
## H0: Parallel Regression Assumption holds
brant(gm_p_f)
## ------------------------------------------------------------
## Test for X2 df probability
## ------------------------------------------------------------
## Omnibus 122.15 17 0
## libcon_fExtreme C 3.9 1 0.05
## libcon_fExtreme L 4.71 1 0.03
## libcon_fSlightly C 0 1 0.95
## libcon_fSlightly L 0.58 1 0.45
## libcon_fSomewhat C 5.76 1 0.02
## libcon_fSomewhat L 2.57 1 0.11
## relig_ident_fMonotheist 7.42 1 0.01
## relig_ident_fOther 0.5 1 0.48
## relig_ident_fPolytheist 1.96 1 0.16
## sex_orient 0.39 1 0.53
## educ_f2 1.07 1 0.3
## educ_f3 8.46 1 0
## educ_f4 12.4 1 0
## age_f2 4.42 1 0.04
## age_f3 0.16 1 0.69
## age_f4 4.41 1 0.04
## age_f5 10.89 1 0
## ------------------------------------------------------------
##
## H0: Parallel Regression Assumption holds
glm_fit <- function(model) {
# Calculate Likelihood Ratio
lr <- logLik(model)
# Calculate AIC
aic <- AIC(model)
# Calculate BIC
n <- nobs(model)
p <- length(coef(model))
bic <- -2 * logLik(model) + p * log(n)
# Calculate Deviance
deviance <- summary(model)$deviance
# Return the metrics as a list
metrics <- data.frame(Likelihood_Ratio = lr, AIC = aic, BIC = bic, Deviance = deviance)
return(metrics)
}
glm_fit(gm_l)
## Likelihood_Ratio AIC BIC Deviance
## 1 -5373.028 10774.06 10853.28 10746.06
# slight improvement of AIC BIC and deviance on factor version
glm_fit(gm_l_f)
## Likelihood_Ratio AIC BIC Deviance
## 1 -5367.878 10773.76 10887.66 10735.76
Looking at extreme conservative vs extreme liberal. We notice that liberals are far more supportive of gay marriage in comparison to extreme conservatives.
logit<-ggpredict(gm_l_f, terms="libcon_f") #ggpredict(model_name, terms="variable to graph")
print(logit) #Overview of the predicted probability results
## # Predicted probabilities of gay_marriage
##
## # Response Level = 1
##
## libcon_f | Predicted | 95% CI
## -------------------------------------
## Moderate | 0.04 | [0.03, 0.04]
## Extreme C | 0.24 | [0.19, 0.29]
## Extreme L | 0.01 | [0.01, 0.02]
## Slightly C | 0.07 | [0.05, 0.08]
## Slightly L | 0.03 | [0.02, 0.03]
## Somewhat C | 0.12 | [0.10, 0.15]
## Somewhat L | 0.01 | [0.01, 0.01]
##
## # Response Level = 2
##
## libcon_f | Predicted | 95% CI
## -------------------------------------
## Moderate | 0.10 | [0.08, 0.11]
## Extreme C | 0.31 | [0.26, 0.38]
## Extreme L | 0.04 | [0.02, 0.06]
## Slightly C | 0.15 | [0.13, 0.19]
## Slightly L | 0.07 | [0.06, 0.09]
## Somewhat C | 0.23 | [0.20, 0.28]
## Somewhat L | 0.02 | [0.02, 0.03]
##
## # Response Level = 3
##
## libcon_f | Predicted | 95% CI
## -------------------------------------
## Moderate | 0.87 | [0.85, 0.88]
## Extreme C | 0.45 | [0.38, 0.52]
## Extreme L | 0.95 | [0.92, 0.97]
## Slightly C | 0.78 | [0.74, 0.82]
## Slightly L | 0.90 | [0.88, 0.92]
## Somewhat C | 0.64 | [0.59, 0.69]
## Somewhat L | 0.97 | [0.95, 0.98]
##
## Adjusted for:
## * relig_ident_f = Non-Religous
## * sex_orient = 1.12
## * educ_f = 1
## * age_f = 1
logit$x <-as.factor(logit$x)
# Define custom levels and labels
custom_levels <- c(1, 2, 3)
custom_labels <- c("Anti-gay relationship", "No gay marriage", "Pro gay marriage")
# Reassign variable with custom labels
logit$response.level <- factor(logit$response.level, levels = custom_levels, labels = custom_labels)
strong <- logit %>%
filter(logit$x %in% c("Extreme L", "Extreme C")) #Filters for the variables we need
ggplot(strong, aes(x = x, y = predicted, fill = response.level)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7)) +
theme_minimal(base_size = 13) +
labs(x = "Support for Gay Marriage", y = "Predicted Probability",
title = "Predicted Probability with Confidence Intervals") +
scale_fill_manual(values = c("Anti-gay relationship" = "darkgrey",
"No gay marriage" = "lightgrey",
"Pro gay marriage" = "antiquewhite")) +
labs(fill = "Ideological Identification") +
theme(legend.position = c(0.05, 0.95), legend.justification = c(0, 1))
Comparing sexual orientation support for gay marriage even though it was not significant. They follow almost an identical line with each other where homosexuals are slightly higher.
logit2 <-ggpredict(gm_l_f, terms="sex_orient") #ggpredict(model_name, terms="variable to graph")
print(logit2) #Overview of the predicted probability results
## # Predicted probabilities of gay_marriage
##
## # Response Level = 1
##
## sex_orient | Predicted | 95% CI
## -------------------------------------
## 1 | 0.04 | [0.03, 0.04]
## 2 | 0.03 | [0.03, 0.05]
## 3 | 0.03 | [0.02, 0.05]
## 4 | 0.03 | [0.02, 0.05]
##
## # Response Level = 2
##
## sex_orient | Predicted | 95% CI
## -------------------------------------
## 1 | 0.10 | [0.09, 0.11]
## 2 | 0.09 | [0.07, 0.12]
## 3 | 0.08 | [0.05, 0.12]
## 4 | 0.07 | [0.04, 0.12]
##
## # Response Level = 3
##
## sex_orient | Predicted | 95% CI
## -------------------------------------
## 1 | 0.86 | [0.85, 0.88]
## 2 | 0.88 | [0.84, 0.91]
## 3 | 0.89 | [0.84, 0.93]
## 4 | 0.90 | [0.84, 0.94]
##
## Adjusted for:
## * libcon_f = Moderate
## * relig_ident_f = Non-Religous
## * educ_f = 1
## * age_f = 1
logit2$x <-as.factor(logit2$x)
# Define custom levels and labels
custom_levels <- c(1, 2, 3)
custom_labels <- c("Anti-gay relationship", "No gay marriage", "Pro gay marriage")
# Reassign variable with custom labels
logit2$response.level <- factor(logit2$response.level, levels = custom_levels, labels = custom_labels)
custom_levels <- c(1, 2)
custom_labels <- c("Heterosexual", "Homosexual")
logit2$x <- factor(logit2$x, levels = custom_levels, labels = custom_labels)
orient <- logit2 %>%
filter(logit2$x %in% c("Heterosexual", "Homosexual"))
ggplot(orient, aes(x = x, y = predicted, fill = response.level)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7)) +
theme_minimal(base_size = 13) +
labs(x = "Support for Gay Marriage", y = "Predicted Probability",
title = "Predicted Probability with Confidence Intervals") +
scale_fill_manual(values = c("Anti-gay relationship" = "darkgrey",
"No gay marriage" = "lightgrey",
"Pro gay marriage" = "antiquewhite")) +
labs(fill = "Sexual Orientation") +
theme(legend.position = c(0.05, 0.95), legend.justification = c(0, 1))
Comparing the different religious identifications support levels for gay marriage. Here we can visually see the non-religious group was the most supportive of gay marriage which falls into my hypothesis.
logit3 <-ggpredict(gm_l_f, terms="relig_ident_f") #ggpredict(model_name, terms="variable to graph")
print(logit3) #Overview of the predicted probability results
## # Predicted probabilities of gay_marriage
##
## # Response Level = 1
##
## relig_ident_f | Predicted | 95% CI
## ----------------------------------------
## Non-Religous | 0.04 | [0.03, 0.04]
## Monotheist | 0.08 | [0.07, 0.10]
## Other | 0.10 | [0.08, 0.13]
## Polytheist | 0.06 | [0.04, 0.10]
##
## # Response Level = 2
##
## relig_ident_f | Predicted | 95% CI
## ----------------------------------------
## Non-Religous | 0.10 | [0.08, 0.11]
## Monotheist | 0.18 | [0.15, 0.21]
## Other | 0.21 | [0.17, 0.25]
## Polytheist | 0.14 | [0.09, 0.23]
##
## # Response Level = 3
##
## relig_ident_f | Predicted | 95% CI
## ----------------------------------------
## Non-Religous | 0.87 | [0.85, 0.88]
## Monotheist | 0.74 | [0.69, 0.78]
## Other | 0.69 | [0.63, 0.74]
## Polytheist | 0.79 | [0.69, 0.87]
##
## Adjusted for:
## * libcon_f = Moderate
## * sex_orient = 1.12
## * educ_f = 1
## * age_f = 1
logit3$x <-as.factor(logit3$x)
# Define custom levels and labels
custom_levels <- c(1, 2, 3)
custom_labels <- c("Anti-gay relationship", "No gay marriage", "Support gay marriage")
# Reassign variable with custom labels
logit3$response.level <- factor(logit3$response.level, levels = custom_levels, labels = custom_labels)
custom_order <- c('Monotheist', 'Polytheist', 'Non-Religous', 'Other')
# Ordered model
plot1 <- ggplot(logit3, aes(x = x, y = predicted, color = response.level, group = response.level)) +
geom_point() + theme_minimal(base_size = 10) +
labs(x = "Support for Gay Marriage", y = "Predicted Probability",
title = "Predicted Probability with Confidence Intervals") +
labs(color = "Gay Marriage Opinion") +
geom_errorbar(aes(ymin=conf.low, ymax=conf.high),
linewidth=.3, # Thinner lines
width=.2) + scale_x_discrete(limits = custom_order)
# Multinomial model
anes_gm$gay_marriage_f<-as.character(anes_gm$gay_marriage)
model <- multinom(gay_marriage_f ~ libcon_f + relig_ident_f + sex_orient + educ_f + age_f, data=anes_gm)
## # weights: 57 (36 variable)
## initial value 8347.256169
## iter 10 value 6125.050742
## iter 20 value 5490.568433
## iter 30 value 5326.087952
## iter 40 value 5308.712744
## final value 5308.582327
## converged
stargazer(model, type="text")
##
## ====================================================
## Dependent variable:
## ----------------------------
## 2 3
## (1) (2)
## ----------------------------------------------------
## libcon_fExtreme C -0.870*** -2.745***
## (0.156) (0.164)
##
## libcon_fExtreme L -0.654 0.687**
## (0.410) (0.303)
##
## libcon_fSlightly C -0.204 -0.708***
## (0.137) (0.118)
##
## libcon_fSlightly L -0.056 0.313**
## (0.173) (0.146)
##
## libcon_fSomewhat C -0.276** -1.534***
## (0.128) (0.115)
##
## libcon_fSomewhat L -0.227 1.372***
## (0.279) (0.227)
##
## relig_ident_fMonotheist 0.112 -0.765***
## (0.138) (0.117)
##
## relig_ident_fOther -0.435*** -1.261***
## (0.152) (0.127)
##
## relig_ident_fPolytheist 0.759 0.042
## (0.580) (0.541)
##
## sex_orient -0.060 0.084
## (0.129) (0.104)
##
## educ_f2 0.274** 0.467***
## (0.107) (0.095)
##
## educ_f3 0.611*** 0.833***
## (0.125) (0.114)
##
## educ_f4 0.778*** 0.969***
## (0.145) (0.133)
##
## age_f2 0.383* 0.104
## (0.196) (0.166)
##
## age_f3 -0.057 -0.447***
## (0.190) (0.158)
##
## age_f4 0.147 -0.477***
## (0.171) (0.144)
##
## age_f5 0.238 -0.612***
## (0.171) (0.145)
##
## Constant 0.210 2.639***
## (0.263) (0.217)
##
## ----------------------------------------------------
## Akaike Inf. Crit. 10,689.170 10,689.170
## ====================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
model_m<-ggpredict(model, terms="relig_ident_f")
plot2 <- ggplot(model_m, aes(x = factor(x, level =c('Monotheist', 'Polytheist', 'Non-Religous', 'Other')), y = predicted, color = response.level, group = response.level)) +
geom_point() + theme_minimal(base_size = 15) +
labs(x = "Multi model", y = "Predicted Probability",
title = "Predicted Probability with Confidence Intervals") +
labs(color = "Gay Marriage Opinion")
plot1
plot2
Is it violated overall? For individual variables? Should you move to a multinomial modeling approach?
When running it without factors we noticed the overall was violated and most of the individual variables were violated except for sexual orientation. However, when flipping the other variables to factor and running it again we notice that most of them do not violate the Brant test. The probit and logit models performed the exact same way so choosing a model did not matter much here it mattered more on whether to use factors or change model.
When looking at the side by side we notice that they are very similar so I would use the ordered model in comparison to the multinomial model. If the results were different significantly I would use the multinomial model.