library(haven)
library(ggplot2)
library(stargazer)
library(MASS)
library(tidyverse)
library(skimr)
library(survey)
library(descr)
library(DHARMa)
library(jtools)
library(ggeffects)
library(nnet)
library(brant)
library(patchwork)
set.seed(500)

Question 1

1. Write a hypothesis for all three primary independent variables

Partisan_strength - People with a higher partisan strength are less likely to lie about voting.

Education - People with a higher education level are less likely to lie about voting.

Political interest - People with more interest in politics are less likely to lie about voting.

2. Decide what control variables you will include with brief description of why it is included

Age group - I am including this since I believe as people get older they mature making them less likely to lie but I want to see if this will have an impact on whether or not they actually lie.
Ideological Identification - Certain cultures may make it hard for certain people to be honest about whether they voted or not. For example, people who voted for trump when asked may lie about who they voted for or say they did not vote.
Poc - I included this since I believe there is a different culture between races and wanted to see if that had an impact on if they would lie or not.

3. Make sure to do any necessary data cleaning or preprocessing before you begin the analysis

Data work

Data cleaning process - Checking for missing data - complete there were many Na values inside the variables - Checking for any bad data in scale - complete all variables were in the right scale format with the lowest and highest values being the correct format - Checking for intuitive names - complete the names were already intuitive so they were left alone - Checking for scale flips - complete the scale appeared in a good format as the highest value for all of them were the positive numbers or larger number which looked good.

Applied the weighted scale and then separated the data out with my DV, IV, and control variables.

anes <- read_dta("anes_2020_vvoted_hw3.dta")

# data of people who did not vote
no_vote_dt <- anes %>%
  filter(voted_20 == 0)

anes_comp <- no_vote_dt[complete.cases(no_vote_dt[, c("V200010b")]), ]
anes_weighted <- svydesign(ids = ~V200010c,
                           weights =~V200010b,
                           strata=~V200010d,
                           nest=TRUE, #Only True with clustered data like we have here
                           data = anes_comp) #PSU weight = 'ids', 'strata' = strata weight as specified in code book

# Just looking at the over all and we can see there is some missing data in most columns
# Looking like Na values however, other format seems fine.
# regression will omit the NA values so those are okay

# the variable names are already pretty intuitive so I am going to to leave the names alone exccept for the DV. Renaming this as vote response.
anes_data <- no_vote_dt %>%
  dplyr::select(V202109x, educ, partisan_strength, polint, poc, age_group, ideo3, voted_20)


colnames(anes_data)[colnames(anes_data) == 'V202109x'] <- 'vote_response'


# Roughly even in terms of people who said they voted or didnt
freq(anes_data$vote_response)

## PRE-POST: SUMMARY: Voter turnout in 2020 
##       Frequency Percent Valid Percent
## 0           712   44.36         50.89
## 1           687   42.80         49.11
## NA's        206   12.83              
## Total      1605  100.00        100.00

# looking like the lowest and highest values are in the correct format
# looking at representation of the numbers they appear to be on the right scale as well with no need to flip any values.
skim(anes_data)

Data summary
Name	anes_data
Number of rows	1605
Number of columns	8
_______________________
Column type frequency:
numeric	8
________________________
Group variables	None

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
vote_response	206	0.87	0.49	0.50	0	0	0	1	1	▇▁▁▁▇
educ	27	0.98	2.08	1.00	1	1	2	3	4	▇▇▁▃▂
partisan_strength	11	0.99	1.69	1.13	0	1	2	3	3	▅▅▁▆▇
polint	231	0.86	2.61	0.93	1	2	3	3	4	▃▅▁▇▃
poc	24	0.99	0.38	0.48	0	0	0	1	1	▇▁▁▁▅
age_group	72	0.96	2.92	1.40	1	2	3	4	5	▇▇▅▇▆
ideo3	46	0.97	2.08	0.85	1	1	2	3	3	▆▁▅▁▇
voted_20	0	1.00	0.00	0.00	0	0	0	0	0	▁▁▇▁▁

# here we see that more people in the data said they didnt vote
anes_data %>%
  count(vote_response)

## # A tibble: 3 x 2
##   vote_response            n
##   <dbl+lbl>            <int>
## 1  0 [0. Did not vote]   712
## 2  1 [1. Voted]          687
## 3 NA                     206

4. Estimate the model and fully interpret the results

Models

Going to run 3 models here one logit, one probit, and one weighted then I will compare the results and pick the best model to move forward with.

When looking at the comparison of the three models I notice that the logit model appears to be score the best when it comes to AIC, followed by the probit. However, the weighted logit actually performs the best here in terms of AIC and log likelihood. The coefficents remained pretty consistent between logit and probit with logit being about 2 times bigger consistently. There are three points where the significance goes away when comparing models and it is leaning partisan, liberal identification, and Age group. These three points show the impact of the weights on the significance of the Iv. Both move away from significance so I will use the weight logit model as I move forward. Very surprising to see age group get dispersed as a significant factor. Here we can start looking at some of those hypothesis questions and return some answers!

1 - Partisan_strength - People with a higher partisan strength are less likely to lie about voting. So here we notice that as partisan strength increases from weak to strong the coefficient grows large in the positive direction. Meaning people are less likely to lie about voting as partisan strength increases supporting the hypothesis. 2 - Education -People with a higher education level are less likely to lie about voting. Once more we notice as education increases so does the coefficient in the positive direction however, once we reach bachelors and advanced education it starts to level out. People with a bachelors and up are less likely to lie about voting compared to the highschool education. This supports the hypothesis above 3 Political interest - -People with more interest in politics are less likely to lie about voting. Political interest is a positive coefficient once more showing that as it increases people are less likely to lie about voting. This supports the hypothesis above.

For the models below in ideo I set the reference to 2 for moderate instead of it being liberal to make it easier to compare.

Final model to be used is the weighted logit model as it scores the best AIC and log likelihood.

logit <- glm(vote_response ~ as.factor(partisan_strength) + as.factor(educ) + polint + poc + relevel(as.factor(ideo3), ref = "2") + as.factor(age_group), data = anes_data, family = binomial(link = "logit"))


probit <- glm(vote_response ~ as.factor(partisan_strength) + as.factor(educ) + polint + poc + relevel(as.factor(ideo3), ref = "2") + as.factor(age_group), data = anes_data, family = binomial(link = "probit"))


logit_weight <- svyglm(V202109x ~ as.factor(partisan_strength) + as.factor(educ) + polint + poc + relevel(as.factor(ideo3), ref = "2") + as.factor(age_group), design = anes_weighted, family = binomial(link = "logit"))

a. Display the results in nice table format with appropriate names of your independent and dependent variables

stargazer(logit, probit, logit_weight, type = "text",
          dep.var.labels = "Voting Response",
          covariate.labels = c("Leaning Partisan", "Weak Partisan", "Strong Partisan", "Some College", "Bachelor", "Advanced school", "Political Interest", "People of Color", "Liberal Identification", "Conservative Identification", "Age(30-39)", "Age(40-49)", "Age(50-64)", "Age(65+)"), digits = 2)

## 
## =============================================================
##                                    Dependent variable:       
##                             ---------------------------------
##                              Voting Response     V202109x    
##                             logistic  probit  survey-weighted
##                                                  logistic    
##                               (1)      (2)          (3)      
## -------------------------------------------------------------
## Leaning Partisan             0.46**   0.27**       0.33      
##                              (0.20)   (0.12)      (0.27)     
##                                                              
## Weak Partisan                0.43**   0.25**      0.71**     
##                              (0.19)   (0.11)      (0.27)     
##                                                              
## Strong Partisan             1.14***  0.69***      1.15***    
##                              (0.19)   (0.11)      (0.25)     
##                                                              
## Some College                0.48***  0.28***      0.46**     
##                              (0.15)   (0.09)      (0.20)     
##                                                              
## Bachelor                    1.05***  0.64***      0.84***    
##                              (0.19)   (0.11)      (0.25)     
##                                                              
## Advanced school             1.03***  0.63***      0.87***    
##                              (0.22)   (0.13)      (0.26)     
##                                                              
## Political Interest          0.51***  0.31***      0.49***    
##                              (0.07)   (0.04)      (0.11)     
##                                                              
## People of Color              -0.09    -0.05        -0.07     
##                              (0.13)   (0.08)      (0.19)     
##                                                              
## Liberal Identification       0.34**   0.21**       0.23      
##                              (0.16)   (0.10)      (0.24)     
##                                                              
## Conservative Identification  -0.18    -0.11        -0.18     
##                              (0.16)   (0.09)      (0.19)     
##                                                              
## Age(30-39)                   -0.23    -0.14        -0.23     
##                              (0.19)   (0.11)      (0.28)     
##                                                              
## Age(40-49)                   -0.22    -0.13        -0.24     
##                              (0.21)   (0.13)      (0.32)     
##                                                              
## Age(50-64)                    0.26     0.16        0.25      
##                              (0.19)   (0.11)      (0.25)     
##                                                              
## Age(65+)                    0.69***  0.42***       0.44      
##                              (0.22)   (0.13)      (0.27)     
##                                                              
## Constant                    -2.54*** -1.53***    -2.66***    
##                              (0.29)   (0.17)      (0.40)     
##                                                              
## -------------------------------------------------------------
## Observations                 1,267    1,267        1,267     
## Log Likelihood              -752.77  -752.90      -722.71    
## Akaike Inf. Crit.           1,535.54 1,535.81    1,475.43    
## =============================================================
## Note:                             *p<0.1; **p<0.05; ***p<0.01

b. Graph the impact of the primary independent variables with 95% confidence intervals

graphs

Looking at the graph below we can see as education levels rise so does the change of lying about voting until it reaches bachelor and advanced where it levels out.

# Education (Here is where the issue is at)

new_data_logit <- expand.grid(educ = seq(1,4,1), #Education Graph
                              partisan_strength = "3",
                              polint = mean(anes_data$polint, na.rm = TRUE),
                              poc = mean(anes_data$poc, na.rm = TRUE),
                              age_group = "5",
                              ideo3 = "2")

logit_n <-predict(logit, newdata = new_data_logit,  type = "response", se.fit = TRUE) #Saves predicted probabilities

logit_n$ci_lower <- logit_n$fit - 1.96 * logit_n$se #Calculates lower bound
logit_n$ci_upper <- logit_n$fit + 1.96 * logit_n$se #Calculates upper bound
logit_n$x <- 1:4 #Sets the x variable
logit_n<-as.data.frame(logit_n) #Creates a data frame for graphing


ggplot(logit_n, aes(x = x, y = fit)) +
  geom_line(color = "black", linewidth=.5) +
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.5, fill =
                "grey") + # CI ribbon
  labs(x = "Education", y = "Predicted Probability", title = "Predicted Probability
with 95% CI") +
  theme_minimal() + scale_x_continuous(breaks = 1:4, labels = paste0(c("Highschool", "Some College", "Bachelor", "Advanced School"))) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red")

Looking at the graph below we see as partisan strength increases from 1 to 2 a major jump in probability however, between 2 and 3 there is not much and then once again a spike between 3 and 4. I believe this is due to the middle group not having much of a difference between 2 and 3 (leaning vs weak)

# partisan strength
new_data_logit_ps <- expand.grid(partisan_strength = c("0", "1", "2", "3"), #Partisan Strength Graph
                              educ = "2",
                              polint = mean(anes_data$polint, na.rm = TRUE),
                              poc = mean(anes_data$poc, na.rm = TRUE),
                              age_group = "5",
                              ideo3 = "2")

logit_n <-predict(logit, newdata = new_data_logit_ps,  type = "response", se.fit = TRUE) #Saves predicted probabilities

logit_n$ci_lower <- logit_n$fit - 1.96 * logit_n$se #Calculates lower bound
logit_n$ci_upper <- logit_n$fit + 1.96 * logit_n$se #Calculates upper bound
logit_n$x <- 0:3 #Sets the x variable
logit_n<-as.data.frame(logit_n) #Creates a data frame for graphing


ggplot(logit_n, aes(x = x, y = fit)) +
  geom_line(color = "black", linewidth=.5) +
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.5, fill =
                "grey") + # CI ribbon
  labs(x = "Partisan Strength", y = "Predicted Probability", title = "Predicted Probability
with 95% CI") +
  theme_minimal() + scale_x_continuous(breaks = 0:3, labels = paste0(c("Independent", "Leaning Partisan", "Weak Partisan", "Strong Partisan"))) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red")

Political interests almost looks linear with how consistent it increases the probability of lying about voting. This is actually really interesting to see that the more interested a person is in it then the more likely they are to lie about their voting. I wonder what is going on here and if there is some underlying reason.

# Political interest
new_data_logit_pi <- expand.grid(polint = seq(1,4,1), #Political Interest Graph
                              educ = "2",
                              partisan_strength = ("3"),
                              poc = mean(anes_data$poc, na.rm = TRUE),
                              age_group = "5",
                              ideo3 = "2")

logit_n <-predict(logit, newdata = new_data_logit_pi,  type = "response", se.fit = TRUE) #Saves predicted probabilities

logit_n$ci_lower <- logit_n$fit - 1.96 * logit_n$se #Calculates lower bound
logit_n$ci_upper <- logit_n$fit + 1.96 * logit_n$se #Calculates upper bound
logit_n$x <- 1:4 #Sets the x variable
logit_n<-as.data.frame(logit_n) #Creates a data frame for graphing


ggplot(logit_n, aes(x = x, y = fit)) +
  geom_line(color = "black", linewidth=.5) +
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.5, fill =
                "grey") + # CI ribbon
  labs(x = "Political Interest", y = "Predicted Probability", title = "Predicted Probability
with 95% CI") +
  theme_minimal() + scale_x_continuous(breaks = 1:4, labels = paste0(c("Not interested", "Not very interested", "Somewat interested", "Very interested"))) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "red")

5. Draw conclusions about the results based on your above hypotheses

a. What variables are important in terms of a person misrepresenting if they voted or not?

Partisan strength (IV)
Political interest (IV)
Education (IV)

b. Why do you think this is the case? Write a brief description of why.

When looking at partisan strength we notice right away it is consistently significant across all models except for on the leaning partisan on the weighted model. Strong partisan increased probability of lying about voting by 1.15 log odds at its steepest slope (~28.75% in methods of 4). Other factors slipped behind at about half the size of impact. The reason for this would be the stronger you feel about a particular party the more likely you want to be involved. Political interest was one that threw me off a little as it increases so do the odds maybe the more it matters the more political you get with your responses. This increased probability of lying about voting by .49(~12.25% method of 4) log odds at steepest slope. Education had a big impact which followed a pattern that the more educated you were the more likely you are to lie about voting. The odds go from .46 in highschool to .87 log odds at the steepest slope in advanced school. The more educated a person the more they understand societal perception and thus, may lie regarding voting. another reason may be that it is connected with age groups and as we see in the plot below that age increases in likely hood of lying about voting as it increases.

Accuracy of predictions

The actual number of yes is 49.41% which means if I said 1 for all variables I would obtain that accuracy level. However, with the model that I made it improved the accuracy to 68.35% which is a solid improvement. This is an improvement of 18.94%! It was interesting to see it improve this much. However, there is still room for improvement but overall, I am satisfied with these numbers. This means we have increased the accuracy of guessing whether or not someone is lying about voting.

###Classification Accuracy
anes_data <- anes_data %>%
  filter(complete.cases(.))

anes_data$predicted_prob<-predict(logit_weight, type="response")
anes_data$predicted_prob<-as.numeric(anes_data$predicted_prob)

#Create new vector with 0|1 predictions from the model
predicted_class <- ifelse(anes_data$predicted_prob >= 0.5, 1, 0)

# Compare the predicted binary outcomes to the actual y
actual_class <- anes_data$vote_response

# Calculate the classification accuracy
accuracy <- mean(predicted_class == actual_class)
print(accuracy)

## [1] 0.6835043

# Calculate the classification accuracy improvement
accuracy_improve <- accuracy-mean(anes_data$vote_response)
print(accuracy_improve)

## [1] 0.1894238

plot_summs(logit_weight)

Question 2 Ordered model

1. From list of variables, select three primary IVs that you theoretically think should influence a person’s opinion on gay marriage in the USA (note there are probably more than 3 variables you could theoretically justify here but just use 3 to keep it simpler)

Write a hypothesis for each of the three IVs you select

libcon - Hypothesis - The more liberal a person is the more likely they are to approve of gay marriage in the USA.
V201435 - Current religious identity - Hypothesis - Christianity is more likely to be more against the idea of gay marriage
V201601 - Sexual orientation - Hypothesis - Homosexual participants will show higher levels of approving gay marriage in the USA.

2. Decide what control variables you will include with brief description of why it is included

educ - The more educated a person the more exposed and diverse their perspective is.
age_group - older generation had a different culture growing up and thus, have a different belief system

3. Data cleaning

# pulling out whats needed for study
anes_gm <- data.frame(anes$V201416, anes$libcon, anes$V201435, anes$V201601, anes$educ, anes$age_group)
# renaming
gm_names <- c("gay_marriage", "libcon", "relig_ident", "sex_orient", "educ", "age_group")

colnames(anes_gm) <- gm_names



# changed scale to show more support for gay marriage with higher numbers.
anes_gm <- anes_gm %>%
  mutate(gay_marriage = case_when(
    gay_marriage == 1 ~ 3,
    gay_marriage == 2 ~ 2,
    gay_marriage == 3 ~ 1,
    gay_marriage >3 ~ NA_real_,
    gay_marriage <1 ~ NA_real_
  ))

# changed into to the char representation for the graphing
anes_gm <- anes_gm %>%
  mutate(libcon = case_when(
    libcon == 1 ~ "Extreme L",
    libcon == 2 ~ "Somewhat L",
    libcon == 3 ~ "Slightly L",
    libcon == 4 ~ "Moderate",
    libcon == 5 ~ "Slightly C",
    libcon == 6 ~ "Somewhat C",
    libcon == 7 ~ "Extreme C"
  ))
anes_gm <- anes_gm %>%
  mutate(relig_ident = case_when(
    relig_ident == 1 ~ "Monotheist",
    relig_ident == 2 ~ "Monotheist",
    relig_ident == 3 ~ "Monotheist",
    relig_ident == 4 ~ "Monotheist",
    relig_ident == 5 ~ "Monotheist",
    relig_ident == 6 ~ "Monotheist",
    relig_ident == 7 ~ "Polytheist",
    relig_ident == 8 ~ "Polytheist",
    relig_ident == 9 ~ "Non-Religous",
    relig_ident == 10 ~ "Non-Religous",
    relig_ident == 12 ~ "Non-Religous",
    relig_ident == 11 ~ "Other",
  ))

# remove NA values
anes_gm <- anes_gm %>%
  filter(complete.cases(.))

# looking here we notice that a majority of people support gay marriages (68.66%).
freq(anes_gm$gay_marriage)

## anes_gm$gay_marriage 
##       Frequency Percent
## 1           981   12.91
## 2          1400   18.43
## 3          5217   68.66
## Total      7598  100.00

# checking for any skewed or missing data
skim(anes_gm)

Data summary
Name	anes_gm
Number of rows	7598
Number of columns	6
_______________________
Column type frequency:
character	2
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
libcon	0	1	8	10	0	7	0
relig_ident	0	1	5	12	0	4	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
gay_marriage	1	2.56	0.71	1	2	3	3	3	▂▁▂▁▇
sex_orient	1	1.12	0.48	1	1	1	1	4	▇▁▁▁▁
educ	1	2.43	1.02	1	2	2	3	4	▅▇▁▆▅
age_group	1	3.39	1.38	1	2	4	5	5	▃▅▅▇▇

4. Estimate an ordered regression model using variables you selected above

a. Your decision if ordered logit or ordered probit

Going to be using the logit model with factors.

# changing vars to factor before model
anes_gm$gay_marriage <- factor(anes_gm$gay_marriage, ordered = TRUE)

        
# logistic model
gm_l <- polr(gay_marriage ~ libcon + relig_ident + sex_orient + educ + age_group, data = anes_gm, na.action = na.exclude, method = "logistic")

# probit model
gm_p <- polr(gay_marriage ~ libcon + relig_ident + sex_orient + educ + age_group, data = anes_gm, na.action = na.exclude, method = "probit")

anes_gm$educ_f <- factor(anes_gm$educ)
anes_gm$sex_orient_f <- factor(anes_gm$sex_orient)
anes_gm$age_f <- factor(anes_gm$age_group)
anes_gm$libcon_f <- relevel(factor(anes_gm$libcon), ref = "Moderate")
anes_gm$relig_ident_f <- relevel(factor(anes_gm$relig_ident), ref = "Non-Religous")

gm_p_f <- polr(gay_marriage ~ libcon_f + relig_ident_f + sex_orient + educ_f + age_f, data = anes_gm, na.action = na.exclude, method = "probit")
gm_l_f <- polr(gay_marriage ~ libcon_f + relig_ident_f + sex_orient + educ_f + age_f, data = anes_gm, na.action = na.exclude, method = "logistic")

stargazer(gm_l, gm_p, type = "text", digits = 3)

## 
## ====================================================
##                             Dependent variable:     
##                         ----------------------------
##                                 gay_marriage        
##                            ordered        ordered   
##                            logistic       probit    
##                              (1)            (2)     
## ----------------------------------------------------
## libconExtreme L            3.100***      1.785***   
##                            (0.230)        (0.118)   
##                                                     
## libconModerate             2.067***      1.226***   
##                            (0.111)        (0.066)   
##                                                     
## libconSlightly C           1.472***      0.891***   
##                            (0.110)        (0.066)   
##                                                     
## libconSlightly L           2.407***      1.424***   
##                            (0.121)        (0.071)   
##                                                     
## libconSomewhat C           0.803***      0.492***   
##                            (0.106)        (0.065)   
##                                                     
## libconSomewhat L           3.561***      2.014***   
##                            (0.162)        (0.085)   
##                                                     
## relig_identNon-Religous    0.809***      0.442***   
##                            (0.078)        (0.043)   
##                                                     
## relig_identOther          -0.266***      -0.176***  
##                            (0.069)        (0.040)   
##                                                     
## relig_identPolytheist       0.271          0.163    
##                            (0.270)        (0.150)   
##                                                     
## sex_orient                  0.121          0.055    
##                            (0.075)        (0.039)   
##                                                     
## educ                       0.195***      0.130***   
##                            (0.027)        (0.016)   
##                                                     
## age_group                 -0.171***      -0.094***  
##                            (0.021)        (0.012)   
##                                                     
## ----------------------------------------------------
## Observations                7,598          7,598    
## ====================================================
## Note:                    *p<0.1; **p<0.05; ***p<0.01

stargazer(gm_l_f, gm_p_f, type = "text", digits = 3)

## 
## ====================================================
##                             Dependent variable:     
##                         ----------------------------
##                                 gay_marriage        
##                            ordered        ordered   
##                            logistic       probit    
##                              (1)            (2)     
## ----------------------------------------------------
## libcon_fExtreme C         -2.074***      -1.230***  
##                            (0.111)        (0.066)   
##                                                     
## libcon_fExtreme L          1.043***      0.564***   
##                            (0.216)        (0.108)   
##                                                     
## libcon_fSlightly C        -0.595***      -0.335***  
##                            (0.080)        (0.046)   
##                                                     
## libcon_fSlightly L         0.341***      0.198***   
##                            (0.094)        (0.052)   
##                                                     
## libcon_fSomewhat C        -1.279***      -0.742***  
##                            (0.078)        (0.046)   
##                                                     
## libcon_fSomewhat L         1.500***      0.792***   
##                            (0.142)        (0.070)   
##                                                     
## relig_ident_fMonotheist   -0.811***      -0.444***  
##                            (0.078)        (0.043)   
##                                                     
## relig_ident_fOther        -1.073***      -0.616***  
##                            (0.089)        (0.049)   
##                                                     
## relig_ident_fPolytheist    -0.517*        -0.267*   
##                            (0.277)        (0.153)   
##                                                     
## sex_orient                  0.123          0.056    
##                            (0.075)        (0.039)   
##                                                     
## educ_f2                    0.338***      0.205***   
##                            (0.070)        (0.041)   
##                                                     
## educ_f3                    0.533***      0.345***   
##                            (0.079)        (0.046)   
##                                                     
## educ_f4                    0.563***      0.370***   
##                            (0.088)        (0.051)   
##                                                     
## age_f2                      -0.108        -0.037    
##                            (0.115)        (0.063)   
##                                                     
## age_f3                    -0.428***      -0.231***  
##                            (0.114)        (0.063)   
##                                                     
## age_f4                    -0.519***      -0.278***  
##                            (0.103)        (0.057)   
##                                                     
## age_f5                    -0.659***      -0.350***  
##                            (0.103)        (0.057)   
##                                                     
## ----------------------------------------------------
## Observations                7,598          7,598    
## ====================================================
## Note:                    *p<0.1; **p<0.05; ***p<0.01

stargazer(gm_l, gm_p, type = "text",
          dep.var.labels = "Gay Marriage",
          covariate.labels = c("Extreme Lib Identification", "Moderate Identification", "Slightly conserv Identification", "Slightly Lib Identification", "Somewhat Conserv Identification", "Somewhat Lib Identification", "Non-Religous", "Other religous belief", "Polytheist", "Sexual Orientation", "Education", "Age Group"), digits = 2)

## 
## ============================================================
##                                     Dependent variable:     
##                                 ----------------------------
##                                         Gay Marriage        
##                                    ordered        ordered   
##                                    logistic       probit    
##                                      (1)            (2)     
## ------------------------------------------------------------
## Extreme Lib Identification         3.10***        1.79***   
##                                     (0.23)        (0.12)    
##                                                             
## Moderate Identification            2.07***        1.23***   
##                                     (0.11)        (0.07)    
##                                                             
## Slightly conserv Identification    1.47***        0.89***   
##                                     (0.11)        (0.07)    
##                                                             
## Slightly Lib Identification        2.41***        1.42***   
##                                     (0.12)        (0.07)    
##                                                             
## Somewhat Conserv Identification    0.80***        0.49***   
##                                     (0.11)        (0.06)    
##                                                             
## Somewhat Lib Identification        3.56***        2.01***   
##                                     (0.16)        (0.08)    
##                                                             
## Non-Religous                       0.81***        0.44***   
##                                     (0.08)        (0.04)    
##                                                             
## Other religous belief              -0.27***      -0.18***   
##                                     (0.07)        (0.04)    
##                                                             
## Polytheist                           0.27          0.16     
##                                     (0.27)        (0.15)    
##                                                             
## Sexual Orientation                   0.12          0.06     
##                                     (0.08)        (0.04)    
##                                                             
## Education                          0.20***        0.13***   
##                                     (0.03)        (0.02)    
##                                                             
## Age Group                          -0.17***      -0.09***   
##                                     (0.02)        (0.01)    
##                                                             
## ------------------------------------------------------------
## Observations                        7,598          7,598    
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

stargazer(gm_l_f, gm_p_f, type = "text",
          dep.var.labels = "Gay Marriage (factors)",
          covariate.labels = c("Extreme conserv Identification", "Extreme Lib Identification", "Slightly conserv Identification", "Slightly Lib Identification", "Somewhat Conserv Identification", "Somewhat Lib Identification", "Monotheist", "Other religous belief", "Polytheist", "Sexual Orientation", "Some college", "Bachelor", "Advanced School", "Age(30-39)", "Age(40-49)", "Age(50-64)", "Age(65+)"), digits = 2)

## 
## ============================================================
##                                     Dependent variable:     
##                                 ----------------------------
##                                    Gay Marriage (factors)   
##                                     ordered       ordered   
##                                    logistic        probit   
##                                       (1)           (2)     
## ------------------------------------------------------------
## Extreme conserv Identification     -2.07***       -1.23***  
##                                     (0.11)         (0.07)   
##                                                             
## Extreme Lib Identification          1.04***       0.56***   
##                                     (0.22)         (0.11)   
##                                                             
## Slightly conserv Identification    -0.59***       -0.33***  
##                                     (0.08)         (0.05)   
##                                                             
## Slightly Lib Identification         0.34***       0.20***   
##                                     (0.09)         (0.05)   
##                                                             
## Somewhat Conserv Identification    -1.28***       -0.74***  
##                                     (0.08)         (0.05)   
##                                                             
## Somewhat Lib Identification         1.50***       0.79***   
##                                     (0.14)         (0.07)   
##                                                             
## Monotheist                         -0.81***       -0.44***  
##                                     (0.08)         (0.04)   
##                                                             
## Other religous belief              -1.07***       -0.62***  
##                                     (0.09)         (0.05)   
##                                                             
## Polytheist                          -0.52*         -0.27*   
##                                     (0.28)         (0.15)   
##                                                             
## Sexual Orientation                   0.12           0.06    
##                                     (0.08)         (0.04)   
##                                                             
## Some college                        0.34***       0.21***   
##                                     (0.07)         (0.04)   
##                                                             
## Bachelor                            0.53***       0.35***   
##                                     (0.08)         (0.05)   
##                                                             
## Advanced School                     0.56***       0.37***   
##                                     (0.09)         (0.05)   
##                                                             
## Age(30-39)                           -0.11         -0.04    
##                                     (0.11)         (0.06)   
##                                                             
## Age(40-49)                         -0.43***       -0.23***  
##                                     (0.11)         (0.06)   
##                                                             
## Age(50-64)                         -0.52***       -0.28***  
##                                     (0.10)         (0.06)   
##                                                             
## Age(65+)                           -0.66***       -0.35***  
##                                     (0.10)         (0.06)   
##                                                             
## ------------------------------------------------------------
## Observations                         7,598         7,598    
## ============================================================
## Note:                            *p<0.1; **p<0.05; ***p<0.01

# Brant test
brant(gm_l)

## ------------------------------------------------------------ 
## Test for         X2  df  probability 
## ------------------------------------------------------------ 
## Omnibus              116.07  12  0
## libconExtreme L      9.74    1   0
## libconModerate           4.09    1   0.04
## libconSlightly C     4.42    1   0.04
## libconSlightly L     5.87    1   0.02
## libconSomewhat C     0.09    1   0.76
## libconSomewhat L     7.69    1   0.01
## relig_identNon-Religous  7.58    1   0.01
## relig_identOther     19.5    1   0
## relig_identPolytheist        0.75    1   0.39
## sex_orient           0.38    1   0.54
## educ             19.46   1   0
## age_group            9.54    1   0
## ------------------------------------------------------------ 
## 
## H0: Parallel Regression Assumption holds

brant(gm_p)

## ------------------------------------------------------------ 
## Test for         X2  df  probability 
## ------------------------------------------------------------ 
## Omnibus              116.07  12  0
## libconExtreme L      9.74    1   0
## libconModerate           4.09    1   0.04
## libconSlightly C     4.42    1   0.04
## libconSlightly L     5.87    1   0.02
## libconSomewhat C     0.09    1   0.76
## libconSomewhat L     7.69    1   0.01
## relig_identNon-Religous  7.58    1   0.01
## relig_identOther     19.5    1   0
## relig_identPolytheist        0.75    1   0.39
## sex_orient           0.38    1   0.54
## educ             19.46   1   0
## age_group            9.54    1   0
## ------------------------------------------------------------ 
## 
## H0: Parallel Regression Assumption holds

brant(gm_l_f)

## ------------------------------------------------------------ 
## Test for         X2  df  probability 
## ------------------------------------------------------------ 
## Omnibus              122.15  17  0
## libcon_fExtreme C        3.9 1   0.05
## libcon_fExtreme L        4.71    1   0.03
## libcon_fSlightly C       0   1   0.95
## libcon_fSlightly L       0.58    1   0.45
## libcon_fSomewhat C       5.76    1   0.02
## libcon_fSomewhat L       2.57    1   0.11
## relig_ident_fMonotheist  7.42    1   0.01
## relig_ident_fOther       0.5 1   0.48
## relig_ident_fPolytheist  1.96    1   0.16
## sex_orient           0.39    1   0.53
## educ_f2              1.07    1   0.3
## educ_f3              8.46    1   0
## educ_f4              12.4    1   0
## age_f2               4.42    1   0.04
## age_f3               0.16    1   0.69
## age_f4               4.41    1   0.04
## age_f5               10.89   1   0
## ------------------------------------------------------------ 
## 
## H0: Parallel Regression Assumption holds

brant(gm_p_f)

## ------------------------------------------------------------ 
## Test for         X2  df  probability 
## ------------------------------------------------------------ 
## Omnibus              122.15  17  0
## libcon_fExtreme C        3.9 1   0.05
## libcon_fExtreme L        4.71    1   0.03
## libcon_fSlightly C       0   1   0.95
## libcon_fSlightly L       0.58    1   0.45
## libcon_fSomewhat C       5.76    1   0.02
## libcon_fSomewhat L       2.57    1   0.11
## relig_ident_fMonotheist  7.42    1   0.01
## relig_ident_fOther       0.5 1   0.48
## relig_ident_fPolytheist  1.96    1   0.16
## sex_orient           0.39    1   0.53
## educ_f2              1.07    1   0.3
## educ_f3              8.46    1   0
## educ_f4              12.4    1   0
## age_f2               4.42    1   0.04
## age_f3               0.16    1   0.69
## age_f4               4.41    1   0.04
## age_f5               10.89   1   0
## ------------------------------------------------------------ 
## 
## H0: Parallel Regression Assumption holds

5.Display and interpret the results

a. Display the results in nice table format with appropriate names of your independent and dependent variables

Interpretation - Looking at the brant results for the numeric model we notice that all IV’s are in violation of the brant test except sexual orientation, meaning we should consider using them as factor. We also noted that both the logit and probit model had the same results on the brant test meaning we can utilize either model. all Iv variable are not dichotomous so we need to convert them to factors in the coming analysis. Transformed variables into factor and they started to pass the brant test for the most part.

glm_fit <- function(model) {
  # Calculate Likelihood Ratio
  lr <- logLik(model)
 
  # Calculate AIC
  aic <- AIC(model)
 
  # Calculate BIC
  n <- nobs(model)
  p <- length(coef(model))
  bic <- -2 * logLik(model) + p * log(n)
 
  # Calculate Deviance
  deviance <- summary(model)$deviance

  # Return the metrics as a list
  metrics <- data.frame(Likelihood_Ratio = lr, AIC = aic, BIC = bic, Deviance = deviance)
  return(metrics)
}


glm_fit(gm_l)

##   Likelihood_Ratio      AIC      BIC Deviance
## 1        -5373.028 10774.06 10853.28 10746.06

# slight improvement of AIC BIC and deviance on factor version
glm_fit(gm_l_f)

##   Likelihood_Ratio      AIC      BIC Deviance
## 1        -5367.878 10773.76 10887.66 10735.76

b. Graph the impact of the three primary independent variables with 95% confidence intervals

Graphing libcon

Looking at extreme conservative vs extreme liberal. We notice that liberals are far more supportive of gay marriage in comparison to extreme conservatives.

logit<-ggpredict(gm_l_f, terms="libcon_f")   #ggpredict(model_name, terms="variable to graph")
print(logit) #Overview of the predicted probability results

## # Predicted probabilities of gay_marriage
## 
## # Response Level = 1
## 
## libcon_f   | Predicted |       95% CI
## -------------------------------------
## Moderate   |      0.04 | [0.03, 0.04]
## Extreme C  |      0.24 | [0.19, 0.29]
## Extreme L  |      0.01 | [0.01, 0.02]
## Slightly C |      0.07 | [0.05, 0.08]
## Slightly L |      0.03 | [0.02, 0.03]
## Somewhat C |      0.12 | [0.10, 0.15]
## Somewhat L |      0.01 | [0.01, 0.01]
## 
## # Response Level = 2
## 
## libcon_f   | Predicted |       95% CI
## -------------------------------------
## Moderate   |      0.10 | [0.08, 0.11]
## Extreme C  |      0.31 | [0.26, 0.38]
## Extreme L  |      0.04 | [0.02, 0.06]
## Slightly C |      0.15 | [0.13, 0.19]
## Slightly L |      0.07 | [0.06, 0.09]
## Somewhat C |      0.23 | [0.20, 0.28]
## Somewhat L |      0.02 | [0.02, 0.03]
## 
## # Response Level = 3
## 
## libcon_f   | Predicted |       95% CI
## -------------------------------------
## Moderate   |      0.87 | [0.85, 0.88]
## Extreme C  |      0.45 | [0.38, 0.52]
## Extreme L  |      0.95 | [0.92, 0.97]
## Slightly C |      0.78 | [0.74, 0.82]
## Slightly L |      0.90 | [0.88, 0.92]
## Somewhat C |      0.64 | [0.59, 0.69]
## Somewhat L |      0.97 | [0.95, 0.98]
## 
## Adjusted for:
## * relig_ident_f = Non-Religous
## *    sex_orient =         1.12
## *        educ_f =            1
## *         age_f =            1

logit$x <-as.factor(logit$x)
# Define custom levels and labels
custom_levels <- c(1, 2, 3)
custom_labels <- c("Anti-gay relationship", "No gay marriage", "Pro gay marriage")

# Reassign variable with custom labels
logit$response.level <- factor(logit$response.level, levels = custom_levels, labels = custom_labels)  

strong <- logit %>%
    filter(logit$x %in% c("Extreme L", "Extreme C"))  #Filters for the variables we need

  
  ggplot(strong, aes(x = x, y = predicted, fill = response.level)) +
      geom_bar(stat = "identity", position = "dodge", width = 0.7) +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7)) +
      theme_minimal(base_size = 13) +
      labs(x = "Support for Gay Marriage", y = "Predicted Probability", 
           title = "Predicted Probability with Confidence Intervals") +
      scale_fill_manual(values = c("Anti-gay relationship" = "darkgrey", 
                                   "No gay marriage" = "lightgrey", 
                                   "Pro gay marriage" = "antiquewhite")) +
      labs(fill = "Ideological Identification") +
      theme(legend.position = c(0.05, 0.95), legend.justification = c(0, 1))

Sexual orientation

Comparing sexual orientation support for gay marriage even though it was not significant. They follow almost an identical line with each other where homosexuals are slightly higher.

logit2 <-ggpredict(gm_l_f, terms="sex_orient") #ggpredict(model_name, terms="variable to graph")
print(logit2) #Overview of the predicted probability results

## # Predicted probabilities of gay_marriage
## 
## # Response Level = 1
## 
## sex_orient | Predicted |       95% CI
## -------------------------------------
##          1 |      0.04 | [0.03, 0.04]
##          2 |      0.03 | [0.03, 0.05]
##          3 |      0.03 | [0.02, 0.05]
##          4 |      0.03 | [0.02, 0.05]
## 
## # Response Level = 2
## 
## sex_orient | Predicted |       95% CI
## -------------------------------------
##          1 |      0.10 | [0.09, 0.11]
##          2 |      0.09 | [0.07, 0.12]
##          3 |      0.08 | [0.05, 0.12]
##          4 |      0.07 | [0.04, 0.12]
## 
## # Response Level = 3
## 
## sex_orient | Predicted |       95% CI
## -------------------------------------
##          1 |      0.86 | [0.85, 0.88]
##          2 |      0.88 | [0.84, 0.91]
##          3 |      0.89 | [0.84, 0.93]
##          4 |      0.90 | [0.84, 0.94]
## 
## Adjusted for:
## *      libcon_f =     Moderate
## * relig_ident_f = Non-Religous
## *        educ_f =            1
## *         age_f =            1

logit2$x <-as.factor(logit2$x)
# Define custom levels and labels
custom_levels <- c(1, 2, 3)
custom_labels <- c("Anti-gay relationship", "No gay marriage", "Pro gay marriage")

# Reassign variable with custom labels
logit2$response.level <- factor(logit2$response.level, levels = custom_levels, labels = custom_labels) 


custom_levels <- c(1, 2)
custom_labels <- c("Heterosexual", "Homosexual")
logit2$x <- factor(logit2$x, levels = custom_levels, labels = custom_labels)  

orient <- logit2 %>%
    filter(logit2$x %in% c("Heterosexual", "Homosexual"))

  
    ggplot(orient, aes(x = x, y = predicted, fill = response.level)) +
      geom_bar(stat = "identity", position = "dodge", width = 0.7) +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.4, position = position_dodge(width = 0.7)) +
      theme_minimal(base_size = 13) +
      labs(x = "Support for Gay Marriage", y = "Predicted Probability", 
           title = "Predicted Probability with Confidence Intervals") +
      scale_fill_manual(values = c("Anti-gay relationship" = "darkgrey", 
                                   "No gay marriage" = "lightgrey", 
                                   "Pro gay marriage" = "antiquewhite")) +
      labs(fill = "Sexual Orientation") +
      theme(legend.position = c(0.05, 0.95), legend.justification = c(0, 1))

Religous Identification

Comparing the different religious identifications support levels for gay marriage. Here we can visually see the non-religious group was the most supportive of gay marriage which falls into my hypothesis.

Multi vs ord mod

logit3 <-ggpredict(gm_l_f, terms="relig_ident_f")   #ggpredict(model_name, terms="variable to graph")
print(logit3) #Overview of the predicted probability results

## # Predicted probabilities of gay_marriage
## 
## # Response Level = 1
## 
## relig_ident_f | Predicted |       95% CI
## ----------------------------------------
## Non-Religous  |      0.04 | [0.03, 0.04]
## Monotheist    |      0.08 | [0.07, 0.10]
## Other         |      0.10 | [0.08, 0.13]
## Polytheist    |      0.06 | [0.04, 0.10]
## 
## # Response Level = 2
## 
## relig_ident_f | Predicted |       95% CI
## ----------------------------------------
## Non-Religous  |      0.10 | [0.08, 0.11]
## Monotheist    |      0.18 | [0.15, 0.21]
## Other         |      0.21 | [0.17, 0.25]
## Polytheist    |      0.14 | [0.09, 0.23]
## 
## # Response Level = 3
## 
## relig_ident_f | Predicted |       95% CI
## ----------------------------------------
## Non-Religous  |      0.87 | [0.85, 0.88]
## Monotheist    |      0.74 | [0.69, 0.78]
## Other         |      0.69 | [0.63, 0.74]
## Polytheist    |      0.79 | [0.69, 0.87]
## 
## Adjusted for:
## *   libcon_f = Moderate
## * sex_orient =     1.12
## *     educ_f =        1
## *      age_f =        1

logit3$x <-as.factor(logit3$x)
# Define custom levels and labels
custom_levels <- c(1, 2, 3)
custom_labels <- c("Anti-gay relationship", "No gay marriage", "Support gay marriage")

# Reassign variable with custom labels
logit3$response.level <- factor(logit3$response.level, levels = custom_levels, labels = custom_labels) 

custom_order <- c('Monotheist', 'Polytheist', 'Non-Religous', 'Other')

# Ordered model
plot1 <- ggplot(logit3, aes(x = x, y = predicted, color = response.level, group = response.level)) +
     geom_point() +   theme_minimal(base_size = 10) +
    labs(x = "Support for Gay Marriage", y = "Predicted Probability", 
         title = "Predicted Probability with Confidence Intervals") +
    labs(color = "Gay Marriage Opinion") +
    geom_errorbar(aes(ymin=conf.low, ymax=conf.high),
                  linewidth=.3,    # Thinner lines
                  width=.2)  + scale_x_discrete(limits = custom_order)

# Multinomial model
anes_gm$gay_marriage_f<-as.character(anes_gm$gay_marriage)

model <- multinom(gay_marriage_f ~  libcon_f + relig_ident_f + sex_orient + educ_f + age_f, data=anes_gm)

## # weights:  57 (36 variable)
## initial  value 8347.256169 
## iter  10 value 6125.050742
## iter  20 value 5490.568433
## iter  30 value 5326.087952
## iter  40 value 5308.712744
## final  value 5308.582327 
## converged

stargazer(model, type="text")

## 
## ====================================================
##                             Dependent variable:     
##                         ----------------------------
##                               2              3      
##                              (1)            (2)     
## ----------------------------------------------------
## libcon_fExtreme C         -0.870***      -2.745***  
##                            (0.156)        (0.164)   
##                                                     
## libcon_fExtreme L           -0.654        0.687**   
##                            (0.410)        (0.303)   
##                                                     
## libcon_fSlightly C          -0.204       -0.708***  
##                            (0.137)        (0.118)   
##                                                     
## libcon_fSlightly L          -0.056        0.313**   
##                            (0.173)        (0.146)   
##                                                     
## libcon_fSomewhat C         -0.276**      -1.534***  
##                            (0.128)        (0.115)   
##                                                     
## libcon_fSomewhat L          -0.227       1.372***   
##                            (0.279)        (0.227)   
##                                                     
## relig_ident_fMonotheist     0.112        -0.765***  
##                            (0.138)        (0.117)   
##                                                     
## relig_ident_fOther        -0.435***      -1.261***  
##                            (0.152)        (0.127)   
##                                                     
## relig_ident_fPolytheist     0.759          0.042    
##                            (0.580)        (0.541)   
##                                                     
## sex_orient                  -0.060         0.084    
##                            (0.129)        (0.104)   
##                                                     
## educ_f2                    0.274**       0.467***   
##                            (0.107)        (0.095)   
##                                                     
## educ_f3                    0.611***      0.833***   
##                            (0.125)        (0.114)   
##                                                     
## educ_f4                    0.778***      0.969***   
##                            (0.145)        (0.133)   
##                                                     
## age_f2                      0.383*         0.104    
##                            (0.196)        (0.166)   
##                                                     
## age_f3                      -0.057       -0.447***  
##                            (0.190)        (0.158)   
##                                                     
## age_f4                      0.147        -0.477***  
##                            (0.171)        (0.144)   
##                                                     
## age_f5                      0.238        -0.612***  
##                            (0.171)        (0.145)   
##                                                     
## Constant                    0.210        2.639***   
##                            (0.263)        (0.217)   
##                                                     
## ----------------------------------------------------
## Akaike Inf. Crit.         10,689.170    10,689.170  
## ====================================================
## Note:                    *p<0.1; **p<0.05; ***p<0.01

model_m<-ggpredict(model, terms="relig_ident_f") 

plot2 <- ggplot(model_m, aes(x = factor(x, level =c('Monotheist', 'Polytheist', 'Non-Religous', 'Other')), y = predicted, color = response.level, group = response.level)) +
    geom_point() +   theme_minimal(base_size = 15) +
    labs(x = "Multi model", y = "Predicted Probability", 
         title = "Predicted Probability with Confidence Intervals") +
    labs(color = "Gay Marriage Opinion")
plot1

plot2

6. Test the parallel regression assumption using the Brant test

Is it violated overall? For individual variables? Should you move to a multinomial modeling approach?

When running it without factors we noticed the overall was violated and most of the individual variables were violated except for sexual orientation. However, when flipping the other variables to factor and running it again we notice that most of them do not violate the Brant test. The probit and logit models performed the exact same way so choosing a model did not matter much here it mattered more on whether to use factors or change model.

When looking at the side by side we notice that they are very similar so I would use the ordered model in comparison to the multinomial model. If the results were different significantly I would use the multinomial model.

Homework 3

Ethan Campbell

2023-10-13

Question 1

1. Write a hypothesis for all three primary independent variables

2. Decide what control variables you will include with brief description of why it is included

3. Make sure to do any necessary data cleaning or preprocessing before you begin the analysis

Data work

4. Estimate the model and fully interpret the results

Models

a. Display the results in nice table format with appropriate names of your independent and dependent variables

b. Graph the impact of the primary independent variables with 95% confidence intervals

graphs

5. Draw conclusions about the results based on your above hypotheses

a. What variables are important in terms of a person misrepresenting if they voted or not?

b. Why do you think this is the case? Write a brief description of why.

Accuracy of predictions

Question 2 Ordered model

1. From list of variables, select three primary IVs that you theoretically think should influence a person’s opinion on gay marriage in the USA (note there are probably more than 3 variables you could theoretically justify here but just use 3 to keep it simpler)

2. Decide what control variables you will include with brief description of why it is included

3. Data cleaning

4. Estimate an ordered regression model using variables you selected above

a. Your decision if ordered logit or ordered probit

5.Display and interpret the results

a. Display the results in nice table format with appropriate names of your independent and dependent variables

b. Graph the impact of the three primary independent variables with 95% confidence intervals

Graphing libcon

Sexual orientation

Religous Identification

Multi vs ord mod

6. Test the parallel regression assumption using the Brant test