library("tidyverse")
library("dplyr")
library("htmltools")
library("forcats")
library("stargazer")
library("sjPlot")
library("car")
Returning to the graphs in the preliminary analysis from Homework8_9 and looking for variables that might be better explained by a quadratic term does not yield any candidates. For the purposes of answering the problem, I’ll pick the % of the population with 300 min of exercise of more per week (act300), and determine whether states with moderate percentages of Republican identified individuals show higher numbers of those at this activity level and whether this percentage decreases for states with larger percentages of Republican identified individuals.
RObs300Fru <- lm(R ~ wgtObs + conFru + act300, Pty_Hlth)
RObs300sqFru <- lm(R ~ wgtObs + conFru + act300 + I(act300^2), Pty_Hlth)
sjt.lm(RObs300sqFru)
R | ||||
B | CI | p | ||
(Intercept) | -5.57 | -121.00 – 109.85 | .923 | |
wgtObs | 0.71 | -0.24 – 1.66 | .138 | |
conFru | 0.65 | -0.10 – 1.39 | .087 | |
act300 | -0.56 | -7.25 – 6.14 | .868 | |
I(act300^2) | 0.01 | -0.09 – 0.12 | .776 | |
Observations | 51 | |||
R2 / adj. R2 | .312 / .253 |
sjt.lm(RObs300Fru)
R | ||||
B | CI | p | ||
(Intercept) | -21.00 | -60.24 – 18.23 | .287 | |
wgtObs | 0.70 | -0.23 – 1.64 | .137 | |
conFru | 0.66 | -0.07 – 1.39 | .075 | |
act300 | 0.39 | -0.25 – 1.04 | .226 | |
Observations | 51 | |||
R2 / adj. R2 | .311 / .267 |
As expected based on the graphical inspection, exploring act300 as a quadratic term did not better explain the variable’s interaction with the dependent variable. The \(\beta\) is near 0 and did not switch sign indicating it does not decline in states with higher % of Republican identified individuals. The adjusted \(R^2\) value has also decreased because of the addition of this quadratic term that does not fit the data.
Based on the findings in Q11 from Homework8_9 there appears to be an interactive effect between states with higher % of individuals living a sedentary lifestyle, and not consuming fruit in one’s diet and the % of individuals identifying as Republican. We will also look at a possible interaction between wgtObs & act0 due to the logical association between the two.
RObsFru0 <- lm(R ~ wgtObs + conFru + act0, Pty_Hlth)
RObsFru0_F0 <- lm(R ~ wgtObs + conFru + act0 + conFru * act0, Pty_Hlth)
RObsFru0_O0 <- lm(R ~ wgtObs + conFru + act0 + wgtObs * act0, Pty_Hlth)
sjt.lm(RObsFru0, RObsFru0_F0, RObsFru0_O0)
R | R | R | ||||||||||
B | CI | p | B | CI | p | B | CI | p | ||||
(Intercept) | -1.25 | -18.85 – 16.35 | .887 | -50.02 | -163.41 – 63.38 | .379 | -33.20 | -138.11 – 71.72 | .527 | |||
wgtObs | 0.80 | -0.12 – 1.71 | .086 | 0.75 | -0.17 – 1.67 | .108 | 1.69 | -1.35 – 4.73 | .269 | |||
conFru | 0.87 | 0.11 – 1.63 | .027 | 2.09 | -0.82 – 5.01 | .155 | 1.02 | 0.11 – 1.93 | .029 | |||
act0 | -0.71 | -1.46 – 0.03 | .060 | 1.05 | -3.06 – 5.15 | .611 | 0.46 | -3.41 – 4.32 | .813 | |||
conFru:act0 | -0.04 | -0.14 – 0.05 | .385 | |||||||||
wgtObs:act0 | -0.04 | -0.17 – 0.09 | .537 | |||||||||
Observations | 51 | 51 | 51 | |||||||||
R2 / adj. R2 | .341 / .299 | .352 / .296 | .347 / .290 |
With the tables side-by-side the results are somewhat confusing. When the lack of fruit consumption is interacting with sedentary lifestyle, most notable is the sign shift of act0 alone from -.71 to 1.05 with a concurrent drastic change in the y-intercept from -1.25 to -50.02, the only negative values remaining in the interaction model are the interaction itself and the y-intercept. We see similar changes in the interaction between wgtObs & act0 with slight successive decreases in fit (adj \(R^2\)) with the interaction models. The best explanation I can come up with for this phenomena is that in the reduced model without the interactions, the y-intercept indicates we start with states with a moderate % of Republicans, and states with more sedentary individuals is correlated with fewer % of the population as Republicans perhaps due to high density, urban coastal areas that are often liberal and the combination of convenience and lack of accessible outdoor recreation inclines people towards sedentary lifestyles, explaining the negative correlation. With the addition of interactive terms, the y-intercept jumps into very liberal territory at -50.02 and sedentary lifestyle has a far less significant but positive correlation due to it’s being fitted to a linear model that includes the confluence of lack of fruit consumption and sedentariness (indicating poor diet in addition to sedentary behavior) and being obese and sedentary respectively. Why the interaction terms are negative remains baffling, though the confidence intervals show that they span 0 and are statistically insignificant and thus may not be all too important. The lack of fruit being positively correlated with could be explained by the midwestern states where monocropped grains and factory farmed animals make up a large proportion of the agricultural product and therefore diet in contrast to places like Hawaii, California and coastal areas where fruit is more abundant.
It’s apparent what the result is going to be given the Adjusted \(R^2\) values, but for the sake of practice the model with interaction between wgtObs & act0 will be tested against the reduced (original) model. \[F = \frac{(R_c^2 - R_r^2) / df_1}{(1-R_c^2)/df_2}\]
rr_org <- summary(RObsFru0)$r.squared
rc_int <- summary(RObsFru0_O0)$r.squared
(f <- ((rc_int - rr_org)/1)/((1 - rc_int)/(51 - 4 - 1)))
## [1] 0.3867096
anova(RObsFru0, RObsFru0_O0)
## Analysis of Variance Table
##
## Model 1: R ~ wgtObs + conFru + act0
## Model 2: R ~ wgtObs + conFru + act0 + wgtObs * act0
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 47 2352.1
## 2 46 2332.5 1 19.609 0.3867 0.5371
The \(F_{stat}\) and \(P_{value}\) from the F-test indicate that the interaction model was not an improvement over the original model towards a better fit for the explaining the dependent variable in a statistically significant way.
Using the anes_2008tr.csv dataset in Course Resources, model vote_rep (whether the respondent voted Republican in the last election) as a function of age, race, income, and ideology.
anes <- read_csv(file = "anes_2008tr.csv")
Rep_AgeRacIncIde <- glm(vote_rep ~ age + race_white + income + ideology_con, family = "binomial",
anes)
sjt.glm(Rep_AgeRacIncIde)
vote_rep | ||||
Odds Ratio | CI | p | ||
(Intercept) | 0.00 | 0.00 – 0.00 | <.001 | |
age | 1.00 | 1.00 – 1.01 | .258 | |
race_white | 11.43 | 8.28 – 16.01 | <.001 | |
income | 1.50 | 1.30 – 1.74 | <.001 | |
ideology_con | 2.83 | 2.50 – 3.24 | <.001 | |
Observations | 1539 |
I’m unsure as to whether the probability this question is asking for is best represented by the interacting terms model, or by a prediction based on the original model, so I’ve used both methods. Intuitively, I would guess that the prediction is what is being asked for.
Repint_rWhAgeIncIde <- glm(vote_rep ~ age * race_white * income * ideology_con, family = "binomial",
anes)
sjt.glm(Repint_rWhAgeIncIde)
vote_rep | ||||
Odds Ratio | CI | p | ||
(Intercept) | 0.15 | 0.00 – 209.35 | .682 | |
age | 0.92 | 0.79 – 1.13 | .396 | |
race_white | 0.01 | 0.00 – 1476.87 | .415 | |
income | 0.56 | 0.04 – 16.88 | .718 | |
ideology_con | 0.53 | 0.09 – 4.81 | .544 | |
age:race_white | 1.11 | 0.88 – 1.35 | .331 | |
age:income | 1.02 | 0.95 – 1.08 | .587 | |
race_white:income | 2.45 | 0.05 – 81.87 | .640 | |
age:ideology_con | 1.02 | 0.98 – 1.06 | .323 | |
race_white:ideology_con | 6.55 | 0.48 – 70.19 | .143 | |
income:ideology_con | 1.35 | 0.64 – 2.59 | .410 | |
age:race_white:income | 0.97 | 0.91 – 1.06 | .512 | |
age:race_white:ideology_con | 0.98 | 0.93 – 1.03 | .331 | |
age:income:ideology_con | 1.00 | 0.98 – 1.01 | .577 | |
race_white:income:ideology_con | 0.77 | 0.34 – 1.87 | .546 | |
age:race_white:income:ideology_con | 1.00 | 0.99 – 1.02 | .577 | |
Observations | 1539 |
input <- data.frame(race_white = 1, age = mean(anes$age), income = mean(anes$income),
ideology_con = mean(anes$ideology_con))
(Wh <- predict.glm(Repint_rWhAgeIncIde, input, type = "response"))
## 1
## 0.4924732
input$race_white <- 0
(dp_BtoWh <- Wh - (nWh <- predict.glm(Repint_rWhAgeIncIde, input, type = "response")))
## 1
## 0.4057051
(dodds_BtoWh <- Wh/nWh)
## 1
## 5.675739
The change in probability is +~.41 between a black person and a white person going to vote Republican, or in terms of odds, a white person is ~5.68 times more likely to vote Republican than a black person.
\[\begin{aligned} \hat{P}(y=1|x=a) - \hat{P}(y=1|x=b) = \textrm{invlogit}(a) - \textrm{invlogit}(b) =\\\frac{e^{\beta_0 + \beta_1 a_1 + \beta_2 a_2 + \beta_3 a_3 + \beta_4 a_4}}{1+e^{\beta_0 + \beta_1 a_1 + \beta_2 a_2 + \beta_3 a_3 + \beta_4 a_4}} - \frac{e^{\beta_0 + \beta_1 b_1 + \beta_2 b_2 + \beta_3 b_3 + \beta_4 b_4}}{1+e^{\beta_0 + \beta_1 b_1 + \beta_2 b_2 + \beta_3 b_3 + \beta_4 b_4}} \end{aligned}\] \[\begin{aligned} \frac{e^{\beta_0 + \beta_1 a}}{1+e^{\beta_0 + \beta_1 a}} - \frac{e^{\beta_0 + \beta_1 b}}{1+e^{\beta_0 + \beta_1 b}} \\ \frac{e^{-8.077 + 0.005(46.451)+2.437(1)+0.405(2.705)+1.042(4.098)}}{1+e^{-8.077 + 0.005(46.451)+2.437(0)+0.405(2.705)+1.042(4.098)}} - \frac{e^{-8.077 + 0.005(46.451)+2.437(0)+0.405(2.705)+1.042(4.098)}}{1+e^{-8.077 + 0.005(46.451)+2.437(0)+0.405(2.705)+1.042(4.098)}} \end{aligned}\] Using R to solve that craziness…
b_yint <- Rep_AgeRacIncIde$coef[1]
b_age <- Rep_AgeRacIncIde$coef[2]
b_rWh <- Rep_AgeRacIncIde$coef[3]
b_inc <- Rep_AgeRacIncIde$coef[4]
b_ide <- Rep_AgeRacIncIde$coef[5]
(`1cdodds_BtoWh` <- (exp(b_yint + b_age * input[2] + b_rWh * 1 + b_inc * input[3] +
b_ide * input[4])/(1 + exp(b_yint + b_age * input[2] + b_rWh * 1 + b_inc * input[3] +
b_ide * input[4]))) - (exp(b_yint + b_age * input[2] + b_rWh * 0 + b_inc * input[3] +
b_ide * input[4])/(1 + exp(b_yint + b_age * input[2] + b_rWh * 0 + b_inc * input[3] +
b_ide * input[4]))))
## age
## 1 0.4096499
dage_lw <- input
dage_hi <- input
dinc_lw <- input
dinc_hi <- input
dage_lw[2] <- input[2] - 25
dage_hi[2] <- input[2] + 25
dinc_lw[3] <- input[3] - 1
dinc_hi[3] <- input[3] + 1
(`50yrs` <- predict.glm(Repint_rWhAgeIncIde, dage_hi, type = "response") - predict.glm(Repint_rWhAgeIncIde,
dage_lw, type = "response"))
## 1
## 0.03083505
(inc_1 <- predict.glm(Repint_rWhAgeIncIde, dinc_hi, type = "response") - predict.glm(Repint_rWhAgeIncIde,
dinc_lw, type = "response"))
## 1
## 0.116571
inc_1/`50yrs`
## 1
## 3.78047
If you are of non-white race, the change in the probability of voting Republican from an age of 21.5 to 71.5 is +~3%, whereas the the change in probability of voting Republican from one income bracket below average to one income bracket above average is +~11%. In other words, a non-white person is ~3.78 times more likely to vote Republican by moving up two income brackets than by aging 50 years.
dage_lw[1] <- 1
dage_hi[1] <- 1
dinc_lw[1] <- 1
dinc_hi[1] <- 1
(`50yrswh` <- predict.glm(Repint_rWhAgeIncIde, dage_hi, type = "response") - predict.glm(Repint_rWhAgeIncIde,
dage_lw, type = "response"))
## 1
## 0.01946875
(inc_1wh <- predict.glm(Repint_rWhAgeIncIde, dinc_hi, type = "response") - predict.glm(Repint_rWhAgeIncIde,
dinc_lw, type = "response"))
## 1
## 0.1293148
inc_1wh/`50yrswh`
## 1
## 6.642172
If you are white, the change in the probability of voting Republican from an age of 21.5 to 71.5 is +~2%, whereas the the change in probability of voting Republican from one income bracket below average to one income bracket above average is +~13%. In other words, a white person is ~6.64 times more likely to vote Republican by moving up two income brackets than by aging 50 years.
Rep_AgeRacGenEduIncIdePid <- glm(vote_rep ~ age + race_white + gender_male + education +
income + ideology_con + partyid_rep, family = "binomial", anes)
sjt.glm(Rep_AgeRacGenEduIncIdePid, Rep_AgeRacIncIde)
vote_rep | vote_rep | |||||||
Odds Ratio | CI | p | Odds Ratio | CI | p | |||
(Intercept) | 0.00 | 0.00 – 0.00 | <.001 | 0.00 | 0.00 – 0.00 | <.001 | ||
age | 1.02 | 1.00 – 1.03 | .005 | 1.00 | 1.00 – 1.01 | .258 | ||
race_white | 5.09 | 3.43 – 7.62 | <.001 | 11.43 | 8.28 – 16.01 | <.001 | ||
gender_male | 0.87 | 0.60 – 1.26 | .458 | |||||
education | 1.02 | 0.90 – 1.16 | .761 | |||||
income | 1.28 | 1.06 – 1.55 | .010 | 1.50 | 1.30 – 1.74 | <.001 | ||
ideology_con | 1.67 | 1.41 – 1.98 | <.001 | 2.83 | 2.50 – 3.24 | <.001 | ||
partyid_rep | 2.45 | 2.19 – 2.75 | <.001 | |||||
Observations | 1539 | 1539 |
Rep_AgeRacGenEduIncIdePid$terms
## vote_rep ~ age + race_white + gender_male + education + income +
## ideology_con + partyid_rep
## attr(,"variables")
## list(vote_rep, age, race_white, gender_male, education, income,
## ideology_con, partyid_rep)
## attr(,"factors")
## age race_white gender_male education income ideology_con
## vote_rep 0 0 0 0 0 0
## age 1 0 0 0 0 0
## race_white 0 1 0 0 0 0
## gender_male 0 0 1 0 0 0
## education 0 0 0 1 0 0
## income 0 0 0 0 1 0
## ideology_con 0 0 0 0 0 1
## partyid_rep 0 0 0 0 0 0
## partyid_rep
## vote_rep 0
## age 0
## race_white 0
## gender_male 0
## education 0
## income 0
## ideology_con 0
## partyid_rep 1
## attr(,"term.labels")
## [1] "age" "race_white" "gender_male" "education"
## [5] "income" "ideology_con" "partyid_rep"
## attr(,"order")
## [1] 1 1 1 1 1 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
## attr(,"predvars")
## list(vote_rep, age, race_white, gender_male, education, income,
## ideology_con, partyid_rep)
## attr(,"dataClasses")
## vote_rep age race_white gender_male education
## "numeric" "numeric" "numeric" "numeric" "numeric"
## income ideology_con partyid_rep
## "numeric" "numeric" "numeric"
sumCom <- summary(Rep_AgeRacGenEduIncIdePid)
sjt.lm(Rep_AgeRacGenEduIncIdePid, Rep_AgeRacIncIde)
vote_rep | vote_rep | |||||||
B | CI | p | B | CI | p | |||
(Intercept) | -8.65 | -9.87 – -7.50 | <.001 | -8.08 | -9.01 – -7.19 | <.001 | ||
age | 0.01 | 0.00 – 0.03 | .005 | 0.00 | -0.00 – 0.01 | .258 | ||
race_white | 1.63 | 1.23 – 2.03 | <.001 | 2.44 | 2.11 – 2.77 | <.001 | ||
gender_male | -0.14 | -0.51 – 0.23 | .458 | |||||
education | 0.02 | -0.11 – 0.15 | .761 | |||||
income | 0.25 | 0.06 – 0.44 | .010 | 0.40 | 0.26 – 0.55 | <.001 | ||
ideology_con | 0.51 | 0.35 – 0.68 | <.001 | 1.04 | 0.92 – 1.18 | <.001 | ||
partyid_rep | 0.89 | 0.79 – 1.01 | <.001 | |||||
Observations | 1539 | 1539 | ||||||
R2 / adj. R2 | .525 / .729 | .402 / .558 |
partyid_rep
is the party ID of the respondent, with Republican being the higher value
I could not find in the ANES documentation what variable and or question on the survey this was matched to, which would help to better explain it.