```r
library(dplyr)
load("~/Dropbox/github_repos/teaching/POL683_Fall24/advancedRegression/vignettes/dataActive.rda")

dat = dataActive %>%
  mutate(support_protest = ifelse(violent > 3, 1, 0)) %>%
  dplyr::select(support_protest, VIOLENT)

# Estimate logit
logitModel = glm(support_protest ~ VIOLENT, data = dat,
                 family = binomial(link = "logit"))
# Estimate probit
probitModel = glm(support_protest ~ VIOLENT, data = dat,
                  family = binomial(link = "probit"))
```
Call:
glm(formula = support_protest ~ VIOLENT, family = binomial(link = "logit"),
data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.34186 0.04804 -7.115 1.12e-12 ***
VIOLENT -0.55257 0.07058 -7.829 4.92e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4671.4 on 3599 degrees of freedom
Residual deviance: 4609.4 on 3598 degrees of freedom
AIC: 4613.4
Number of Fisher Scoring iterations: 4
Call:
glm(formula = support_protest ~ VIOLENT, family = binomial(link = "probit"),
data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.21378 0.02992 -7.145 9.01e-13 ***
VIOLENT -0.33902 0.04316 -7.855 3.99e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4671.4 on 3599 degrees of freedom
Residual deviance: 4609.4 on 3598 degrees of freedom
AIC: 4613.4
Number of Fisher Scoring iterations: 4
Identical Predictions
```r
# Create a dataset
pred.data = expand.grid(VIOLENT = c(0, 1)) %>% as.data.frame()

predictions_logit = predict(logitModel, pred.data, type = "response") %>%
  data.frame() %>%
  mutate(violent = c(0, 1))

predictions_probit = predict(probitModel, pred.data, type = "response") %>%
  data.frame() %>%
  mutate(violent = c(0, 1))

# The difference
cat("The treatment effect for the logit model is:\n",
    predictions_logit[1, 1] - predictions_logit[2, 1], "\n",
    "The treatment effect for the probit model is:\n",
    predictions_probit[1, 1] - predictions_probit[2, 1])
```
The treatment effect for the logit model is:
0.1251605
The treatment effect for the probit model is:
0.1251605
Nonlinear Regression
A different approach to the same problem: a binary dependent variable
Call: glm(formula = support_protest ~ VIOLENT, family = binomial(link = "logit"),
data = dat)
Coefficients:
(Intercept) VIOLENT
-0.3419 -0.5526
Degrees of Freedom: 3599 Total (i.e. Null); 3598 Residual
Null Deviance: 4671
Residual Deviance: 4609 AIC: 4613
Simulation
It’s often useful to simulate the distribution of the predicted probabilities
We do this by first estimating our statistical model
Then develop a routine to examine “what happens” when we change the values of our independent variables
We can leverage this paradigm for other purposes, such as generating treatment effects or marginal effects, or testing the robustness of our findings
Simulation, \(var(\theta)\)
Let’s do a few things here
We’ll use the variance-covariance matrix of the parameters to simulate the distribution of the predicted probabilities
Here’s how we’ll do it
The Variance-Covariance Matrix
The Hessian \[H(\theta)=\frac{\partial^2 \ln L(\theta)}{\partial \theta\, \partial \theta^T}\]
What does the second derivative tell us? The curvature of the log-likelihood at its maximum: sharper curvature means more information about \(\theta\), and smaller standard errors
The negative of the expected value of the Hessian is the information matrix
\[I(\theta)=-E(H(\theta))\].
Finally, the inverse of the information matrix is the variance-covariance matrix for the parameters
\[V(\theta)=[-E(H(\theta))]^{-1}\].
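To make this concrete (a standard result, stated here for the logit likelihood as an illustration): the Hessian of the logit log-likelihood is \(H(\beta) = -X^T W X\), where \(W\) is a diagonal matrix with entries \(p_i(1-p_i)\), so \[V(\hat{\beta}) = (X^T W X)^{-1}\] which is the matrix that vcov() returns for a fitted glm.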
If we have the variance-covariance matrix, we can simulate draws from a multivariate normal distribution (see the sketch after this list)
Uncertainty is captured by treating the parameters as random, not fixed
The design matrix is the matrix of the independent variables
The first column is a column of ones, the intercept
\(X\beta\) is the matrix multiplication of the design matrix and the coefficients
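A minimal sketch of this draw step, using the logitModel fit earlier (the seed and the number of draws are arbitrary choices):

```r
library(MASS)

set.seed(1234)                        # arbitrary seed, for reproducibility
betaHat   <- coef(logitModel)         # estimated parameters, the MLE
V         <- vcov(logitModel)         # [-E(H(theta))]^{-1}
betaDraws <- mvrnorm(1000, mu = betaHat, Sigma = V)  # 1000 x 2 matrix of draws
```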
Post-Estimation
After estimation, we can generate our own design matrix, which we’ll use to generate predictions
For \(X\beta\) to have a solution, we need the same number of columns in the design matrix as we have coefficients (see the matrix algebra appendix)
Rules of matrix multiplication: the number of columns in \(X\) must equal the length of the coefficient vector
Conformable matrices (a sketch follows this list)
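Continuing the sketch: a two-row design matrix for VIOLENT = 0 and VIOLENT = 1, multiplied against every coefficient draw (object names carry over from the sketch above):

```r
X <- cbind(1, c(0, 1))                # 2 x 2: a column of ones, then VIOLENT = 0, 1

# (2 x 2) %*% (2 x 1000) is conformable because ncol(X) == length(betaHat);
# plogis() is the inverse logit, mapping X*beta onto the probability scale
pDraws <- plogis(X %*% t(betaDraws))  # 2 x 1000 matrix of predicted probabilities
```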
Plotly
The probability of support in the “violence” frame
Calculate the probability that \(y=1\) when \(x=0\)
Calculate the probability that \(y=1\) when \(x=1\)
Subtract (1) from (2)
Marginal Effects
We might simulate a confidence interval as well, by modifying the recipe (see the sketch after these steps)
Calculate the probability that \(y=1\) when \(x=0\) by drawing \(K\) draws from the multivariate normal
Calculate the probability that \(y=1\) when \(x=1\) by drawing \(K\) draws from the multivariate normal
Subtract (1) from (2), which will now yield a distribution
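With the pDraws object from the sketch above, these steps are one line each:

```r
p0 <- pDraws[1, ]                 # Pr(y = 1 | x = 0), one value per draw
p1 <- pDraws[2, ]                 # Pr(y = 1 | x = 1), one value per draw
fd <- p1 - p0                     # a distribution of first differences

mean(fd)                          # point estimate
quantile(fd, c(0.025, 0.975))     # a 95% interval
```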
Marginal Effects
Call:
glm(formula = support_protest ~ moral_individualism + sdo + moral_individualism:sdo,
family = binomial(link = "logit"), data = dat)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.8839 0.1290 6.852 7.28e-12 ***
moral_individualism -1.5124 0.2334 -6.481 9.13e-11 ***
sdo -5.2082 0.4632 -11.245 < 2e-16 ***
moral_individualism:sdo 4.9714 0.7543 6.591 4.38e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4671.4 on 3599 degrees of freedom
Residual deviance: 4427.8 on 3596 degrees of freedom
AIC: 4435.8
Number of Fisher Scoring iterations: 4
Mediation
Moral individualism (“Any person who is willing to work hard has a good chance of succeeding”; “If people work hard, they almost always get what they want”)
Social dominance orientation is a scale developed by Sidanius and Pratto (1999) (e.g., “Some groups of people are simply inferior to other groups”; “It’s probably a good thing that certain groups are at the top and other groups are at the bottom”)
Are moral individualists, and those who tolerate (or even prefer) group-based inequality, more likely to support protesting a democratic election?
```r
library(brms)

n <- 100
x <- rnorm(n, 0, 1)
y <- 2 + 3 * (x - mean(x)) + rnorm(n, 0, 1)
data <- data.frame(x = x, y = y)

# Define the model formula
formula <- bf(y ~ x - mean(x))

# Specify the priors
priors <- c(prior(normal(0, 10), class = "b"),
            prior(normal(0, 10), class = "Intercept"),
            prior(uniform(0, 50), class = "sigma"))

# Fit the model
fit <- brm(formula, data = data, prior = priors,
           chains = 4, iter = 2000, warmup = 1000)
```
Family: gaussian
Links: mu = identity; sigma = identity
Formula: y ~ x - mean(x)
Data: data (Number of observations: 100)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.61 0.11 1.40 1.81 1.00 3713 2767
x 2.95 0.11 2.73 3.17 1.00 3395 2596
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.06 0.08 0.91 1.23 1.00 4243 2987
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```r
# Plot the results
plot(fit)
```
Real Data
```r
library(brms)
load("~/Dropbox/github_repos/teaching/POL683_Fall24/advancedRegression/vignettes/dataActive.rda")

dat = dataActive
dat$authoritarianism = scale(dat$authoritarianism)
dat$sdo = scale(dat$sdo)
dat$college = ifelse(dat$educ == "4-year" | dat$educ == "Post-grad", 1, 0)

formula <- bf(sdo ~ authoritarianism + college + authoritarianism:college)

# Specify the priors
priors <- c(prior(normal(0, 10), class = "b"),
            prior(normal(0, 10), class = "Intercept"),
            prior(uniform(0, 100), class = "sigma"))

# Fit the model
fit <- brm(formula, data = dat, prior = priors,
           chains = 1, iter = 2000, warmup = 1000)
```
Family: gaussian
Links: mu = identity; sigma = identity
Formula: sdo ~ authoritarianism + college + authoritarianism:college
Data: dat (Number of observations: 3599)
Draws: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 1000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.04 0.02 0.00 0.07 1.00 1189 682
authoritarianism 0.26 0.02 0.22 0.30 1.00 845 643
college -0.08 0.04 -0.15 -0.01 1.00 1024 617
authoritarianism:college 0.11 0.04 0.03 0.18 1.00 832 709
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.95 0.01 0.93 0.98 1.00 1182 554
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```r
# Plot the results
plot(fit)
```
```r
library(tidybayes)
library(dplyr)
library(ggplot2)
library(modelr)

# fit <- brm(formula, data = dat, prior = priors, chains = 1, iter = 2000, warmup = 1000)
dat %>%
  data_grid(
    authoritarianism = seq_range(authoritarianism, n = 30),
    college = c(0, 1)
  ) %>%
  add_epred_draws(fit) %>%
  group_by(authoritarianism, college) %>%
  mutate(college = ifelse(college == 1, "College", "Less than College")) %>%
  summarize(
    .value = mean(.epred),
    .lower = quantile(.epred, 0.025),
    .upper = quantile(.epred, 0.975)
  ) %>%
  ggplot(aes(x = authoritarianism, y = .value, color = factor(college))) +
  geom_line() +
  geom_ribbon(aes(ymin = .lower, ymax = .upper), alpha = 0.3, color = "grey") +
  facet_wrap(~college) +
  labs(
    x = "Authoritarianism",
    y = "Social Dominance Orientation",
    color = "Education",
    title = "Predicted Social Dominance Orientation \nby Authoritarianism and Education"
  ) +
  theme_minimal()
```
Steps to Building a Model
Build a scientific model. This is the theoretical model that you believe underlies the data. This is often represented in a DAG
Build a statistical model. How might we use a statistical model to create estimands – parameters that we’ll estimate with our data?
Estimation, Simulation and Prediction. Here we’ll join the statistical and scientific model, using the data to test associations and causal processes
An example
The DAG
Here’s the simple model:
The Confound
The Fork
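As a sketch of the fork in code (using ggdag, with variable names chosen to match the simulation below):

```r
library(ggdag)

# The fork: education is a common cause of both wealth and conservatism
dagify(wealth ~ education,
       conservatism ~ wealth + education) %>%
  ggdag() +
  theme_dag()
```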
```r
# Imagine a population in which 30% have a college or post-graduate degree
education = rbinom(1000, 1, 0.30)

# predict personal wealth
wealth = 1 + 0.8 * education + rnorm(1000, 0, 0.5)

political_conservatism = 1 + 0.3 * wealth - 0.5 * education +
  rnorm(1000, 0.5)  # noise with mean 0.5, sd 1

# Notice the change: education, the confounder, is omitted
lm(political_conservatism ~ wealth) %>% summary()
```
Call:
lm(formula = political_conservatism ~ wealth)
Residuals:
Min 1Q Median 3Q Max
-3.3120 -0.6218 -0.0402 0.7117 3.1416
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.63804 0.07117 23.015 <2e-16 ***
wealth 0.08756 0.05215 1.679 0.0934 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.02 on 998 degrees of freedom
Multiple R-squared: 0.002817, Adjusted R-squared: 0.001818
F-statistic: 2.819 on 1 and 998 DF, p-value: 0.09345
Again, we might simulate data to examine the consequences of estimation; the data reported above were synthetic
If there is a confounder, we should condition on that confounder; otherwise, our results will be “biased” (see the sketch after this list)
Bias is just the difference between “truth” (how we set up the simulation) and the estimate
There are three other types of relationships to consider; here are two we often encounter: the mediator (the chain or pipe) and the collider (the inverted fork)
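A one-line check, continuing the synthetic example above: conditioning on education, the confounder, should move the wealth coefficient back toward the 0.3 built into the simulation.

```r
# Conditioning on the confounder recovers the simulated wealth effect (0.3)
lm(political_conservatism ~ wealth + education) %>% summary()
```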
The Mediator (Pipe or Chain)
\(x\) can affect \(y\) in a couple ways
One is \(x\rightarrow m \rightarrow y\)
Another is \(x \rightarrow y\)
```r
library(ggdag)

dagify(m ~ x, y ~ m + x) %>%
  ggdag() +
  theme_dag()
```
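For comparison, a sketch of the collider (the inverted fork noted earlier), where the arrows point into \(m\):

```r
# The collider: x -> m <- y
dagify(m ~ x + y) %>%
  ggdag() +
  theme_dag()
```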
An Introduction to Causal Inference
The “treatment effect”
Causal explanations involve a statement about how units are influenced by a particular treatment or intervention
The \(\textbf{counterfactual}\): How would an individual respond to a treatment, compared to an identical circumstance where that individual did not receive the treatment (Morton and Williams 2010)?
An Introduction to Causal Inference
For this reason, the exercise involves inferences about unit effects
\[\delta_i = Y_{i,1}-Y_{i,0}\]
An Introduction to Causal Inference
The counterfactual is unobservable
The Fundamental Problem of Causal Inference.
\(\delta_i\) is not attainable, because the potential outcome is never directly observed!
An Introduction to Causal Inference
We compare groups of individuals – or units – where some degree of control is exercised
Imagine two worlds (datasets); one where everyone has “high wealth” and one where everyone has “low wealth”
Use these data as the design matrix. Then average the predicted score, political conservatism, in the two “worlds”: one where everyone is “high wealth” and the other where everyone is “low wealth”
Use these data “worlds,” in conjunction with the statistical model to create predictions and causal estimates
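The two outputs below come from logit fits of conservative on age, and on college and age. A reconstruction of the fitting code (the object names model_age and model_education are assumptions, inferred from the prediction code that follows):

```r
# Reconstructed from the printed calls below; object names are assumed
model_age       <- glm(conservative ~ age,
                       family = binomial("logit"), data = dat)
model_education <- glm(conservative ~ college + age,
                       family = binomial("logit"), data = dat)
```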
Call: glm(formula = conservative ~ age, family = binomial("logit"),
data = dat)
Coefficients:
(Intercept) age
-2.43048 0.03126
Degrees of Freedom: 3598 Total (i.e. Null); 3597 Residual
(1 observation deleted due to missingness)
Null Deviance: 4278
Residual Deviance: 4046 AIC: 4050
Call: glm(formula = conservative ~ college + age, family = binomial("logit"),
data = dat)
Coefficients:
(Intercept) college age
-2.39842 -0.35369 0.03288
Degrees of Freedom: 3598 Total (i.e. Null); 3596 Residual
(1 observation deleted due to missingness)
Null Deviance: 4278
Residual Deviance: 4028 AIC: 4034
```r
datYoung <- dat %>%
  # replace all values of age with 20
  mutate(age = 20)
predYoung = predict(model_age, newdata = datYoung, type = "response")

datOld <- dat %>%
  # replace all values of age with 80
  mutate(age = 80)
predOld = predict(model_age, newdata = datOld, type = "response")

cat("The effect of going from 20 to 80 years old leads to a",
    round(mean(predOld) - mean(predYoung), 2) * 100,
    "percentage point increase in identifying as conservative\n")
```
The effect of going from 20 to 80 years old leads to a 38 percentage point increase in identifying as conservative
load("~/Dropbox/github_repos/teaching/POL683_Fall24/advancedRegression/vignettes/dataActive.rda")# the effect of going from 20 to 80; datCollege <- dat %>%# replace all in age with 20mutate(college =1) %>%# redundant, but just to be clearmutate(age = age)predCollege =predict(model_education, newdata = datCollege, type ="response") datNocollege <- dat %>%# replace all in age with 20mutate(college =0) %>%# redundant, but just to be clearmutate(age = age)predNone =predict(model_age, newdata = datNocollege, type ="response")cat("The effect of going from no college to college years old leads to ",abs(round( predCollege %>%mean() - predNone %>%mean(),2)*100), "percentage point decrease in identifying as conservative\n")
The effect of going from no college to college leads to a 4 percentage point decrease in identifying as conservative
Appendix: Matrix Algebra Review
Matrix representation of a system of equations
Easier to visualize the relationship between the model and the data
And tedious calculations are efficiently represented
Vectors are matrices with a single column or row
Vectors are denoted with bold lowercase letters, e.g., \(\textbf{b}\)
Some Operations
Two matrices are equal, \(\textbf{A}=\textbf{B}\), if \(a_{ij}=b_{ij}\) \(\forall i,j\)
A \(\textbf{symmetric}\) matrix equals its transpose, \(\textbf{A}=\textbf{A}^T\); that is, \(a_{ij}=a_{ji}\)
Squaring a matrix is the same as multiplying it by itself, \(\textbf{A}^2=\textbf{A} \textbf{A}\)
A matrix is \(\textbf{idempotent}\) if multiplying it by itself returns the original matrix, \(\textbf{A}^2=\textbf{A}\textbf{A}=\textbf{A}\)
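A quick illustration of these properties in R (the matrices here are made up for demonstration):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric: a_ij == a_ji
all(A == t(A))                          # TRUE

A %*% A                                 # "squaring" A is matrix multiplication

# The hat matrix H = X (X'X)^{-1} X' is a familiar idempotent matrix
X <- cbind(1, 1:4)
H <- X %*% solve(t(X) %*% X) %*% t(X)
all.equal(H %*% H, H)                   # TRUE: idempotent
```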
Some More Operations
An \(\textbf{identity}\) matrix, \(\textbf{I}\), is analogous to multiplying a scalar by 1. So, \(\textbf{A}\textbf{I}=\textbf{A}\).
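And in R, continuing the example above:

```r
Id <- diag(2)    # the 2 x 2 identity matrix (named Id to avoid masking base::I)
A %*% Id         # returns A unchanged
```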