Logistic regression in action

Eitan Tzelgov

29/11/2022

Setup

The goal of the analysis presented below is to help you understand how to estimate and interpret logistic regression models. The data is taken from work I have done with Dr Delia Dumitrescu.

For the purpose of this exercise, I present two models below. They examine the relationship between respondents' 2015 election vote and some measures of their ideological position (these will be the independent variables), and their vote in the 2016 EU referendum (the dependent variable). In other words, the logistic regression models the log-odds of voting Remain, conditional on ideology and on how respondents voted in the election. In even other words, we want to understand how party voting and ideological positions relate to the probability of wanting to remain in the EU. Why? Because political scientists are interested in understanding the structure of political belief systems, and how they affect political choices.

Research questions

Theoretical expectations

Logistic regression basics 1

Remember, the dependent variable in logistic regression models is binary (it has only two outcomes, in our case, leave or remain). First, I estimate and summarise a basic model with only partisanship as the independent variable (I have limited the sample to Labour and Conservative party voters, for simplicity).

A very basic model: Partisanship

# Model 1: referendum vote on 2015 election vote only
mod1 <- glm(ref.vote ~ elec.vote, family = binomial(link = logit), data = dat_mse)
summary(mod1)
## 
## Call:
## glm(formula = ref.vote ~ elec.vote, family = binomial(link = logit), 
##     data = dat_mse)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6274  -1.0159   0.7864   0.7864   1.3480  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -0.3926     0.1795  -2.187   0.0287 *  
## elec.voteLabour   1.4076     0.2224   6.329 2.47e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 559.68  on 425  degrees of freedom
## Residual deviance: 518.02  on 424  degrees of freedom
## AIC: 522.02
## 
## Number of Fisher Scoring iterations: 4

Interpretation

So, what do the results mean? First, the intercept. Much like in the linear regression model, the intercept tells us the value of the linear predictor when all the independent variables equal zero; in a logistic regression that value is on the log-odds scale. Here \(\texttt{elec.voteLabour}\) equals zero for Conservative voters (the reference category), so the intercept tells us that the log-odds of voting Remain for a respondent who voted Conservative in the 2015 election is -0.39.

What you want to do is transform this quantity into a probability. It's easy:

Log-odds, odds, probabilities

exp(coef(mod1)["(Intercept)"])
## (Intercept) 
##   0.6753247

All I did there is exponentiate -0.39. This gives me the odds of voting Remain for respondents who had voted Conservative. To get from odds to probabilities, we can use the basic rule:

\(\text{probability}= \frac{\text{odds}}{1+\text{odds}}\)

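In our case, \(p=\frac{0.67}{1+0.67}\), so \(p\approx 0.40\). This means that, in our sample, about 40% of the people who voted Conservative voted Remain (with a single binary predictor, the model's fitted probability for each group equals the observed share of Remain voters in that group).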
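The same two steps can be written out in a few lines of R. This is a minimal sketch that copies the rounded intercept from the output rather than refitting the model (the data are not reproduced here); \(\texttt{plogis()}\) is base R's logistic distribution function, which performs the log-odds-to-probability conversion in one step.

```r
# Intercept of mod1, copied (rounded) from the summary output above
b0 <- -0.3926

odds <- exp(b0)             # log-odds -> odds
prob <- odds / (1 + odds)   # odds -> probability: p = odds / (1 + odds)

round(odds, 3)  # 0.675
round(prob, 3)  # 0.403

# plogis() is base R's inverse logit (logistic CDF); it does both steps at once
all.equal(prob, plogis(b0))  # TRUE
```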

Does partisanship matter?

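Yes! The Labour coefficient is large and highly significant (z = 6.33). To get predicted probabilities, we could plug the coefficients from the model into the inverse-logit function:

\(\text{probability}=\frac{1}{1+\exp\left(-(\alpha+\beta_1\times \text{Labour})\right)}\)

where \(\alpha\) is the intercept, \(\beta_1\) is the coefficient on \(\texttt{elec.voteLabour}\), and Labour equals 1 for Labour voters and 0 for Conservative voters.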

Instead, we can use R's \(\texttt{predict()}\) function to do it for us. First, we generate a new data frame (I call it \(\texttt{newdata}\) here). We then set the variable(s) to the value(s) we are interested in. In this case, I simply set \(\texttt{elec.vote}\) to "Labour".

newdata = data.frame(elec.vote="Labour")

We then use the \(\texttt{predict()}\) function, and set \(\texttt{type="response"}\) so that it returns probabilities rather than log-odds:

predict(mod1, newdata, type="response")
##         1 
## 0.7340067

Thus, the predicted probability of voting remain for Labour voters in our model is 0.73.
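To see that \(\texttt{predict()}\) is doing nothing more than applying the inverse logit to the linear predictor, we can plug the coefficients in by hand. A minimal sketch, assuming the rounded estimates printed in the summary of mod1, so the result agrees with \(\texttt{predict()}\) to three decimal places rather than exactly:

```r
# Coefficients of mod1, copied (rounded) from the summary output above
alpha  <- -0.3926   # intercept: Conservative voters (reference category)
beta_1 <-  1.4076   # elec.voteLabour: shift in log-odds for Labour voters

labour <- 1                      # dummy: 1 = voted Labour, 0 = voted Conservative
eta <- alpha + beta_1 * labour   # linear predictor (log-odds of voting Remain)
p_labour <- 1 / (1 + exp(-eta))  # inverse logit -> probability

round(p_labour, 3)  # 0.734, matching predict(mod1, newdata, type = "response")
```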

A slightly more complicated model

So far, we have learned that partisanship has a significant (and substantively large) effect on the probability of voting to remain. What about our second hypothesis? Below, I present a model in which I also include respondents' views on immigration (a 1-10 scale, where higher values indicate more negative views) and on economic redistribution (again, higher values indicate more negative views).

Estimation

# Model 2: add views on immigration and redistribution
mod2 <- glm(ref.vote ~ elec.vote + immig + distr, family = binomial(link = logit), data = dat_mse)
summary(mod2)
## 
## Call:
## glm(formula = ref.vote ~ elec.vote + immig + distr, family = binomial(link = logit), 
##     data = dat_mse)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3246  -0.9196   0.5087   0.8005   1.8263  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      1.846561   0.406948   4.538 5.69e-06 ***
## elec.voteLabour  0.785991   0.260665   3.015  0.00257 ** 
## immig           -0.322320   0.043721  -7.372 1.68e-13 ***
## distr           -0.008199   0.046774  -0.175  0.86085    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 559.68  on 425  degrees of freedom
## Residual deviance: 451.23  on 422  degrees of freedom
## AIC: 459.23
## 
## Number of Fisher Scoring iterations: 4
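The z value and Pr(>|z|) columns are simple functions of the first two columns: z is the estimate divided by its standard error, and the p-value comes from a two-sided test against a standard normal distribution (a Wald test). A minimal sketch using the numbers copied from the output above:

```r
# Estimates and standard errors for mod2, copied from the summary output above
est <- c(elec.voteLabour = 0.785991, immig = -0.322320, distr = -0.008199)
se  <- c(elec.voteLabour = 0.260665, immig =  0.043721, distr =  0.046774)

z <- est / se              # the "z value" column
p <- 2 * pnorm(-abs(z))    # two-sided p-value: the "Pr(>|z|)" column

print(cbind(z, p))
```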

Interpretation and context

The results are interesting, but perhaps unsurprising if you've been following British politics. Partisanship is still significant, although its coefficient shrinks from 1.41 to 0.79 once views on immigration and redistribution are in the model. Views on immigration are a significant factor in explaining respondents' referendum vote (see the p-values in the Pr(>|z|) column): the negative coefficient means that more negative views on immigration are associated with a lower probability of voting Remain. However, views on economic redistribution show no significant relationship with the referendum vote (p = 0.86). So our hypothesis is partially supported, and the results fit with literature suggesting that cultural factors (especially views on immigration), rather than economic factors, led to the victory of the Leave campaign.
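Log-odds coefficients are hard to read directly. One common way to make them more interpretable (not used elsewhere in these slides) is to exponentiate them into odds ratios; with the fitted model this is simply \(\texttt{exp(coef(mod2))}\). A sketch using the estimates copied from the output:

```r
# Slope coefficients of mod2, copied from the summary output above
b <- c(elec.voteLabour = 0.785991, immig = -0.322320, distr = -0.008199)

# Exponentiating a logit coefficient gives an odds ratio: the multiplicative
# change in the odds of voting Remain for a one-unit increase in the predictor,
# holding the other predictors constant.
odds_ratios <- exp(b)
round(odds_ratios, 3)
```

Read this way, each one-point move towards more negative views on immigration multiplies the odds of voting Remain by about 0.72 (roughly a 28% drop), Labour voters have about 2.2 times the odds of Conservative voters with the same views, and the redistribution odds ratio of about 0.99 is essentially 1, i.e. no relationship.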

Ideology and Predictions

What about the predicted probabilities implied by our model? We can use the \(\texttt{predict()}\) function to simulate any type of voter. Suppose we want to know the probability of voting Remain for a Conservative respondent who holds very negative views towards immigrants. I create a data frame with the corresponding values below (note that I set the economic redistribution variable to its mean, \(\texttt{elec.vote}\) to conservative and the \(\texttt{immig}\) variable to its highest value, 10, which is the most anti-immigrant).

newdata = data.frame(elec.vote="conservative", immig=10,
                     distr=mean(dat_mse$distr))

I can now calculate the predicted probability using the \(\texttt{predict()}\) function:

predict(mod2, newdata, type="response")
##         1 
## 0.1968311

More predictions

What about a pro-immigration Labour voter?

newdata = data.frame(elec.vote="Labour", immig=1, distr=mean(dat_mse$distr))
predict(mod2, newdata, type="response")
##         1 
## 0.9072617

There you have it. Pro-immigration Labour voters (predicted probability 0.91) are much more likely to have voted Remain than anti-immigration Conservative voters (0.20).
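Because the \(\texttt{distr}\) coefficient is so close to zero, this contrast barely depends on where we hold economic views. The sketch below recomputes both scenarios from the mod2 coefficients in the output. The mean of \(\texttt{distr}\) is not reported, so, as a labelled assumption, it sweeps \(\texttt{distr}\) over 1 to 10 (assuming it shares the immigration item's 1-10 scale) instead of fixing a single value; \(\texttt{p\_remain()}\) is a hypothetical helper, not part of the original analysis.

```r
# Coefficients of mod2, copied from the summary output above
b0 <- 1.846561; b_lab <- 0.785991; b_imm <- -0.322320; b_dis <- -0.008199

# Hypothetical helper: predicted probability of voting Remain
# (inverse logit of the linear predictor)
p_remain <- function(labour, immig, distr) {
  plogis(b0 + b_lab * labour + b_imm * immig + b_dis * distr)
}

# ASSUMPTION: distr runs from 1 to 10 like immig. Its mean is not reported,
# so we sweep the whole range instead of fixing a single value.
distr_grid <- 1:10
cons_anti <- p_remain(labour = 0, immig = 10, distr = distr_grid)  # anti-immigration Conservative
lab_pro   <- p_remain(labour = 1, immig = 1,  distr = distr_grid)  # pro-immigration Labour

range(cons_anti)  # roughly 0.19 to 0.20
range(lab_pro)    # roughly 0.90 to 0.91
```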

Plotting this:

# Load the ggplot2 library
library(ggplot2)
# Fit the model (same specification as above)
mod2 <- glm(ref.vote ~ elec.vote + immig + distr, family = binomial(link = logit),
            data = dat_mse)
# Conservative voters across the whole immigration scale,
# with redistribution views held at their mean
newdata <- data.frame(elec.vote = "conservative",
                      immig = seq(1, 10, 1),
                      distr = mean(dat_mse$distr))
# Get predicted probabilities
preds <- predict(mod2, newdata, type = "response")
# Prepare a data frame for plotting
plot_data <- data.frame(newdata, preds = preds)
# linewidth needs ggplot2 >= 3.4.0 (use size = 1 on older versions)
ggplot(plot_data, aes(x = immig, y = preds)) +
  geom_line(linewidth = 1, alpha = 0.4) +
  labs(x = "Views on Immigration (high = more negative)",
       y = "Probability of voting Remain in the\n2016 referendum",
       title = "Immigration Views and Probability\nof Voting Remain among Conservative Voters")

[Figure: predicted probability of voting Remain by views on immigration, in two panels: Conservatives and Labour]
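The Conservative and Labour curves can also be approximated without the data. This sketch recomputes both from the mod2 coefficients in the output; \(\texttt{distr\_fixed = 5}\) is a hypothetical stand-in for the unreported sample mean (its coefficient is tiny, so the choice barely matters). It also illustrates a general feature of logistic regression: the Labour coefficient shifts the log-odds by the same amount at every level of \(\texttt{immig}\), but the gap in probabilities is not constant.

```r
# Coefficients of mod2, copied from the summary output above
b0 <- 1.846561; b_lab <- 0.785991; b_imm <- -0.322320; b_dis <- -0.008199

# HYPOTHETICAL: redistribution views held at 5 (the sample mean is not reported)
distr_fixed <- 5
immig <- 1:10

# Predicted probability of voting Remain for each party along the immigration scale
cons <- plogis(b0 + b_imm * immig + b_dis * distr_fixed)
lab  <- plogis(b0 + b_lab + b_imm * immig + b_dis * distr_fixed)
curves <- data.frame(immig, Conservative = cons, Labour = lab, gap = lab - cons)
print(round(curves, 2))

# The party gap is constant on the log-odds scale (it always equals b_lab) ...
all.equal(qlogis(lab) - qlogis(cons), rep(b_lab, length(immig)))  # TRUE
# ... but not on the probability scale, where it peaks mid-scale
which.max(curves$gap)
```

With the real model, the same two-party comparison can be drawn from \(\texttt{predict()}\) by building \(\texttt{newdata}\) with both party levels and mapping \(\texttt{elec.vote}\) to colour (or adding \(\texttt{facet\_wrap(\sim elec.vote)}\)) in ggplot.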

Summary

We have examined the relationship between partisanship, ideology and voting in the 2016 EU referendum. Using a logistic regression, we modeled the probability of voting to remain. The results tell us that partisanship and cultural views help us predict referendum voting, whereas economic ideology does not.