At work and in school, I have been working on several problems involving logistic and Poisson regression. One question that comes up when dealing with these models is how to interpret a single unit of change in one of the predictors in terms of the response. It is important to note that when we talk about a change in a single predictor, we make the assumption that all the other predictors in the model are held constant.
As a bit of background, logistic regression is a regression model for data sets with a binary response variable. The formulation of the model looks like this:
\[ g(p) = ln(\frac{p}{1-p}) = \beta_0+\beta_1x_1+...+\beta_px_p \]
In the above, \(ln(\frac{p}{1-p})\) is referred to as the logit function of p, and the value it represents is called the log odds. Recall that the term “odds” here comes from probability and refers to the ratio of the probability of success to the probability of failure (which is 1 minus the probability of success).
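For example (a quick sketch with a made-up number), if the probability of success is 0.8, then the odds are 0.8/0.2 = 4 and the log odds are ln(4), or about 1.386:
p <- 0.8
odds <- p/(1-p)                   # probability of success over probability of failure: 4
log_odds <- log(odds)             # the logit of p: about 1.386
exp(log_odds)/(1+exp(log_odds))   # the inverse logit recovers p: 0.8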
Now, you may be wondering why we bother to use the logit function at all, rather than a simple regression model, or a simple regression model with a log transform applied to the response variable. The answer has two parts. The first part has to do with the assumption of constant variance in standard and multiple linear regression, which I won’t get into here. The second reason is… well… you actually can use a “shortcut” and do a log transform of the response variable, but I am not going to get into that here either. The main point of this exercise is to figure out how, in the equation above, a single unit change in one of the \(x_i\) values affects the result of the model.
To illustrate this, we will use a very simple model with a single predictor and an intercept. The data plotted below represent people between the ages of 0 and 80 and whether they like the Frank Sinatra and Count Basie recording, “Live at the Sands.” Well, actually that is a lie. I made the data up… but it could look like that… maybe…
Anyway, let’s create some data…
rm(list=ls())
set.seed(56489231)
# Fifty people who don't care for the record (y=0), skewing younger,
# and fifty who do (y=1), skewing older
df <- rbind(data.frame(x=rnorm(50,3,1)*10, y=rep(0,50)),
            data.frame(x=rnorm(50,6,1)*10, y=rep(1,50)))
plot(df$x, df$y, xlab="Age", ylab="Likes Frank and The Count",
     main="(Pseudo) Frank Sinatra and Count Basie Live at the Sands")
Above, the response variable, y, is binary. Either people like the record (1), or they don’t (0)… yet.
Next we will fit a model using the generalized linear model function glm in R and specifying a family of “binomial.” This results in a logistic regression model… despite the fact that the word logistic doesn’t even appear.
par(mfrow=c(2,2))
mdl <- glm(y~x, data=df, family="binomial")
plot(mdl)
# Extract the fitted intercept and slope for use below
b0 <- mdl$coefficients[1]
b1 <- mdl$coefficients[2]
The 4 graphs above are diagnostic plots that tell us if our model is a good fit. Right now, they don’t really help us much because our data didn’t include replications… but, again, I won’t get into that here. What is most important is that we have fit a logistic regression model to our data. The following graph helps us visualize what we have done.
library(ggplot2)
# A grid of ages to predict over
xt <- seq(0, max(df$x), length.out=100)
new_data <- data.frame(x=xt)
# predict() on a glm returns the linear predictor (the log odds) by default
preds <- predict(mdl, newdata=new_data)
# Keep the log odds, the odds, and the probability for each age in the grid
mdl_pred <- data.frame(xt, preds, exp(preds), exp(preds)/(1+exp(preds)))
names(mdl_pred) <- c('x','log_odds','odds','prob')
p=ggplot() +
geom_line(data=mdl_pred, aes(x=x, y=prob), col="blue") +
geom_point(data=df, aes(x=x, y=y))
p
In the above graph, we have a nice blue line that represents our model’s response for a range of age inputs. That curve is the modeled probability of a person of age x liking “Live at the Sands.” To read it, simply follow your age up from the x-axis; when you hit the curve, move horizontally over to the left and you’ll see the resulting probability.
You’ll note that, for the most part, younger people don’t care for it, while a more mature, seasoned, and wise audience with a well-developed sense of taste likes it. You’ll also notice that there is overlap. For instance, if you’re around 45 you may or may not like the recording. Maybe, if you’re 45, you had a really good band teacher in high school who turned you on to jazz… or maybe you were too busy playing sports.
The dots along y=0 and y=1 represent the responses of our simulated people of different ages.
Anyway, our model takes all of the data into account and attempts to construct the best fit curve it can. More on that a bit later.
The blue line can be interpreted as a function that looks like this:
\[ y = \frac{e^{-10.8911579 + 0.2440871*x}}{1+e^{-10.8911579 + 0.2440871*x}} \]
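As a quick sanity check (using the b0 and b1 we pulled out of the model earlier), plugging an age into that formula by hand should agree with what R’s predict() function gives us when we ask for type="response":
x_check <- 45
manual <- exp(b0 + b1*x_check)/(1 + exp(b0 + b1*x_check))
via_predict <- predict(mdl, newdata=data.frame(x=x_check), type="response")
# The two values should agree (up to floating point noise)
all.equal(as.double(manual), as.double(via_predict))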
Below we display the output of the model as generated by the R programming language…
summary(mdl)
##
## Call:
## glm(formula = y ~ x, family = "binomial", data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.26381 -0.25177 -0.00461 0.18743 2.12275
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.89116 2.26623 -4.806 1.54e-06 ***
## x 0.24409 0.05159 4.731 2.23e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 138.629 on 99 degrees of freedom
## Residual deviance: 43.053 on 98 degrees of freedom
## AIC: 47.053
##
## Number of Fisher Scoring iterations: 7
# The coefficient on x is the change in the log odds per one year of age;
# exponentiating it gives the multiplicative change in the odds
log_odds <- mdl$coefficients[2]
odds_factor <- exp(log_odds)
print(odds_factor)
## x
## 1.276456
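That number is the heart of the interpretation: each additional year of age multiplies the odds of liking the record by about 1.28, with everything else held constant. Because the effect is multiplicative, bigger age differences just stack the factor. For example (a small illustration using the b1 we extracted above), a 10 year difference in age multiplies the odds by \(e^{10\hat\beta_1}\), which is roughly 11.5:
# Ten one-year steps multiply the odds by exp(b1) ten times over
exp(10 * b1)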
Using our formula above, we calculate a prediction for how much someone who is 45 years old will like the timeless classic “Live at the Sands.” The answer to that question is a probability of…
x <- 45
log_odds_x <- b0+(x*b1)
odds_x <- exp(log_odds_x)
prob_x <- odds_x/(1+odds_x)
p=ggplot() +
geom_line(data=mdl_pred, aes(x=x, y=prob)) +
geom_point(data=df, aes(x=x, y=y)) +
geom_point(aes(x=45, y=prob_x), colour="red", size=5) +
annotate("point", x = 45, y = prob_x, colour = "blue")
p
print(as.double(prob_x))
## [1] 0.5231738
On the graph above, we can see that the probability for x=45 is 0.5231738. You may have noticed in the code or on the graph that we were initially talking about odds, and now we are referring to probabilities. We can easily convert an odds value into a probability as follows:
\[ prob = \frac{odds}{1+odds} \]
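If you find yourself doing this conversion a lot, it can be handy to wrap both directions in tiny helper functions (a minimal sketch; these names are my own, not part of any package):
odds_to_prob <- function(odds) odds/(1 + odds)   # odds -> probability
prob_to_odds <- function(p) p/(1 - p)            # probability -> odds
odds_to_prob(odds_x)   # the odds at age 45 from above, back to ~0.5231738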
Cool. Now, let’s think for a moment about what would happen if we made a change in the predictor variable age. Specifically, what would happen if a person was 1 year older?
Let’s start with the odds of the existing function at the current value of \(x_1\):
\[ ln(odds_{old}) = {\beta_0+\beta_1x_1} \]
We want to find the new odds, \(odds_{new}\), when we plug in \(x_1+1\):
\[ ln(odds_{new})=\beta_0+\beta_1(x_1+1) \] Multiplying this out, we get the following:
\[ln(odds_{new})=\beta_0+\beta_1x_1+\beta_1\]
Applying the exponential function to both sides, we get…
\[ odds_{new} = e^{\beta_0+\beta_1x_1+\beta_1} \] Recall that
\[ odds_{old} = e^{\beta_0+\beta_1x_1} \]
Combining the above two equations, we get…
\[ odds_{new} = (odds_{old})(e^{\beta_1}) \]
In other words, we exponentiate the coefficient \(\hat\beta_1\) and multiply our old odds by \(e^{\hat\beta_1}\), and that gives us the new odds after one unit of change in \(x_1\).
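We can verify this relationship numerically against the fitted model (a quick check; remember that predict() on a glm returns the log odds by default):
log_odds_45 <- predict(mdl, newdata=data.frame(x=45))
log_odds_46 <- predict(mdl, newdata=data.frame(x=46))
exp(log_odds_46)/exp(log_odds_45)   # ratio of the odds at 46 to the odds at 45...
exp(b1)                             # ...matches the exponentiated coefficient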
A note about the “hat.” In logistic regression, we want to find the parameters \(\beta_0\) and \(\beta_1\) that best fit our curve to the data. There is no closed-form (easy) solution for those values. What we (or our statistical software) do is use maximum likelihood estimation (MLE) to approximate them. In the end, what we end up with are estimates of \(\beta_0\) and \(\beta_1\). We signify the estimated nature of those values with the hat, for example \(\hat\beta_i\); hence we have \(\hat\beta_0\) and \(\hat\beta_1\). Please forgive my somewhat cavalier approach to this notational detail.
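As a rough illustration of what that estimation is doing (just a sketch, not how glm actually works internally; glm uses Fisher scoring), we can maximize the Bernoulli log-likelihood directly with optim() and land close to the same coefficients:
# Negative log-likelihood of the data as a function of (beta0, beta1)
neg_log_lik <- function(beta) {
  eta <- beta[1] + beta[2]*df$x                            # linear predictor (log odds)
  -sum(dbinom(df$y, size=1, prob=plogis(eta), log=TRUE))   # plogis() is the inverse logit
}
# Minimizing the negative log-likelihood maximizes the likelihood
fit <- optim(c(0, 0), neg_log_lik, control=list(maxit=2000, reltol=1e-10))
fit$par     # should be close to...
coef(mdl)   # ...the coefficients glm reported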
Let’s try this out graphically and see if it works…
x <- 45
log_odds_x <- b0+(x*b1)
odds_x <- exp(log_odds_x)
# From the factor we calculated above...
new_odds_x <- odds_x * odds_factor
prob_x <- odds_x/(1+odds_x)
new_prob_x <- new_odds_x/(1+new_odds_x)
p=ggplot() +
geom_line(data=mdl_pred, aes(x=x, y=prob)) +
geom_point(data=df, aes(x=x, y=y)) +
geom_point(aes(x=x, y=prob_x), colour="red", size=3) +
geom_point(aes(x=x+1, y=new_prob_x), colour="green", size=3)
p
Above I picked an x value of 45. At that value of x, the model produced odds of 1.0972004. That value was converted from odds into a probability by taking \(\frac{odds_{old}}{1+odds_{old}}\), or 0.5231738, as we discussed above.
To estimate the value of the new odds for one unit of change in x, keeping all other variables constant, we do the following:
\[ odds_{new} = odds_{old}*e^{\hat\beta_1} = 1.0972004 * e^{ 0.2440871 } = 1.4005275 \]
So the new probability is:
\[ prob_{new} = \frac{odds_{new}}{1+odds_{new}} = 0.5834249 \]
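We can double check that number by asking the fitted model directly for its predicted probability at age 46:
predict(mdl, newdata=data.frame(x=46), type="response")   # should match new_prob_x above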
Graphically, we see that the little green dot appears on the curve to the right of the little red dot. The red dot is the old probability (at age 45), and the green dot is the new probability (at age 46).
To see that this is not a fluke, we can do the same computation for various values of x and see that the results are the same. In other words, the green dot follows along the curve one unit to the right of the red dot.
x <- 10
log_odds_x <- b0+(x*b1)
odds_x <- exp(log_odds_x)
# From the factor we calculated above...
new_odds_x <- odds_x * odds_factor
prob_x <- odds_x/(1+odds_x)
new_prob_x <- new_odds_x/(1+new_odds_x)
p=ggplot() +
geom_line(data=mdl_pred, aes(x=x, y=prob)) +
geom_point(data=df, aes(x=x, y=y)) +
geom_point(aes(x=x, y=prob_x), colour="red", size=3) +
geom_point(aes(x=x+1, y=new_prob_x), colour="green", size=3)
p
x <- 75
log_odds_x <- b0+(x*b1)
odds_x <- exp(log_odds_x)
# From the factor we calculated above...
new_odds_x <- odds_x * odds_factor
prob_x <- odds_x/(1+odds_x)
new_prob_x <- new_odds_x/(1+new_odds_x)
p=ggplot() +
geom_line(data=mdl_pred, aes(x=x, y=prob)) +
geom_point(data=df, aes(x=x, y=y)) +
geom_point(aes(x=x, y=prob_x), colour="red", size=3) +
geom_point(aes(x=x+1, y=new_prob_x), colour="green", size=3)
p
In this notebook, I have shown a very simple example of how to interpret the coefficients in a logistic regression model. The interpretation of coefficients is very different in logistic regression (and, similarly, in Poisson regression) than in standard or multiple linear regression.
If you haven’t heard the classic “Live at the Sands” by Frank Sinatra and the Count Basie Big Band, be sure to check it out. It is one of the all-time classic recordings of the big band era… And if you don’t like it, our model clearly shows you’ll grow to love it.
I hope you have enjoyed this post! Wish me luck on my midterm tomorrow. :)