Introduction

The purpose of this document is to explore the interpretation of coefficients for logist and regression models.

Logistic Regression Model

For this example, we would like to demonstrate how to interpret coefficients of a logistic regression model. The model that we will use will be the standard glm() model in R. For this demonstration, we want a final model that has the form:

\[ g(p) = ln(\frac{p}{1-p}) = \hat\beta_0 + \hat\beta_1x_1 + \hat\beta_2x_2 \]

The above model is the standard regression function in three dimensions.

We will simulate the model above with the following formula

library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
df <- data.frame(x1=as.double(), x2=as.double(), z=as.integer(), ptype=as.integer())

for (b1 in seq(-10, 10, by=1)) {
  for (b2 in seq(-10, 10, by=1)) {
    
    z <- as.integer(round(1 / ( 1 + (exp(b1+rnorm(1)) * exp(b2)))))
    
    df <- rbind(df, c(b1, b2, z, 0))
  }
}
colnames(df) <- c('b1','b2','z',"ptype")
summary(df)
##        b1            b2            z              ptype  
##  Min.   :-10   Min.   :-10   Min.   :0.0000   Min.   :0  
##  1st Qu.: -5   1st Qu.: -5   1st Qu.:0.0000   1st Qu.:0  
##  Median :  0   Median :  0   Median :0.0000   Median :0  
##  Mean   :  0   Mean   :  0   Mean   :0.4921   Mean   :0  
##  3rd Qu.:  5   3rd Qu.:  5   3rd Qu.:1.0000   3rd Qu.:0  
##  Max.   : 10   Max.   : 10   Max.   :1.0000   Max.   :0
head(df)
p <- plot_ly(df, x = ~b1, y=~b2, z=~z)
p
## No trace type specified:
##   Based on info supplied, a 'scatter3d' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode

Ok. So good enough. Now, let’s see if we can fit a logistic regression model to this thing…

mdl <- glm(z~b1+b2, data=df, family=binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mdl)
## 
## Call:
## glm(formula = z ~ b1 + b2, family = binomial, data = df)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.91386  -0.00131   0.00000   0.00179   1.42716  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.3855     0.3391  -1.137    0.256    
## b1           -2.2569     0.4265  -5.292 1.21e-07 ***
## b2           -2.2774     0.4299  -5.298 1.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 611.24  on 440  degrees of freedom
## Residual deviance:  59.09  on 438  degrees of freedom
## AIC: 65.09
## 
## Number of Fisher Scoring iterations: 10
c <- mdl$coefficients[2]
x1_log_odds <- c
x1_odds <- exp(x1_log_odds)
pcnt <- x1_odds / (1+x1_odds)
old_p = 0.625
old_odds <- 0.625/0.375
new_odds <- x1_odds*old_odds
new_p <- new_odds / (1+new_odds)

cat("\nCoefficient: ", c, "\n")
## 
## Coefficient:  -2.256881
cat("\nLog odds: ", x1_log_odds, "\n")
## 
## Log odds:  -2.256881
cat("Odds: ", x1_odds, "\n")
## Odds:  0.1046765
cat("New Odds: ", new_p, '\n')
## New Odds:  0.1485454
cat("Pcnt change: ", pcnt, "\n")
## Pcnt change:  0.09475759

Ok. Based on the above, we have a model here to our slightly over-dispersed data. If we want to mess with coefficients, lets try a few things.

For a 1 unit positive change in x1, we have a log odds change of -2.2568808 (this is just the coefficient) holding all other predictors constant. That means that we have a 0.1046765 change in the value of the odds. To figure that as a percentage, we convert that number from odds to a percentage:

\[ \frac{odds}{1+odds} \] As an example, let’s say that our model output for a given input is 0.625. Therefore the odds are $ 0.625/0.375 = 1.667 $ Now, if we have a 1 unit change in predictor that goes along with \(\beta_1\), our coefficient for x1,

Applying a 1 unit change to the predictor associated with \(\beta_1\) (namely \(x_1\) ), we would expect to see the following:

Coefficient Log odds Odds Factor Odds (given) New Odds Probability (Given) New Probability
-2.2568808 -2.2568808 0.1046765 1.6666667 0.1744608 0.625 0.1485454

So, what does this mean?

Let’s superimpose our model predictions with our model training data…

df2 <- data.frame(x1=as.double(), x2=as.double(), z=as.integer(), t=as.integer())

for (b1 in seq(-10.5, 10.5, by=1)) {
  for (b2 in seq(-10.5, 10.5, by=1)) {
    
    #z <- as.integer(round(1 / ( 1 + (exp(b1+rnorm(1)) * exp(b2)))))
    dat <- data.frame("b1"=b1,"b2"=b2)
    z <- predict(mdl, newdata=dat, type="response")
    df2 <- rbind(df2, c(b1, b2, z, 1))
  }
}
colnames(df2) <- c('b1','b2','z','ptype')
head(df2)
df3 <- rbind(df,df2)
p <- plot_ly(df3, x = ~b1, y=~b2, z=~z, color=~ptype, size=.5, colors = c('#BF382A', '#0C4B8E'))
p
## No trace type specified:
##   Based on info supplied, a 'scatter3d' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
## No scatter3d mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode

In the plot, we can see blue and red dots. The red dots are the original training data. The blue dots are the fitted mode. Notice as we move more positive on the \(\beta_1\) axis, the more likely it is that we will become negative. The predictions predictions that we ran through above, in fact, are showing us that if we move 1 unit in the positive \(\beta_1\) direction, the odds of being positive change by a factor of 0.1046765. Also, if our olds odds were 1.6666667, then our new odds of being positive will be 0.1744608.

Conclusion

In this writeup, I showed how changes in coefficients in a logistic regression model can be interpreted in the context of a 3D graph.

Thanks!

Miles.