The purpose of this document is to explore the interpretation of coefficients in logistic regression models.
For this example, we would like to demonstrate how to interpret the coefficients of a logistic regression model. We will fit the model with R's standard glm() function. For this demonstration, we want a final model of the form:
\[ g(p) = \ln\left(\frac{p}{1-p}\right) = \hat\beta_0 + \hat\beta_1x_1 + \hat\beta_2x_2 \]
The above is a standard logistic regression model with two predictors: on the log-odds scale it is a plane in three dimensions.
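The link \(g()\) and its inverse are available directly in base R as qlogis() and plogis(). A quick sketch, using the example probability of 0.625 that shows up later in this writeup:

p <- 0.625
g <- qlogis(p)   # log-odds: log(0.625 / 0.375), about 0.511
plogis(g)        # back to the probability: 0.625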
We will simulate data from the model above with the following code:
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# Empty data frame: predictor values b1 and b2, binary response z,
# and ptype to mark training points (0) vs. fitted points (1) later.
df <- data.frame(b1=as.double(), b2=as.double(), z=as.integer(), ptype=as.integer())
for (b1 in seq(-10, 10, by=1)) {
  for (b2 in seq(-10, 10, by=1)) {
    # Inverse logit of -(b1 + b2 + noise), rounded to a 0/1 response.
    z <- as.integer(round(1 / (1 + (exp(b1 + rnorm(1)) * exp(b2)))))
    df <- rbind(df, c(b1, b2, z, 0))
  }
}
colnames(df) <- c('b1', 'b2', 'z', 'ptype')
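As an aside, the expression inside round() is just the inverse logit: 1 / (1 + exp(b1 + noise) * exp(b2)) equals plogis(-(b1 + b2 + noise)). A sketch of the same response written with base R's plogis() (equivalent up to the random draw):

b1 <- 2; b2 <- -1                # example predictor values
noise <- rnorm(1)
# Latent log-odds are -(b1 + b2 + noise), so both true slopes are
# negative -- consistent with the fitted coefficients below.
z_alt <- as.integer(round(plogis(-(b1 + b2 + noise))))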
summary(df)
## b1 b2 z ptype
## Min. :-10 Min. :-10 Min. :0.0000 Min. :0
## 1st Qu.: -5 1st Qu.: -5 1st Qu.:0.0000 1st Qu.:0
## Median : 0 Median : 0 Median :0.0000 Median :0
## Mean : 0 Mean : 0 Mean :0.4921 Mean :0
## 3rd Qu.: 5 3rd Qu.: 5 3rd Qu.:1.0000 3rd Qu.:0
## Max. : 10 Max. : 10 Max. :1.0000 Max. :0
head(df)
p <- plot_ly(df, x = ~b1, y=~b2, z=~z)
p
## No trace type specified:
## Based on info supplied, a 'scatter3d' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
Ok. So good enough. Now, let’s see if we can fit a logistic regression model to this thing…
mdl <- glm(z~b1+b2, data=df, family=binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mdl)
##
## Call:
## glm(formula = z ~ b1 + b2, family = binomial, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.91386 -0.00131 0.00000 0.00179 1.42716
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3855 0.3391 -1.137 0.256
## b1 -2.2569 0.4265 -5.292 1.21e-07 ***
## b2 -2.2774 0.4299 -5.298 1.17e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 611.24 on 440 degrees of freedom
## Residual deviance: 59.09 on 438 degrees of freedom
## AIC: 65.09
##
## Number of Fisher Scoring iterations: 10
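Before stepping through the arithmetic by hand, note that exponentiating the coefficients gives the same information on the odds scale. A minimal sketch:

# Each one-unit increase in a predictor multiplies the odds of z = 1
# by exp(coefficient): about 0.1047 for b1 and 0.1025 for b2 here.
exp(coef(mdl))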
b1_coef <- mdl$coefficients[2]      # slope for b1 (avoid masking base::c)
x1_log_odds <- b1_coef              # the coefficient IS the log-odds change
x1_odds <- exp(x1_log_odds)         # multiplicative odds factor per unit of b1
pcnt <- x1_odds / (1 + x1_odds)     # that factor expressed as a probability
old_p <- 0.625                      # example starting probability
old_odds <- old_p / (1 - old_p)     # corresponding odds
new_odds <- x1_odds * old_odds      # odds after a one-unit increase in b1
new_p <- new_odds / (1 + new_odds)  # probability after the change
cat("\nCoefficient: ", b1_coef, "\n")
##
## Coefficient: -2.256881
cat("\nLog odds: ", x1_log_odds, "\n")
##
## Log odds: -2.256881
cat("Odds: ", x1_odds, "\n")
## Odds: 0.1046765
cat("New Odds: ", new_p, '\n')
## New Odds: 0.1485454
cat("Pcnt change: ", pcnt, "\n")
## Pcnt change: 0.09475759
Ok. Based on the above, we have a model fit to our data. Note the warning that fitted probabilities of 0 or 1 occurred: the simulated data are nearly separable, which makes the coefficient estimates quite steep. If we want to mess with the coefficients, let's try a few things.
For a 1 unit positive change in \(x_1\), the log odds change by -2.2568808 (this is just the coefficient), holding all other predictors constant. Equivalently, the odds are multiplied by a factor of \(e^{-2.2568808} = 0.1046765\). To express an odds value on the probability scale, we convert it with:
\[ \frac{odds}{1+odds} \]

As an example, let's say that our model outputs a probability of 0.625 for a given input. The odds are therefore \(0.625/0.375 \approx 1.667\). Applying a 1 unit change to the predictor associated with \(\beta_1\) (namely \(x_1\)), we would expect to see the following:
| Coefficient | Log-odds change | Odds factor | Old odds | New odds | Old probability | New probability |
|---|---|---|---|---|---|---|
| -2.2568808 | -2.2568808 | 0.1046765 | 1.6666667 | 0.1744608 | 0.625 | 0.1485454 |
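Here is the arithmetic behind the table, spelled out in R (odds_to_prob is just an illustrative helper, not part of the model code above):

odds_to_prob <- function(odds) odds / (1 + odds)   # probability from odds

old_p    <- 0.625                        # given probability
old_odds <- old_p / (1 - old_p)          # 1.6666667
odds_fac <- exp(coef(mdl)["b1"])         # 0.1046765
new_odds <- odds_fac * old_odds          # 0.1744608
new_p    <- odds_to_prob(new_odds)       # 0.1485454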
So, what does this mean?
Let’s superimpose our model predictions with our model training data…
# Prediction grid, offset by 0.5 from the training grid so the two
# point clouds don't overlap in the plot; ptype = 1 marks fitted values.
df2 <- data.frame(b1=as.double(), b2=as.double(), z=as.double(), ptype=as.integer())
for (b1 in seq(-10.5, 10.5, by=1)) {
  for (b2 in seq(-10.5, 10.5, by=1)) {
    dat <- data.frame("b1"=b1, "b2"=b2)
    z <- predict(mdl, newdata=dat, type="response")   # fitted probability
    df2 <- rbind(df2, c(b1, b2, z, 1))
  }
}
colnames(df2) <- c('b1', 'b2', 'z', 'ptype')
head(df2)
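As an aside, growing a data frame with rbind() inside nested loops works but is slow in R. A vectorized sketch with expand.grid() builds the same prediction grid (up to row order) in one shot:

# Same grid and predictions as df2, without the loops.
grid <- expand.grid(b1 = seq(-10.5, 10.5, by = 1),
                    b2 = seq(-10.5, 10.5, by = 1))
grid$z     <- predict(mdl, newdata = grid, type = "response")
grid$ptype <- 1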
df3 <- rbind(df,df2)
p <- plot_ly(df3, x = ~b1, y=~b2, z=~z, color=~ptype, size=.5, colors = c('#BF382A', '#0C4B8E'))
p
## No trace type specified:
## Based on info supplied, a 'scatter3d' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
In the plot, we can see blue and red dots. The red dots are the original training data. The blue dots are the fitted model. Notice that as we move in the positive direction along the b1 axis, the predicted probability of a positive outcome drops toward zero. The calculations we ran through above show exactly that: if we move 1 unit in the positive b1 direction, the odds of being positive change by a factor of 0.1046765. So if our old odds were 1.6666667, then our new odds of being positive will be 0.1744608.
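We can verify that factor straight from the fitted model: take two predictions one unit apart in b1, convert each to odds, and their ratio recovers \(e^{\hat\beta_1}\). A sketch (the two evaluation points are arbitrary):

# Predicted probabilities at two points one unit apart in b1.
p0 <- predict(mdl, newdata = data.frame(b1 = 0, b2 = 0), type = "response")
p1 <- predict(mdl, newdata = data.frame(b1 = 1, b2 = 0), type = "response")

# Ratio of the odds equals exp(coef) for b1: about 0.1047.
(p1 / (1 - p1)) / (p0 / (1 - p0))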
In this writeup, I showed how the coefficients of a logistic regression model can be interpreted as changes in the odds, in the context of a 3D graph.
Thanks!
Miles.