Probability, Odds & Logistic Regression

Harold Nelson

4/8/2018

library(tidyverse)

Probability and Odds

\[\text{Odds} = \frac{p}{1-p}\]

\[p = \frac{\text{Odds}}{1+\text{Odds}}\]
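Here is a quick numerical check of the two formulas above, written as a minimal sketch. The helper names `prob_to_odds()` and `odds_to_prob()` are hypothetical and are introduced only for illustration.

```r
# Hypothetical helpers implementing the two conversion formulas
prob_to_odds = function(p) p / (1 - p)
odds_to_prob = function(odds) odds / (1 + odds)

prob_to_odds(0.75)   # returns 3, i.e. odds of "3 to 1"
odds_to_prob(3)      # returns 0.75

# The conversion round-trips for any p strictly between 0 and 1
p = c(0.1, 0.5, 0.9)
odds_to_prob(prob_to_odds(p))
```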

Graphical View

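# Grid of probabilities, kept away from 0 and 1 where the odds are 0 or infinite.
# A symmetric range (.001 to .999) keeps the log-odds plot symmetric.
p = seq(from = .001, to = .999, by = .001)
odds = p/(1 - p)
df = data.frame(p, odds)
df$logodds = log(df$odds)

# Odds as a function of probability: grows without bound as p approaches 1
df %>% ggplot(aes(p, odds)) + geom_point()

# The inverse view: probability as a function of odds
df %>% ggplot(aes(odds, p)) + geom_point()

# Log odds as a function of probability: an S-shaped curve through (0.5, 0)
df %>% ggplot(aes(p, logodds)) + geom_point()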
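The shape of the log-odds curve can also be checked numerically. In this sketch, base R's `qlogis()` (the logistic quantile function, which is the log-odds or logit transform) reproduces `log(p/(1-p))`. The log odds are zero at p = 0.5, where the odds are even, and replacing p with 1 - p flips their sign, which is why the curve is symmetric about the point (0.5, 0).

```r
p = seq(from = .001, to = .999, by = .001)
logodds = log(p / (1 - p))

# qlogis() from the stats package computes the same log-odds (logit) transform
all.equal(logodds, qlogis(p))           # TRUE

# Even odds: p = 0.5 gives odds of 1 and log odds of 0
log(0.5 / (1 - 0.5))                    # 0

# Swapping p for 1 - p flips the sign of the log odds
all.equal(qlogis(1 - p), -qlogis(p))    # TRUE
```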

Model Relationships

The linear probability model may be stated as

\[p = b_0+b_1*x \]

The linear probability model can produce predicted probabilities below zero or above one, so we use the logistic regression model instead.
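To see the problem concretely, here is a minimal sketch on a small made-up data set (the vectors `x` and `y` are hypothetical). Fitting the linear probability model by ordinary least squares with `lm()` gives predictions below zero and above one at the edges of the data, while a logistic regression fitted with `glm(..., family = binomial)` keeps every predicted probability strictly between zero and one.

```r
# Hypothetical data: a binary outcome y that becomes more likely as x grows
x = 1:10
y = c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

# Linear probability model: ordinary least squares on the 0/1 outcome
lpm = lm(y ~ x)

# Logistic regression: binomial family with the default logit link
logit_fit = glm(y ~ x, family = binomial)

new_x = data.frame(x = c(0, 12))
lpm_pred = predict(lpm, newdata = new_x)                            # about -0.27 and 1.41
logit_pred = predict(logit_fit, newdata = new_x, type = "response") # both between 0 and 1
lpm_pred
logit_pred
```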

The logistic regression model places the log of the odds, also called the logit, on the left-hand side:

\[\log\left(\frac{p}{1-p}\right) = b_0+b_1*x\]

Solving this equation for the odds, by exponentiating both sides, gives

\[\frac{p}{1-p} = e^{b_0+b_1*x}\]

The next step is to solve this for the probability, exactly as in the conversion from odds to probability above. Doing this gives

\[p = \frac{e^{b_0+b_1*x}}{1+e^{b_0+b_1*x}}\]
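The three equations can be checked against a fitted model. This minimal sketch uses small hypothetical data (`x` and `y` are made up) and base R's `glm()` with `family = binomial`, whose default logit link is the log-odds equation above. Taking the fitted coefficients as b_0 and b_1, it confirms that the log odds, the odds, and the probability computed from the formulas agree with the probabilities `glm()` reports, and that `plogis()` from the `stats` package computes the same inverse-logit formula. The last line illustrates a direct consequence of the odds equation: each one-unit increase in x multiplies the odds by \(e^{b_1}\).

```r
# Hypothetical data: a binary outcome y that becomes more likely as x grows
x = 1:10
y = c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

# Logistic regression: binomial family with the default logit link
fit = glm(y ~ x, family = binomial)
b0 = unname(coef(fit)[1])
b1 = unname(coef(fit)[2])

eta = b0 + b1 * x                  # log odds: b_0 + b_1*x
odds = exp(eta)                    # odds: e^(b_0 + b_1*x)
p = exp(eta) / (1 + exp(eta))      # probability: the formula above

p_hat = unname(fitted(fit))        # probabilities reported by glm()

all.equal(log(p_hat / (1 - p_hat)), eta)   # TRUE: left-hand side equals the log odds
all.equal(p_hat / (1 - p_hat), odds)       # TRUE: odds form
all.equal(p_hat, p)                        # TRUE: probability form
all.equal(p, plogis(eta))                  # TRUE: plogis() is the same inverse logit

# Each one-unit increase in x multiplies the odds by the same factor, e^(b_1)
all.equal(odds[-1] / odds[-length(odds)], rep(exp(b1), 9))   # TRUE
```

One practical note: `exp(eta) / (1 + exp(eta))` evaluates to `NaN` for very large log odds (for example, `exp(800)` is `Inf` in double precision, and `Inf / Inf` is `NaN`), so the algebraically equivalent `1 / (1 + exp(-eta))`, or `plogis()`, is the safer way to compute the probability.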