library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Probability and Odds

Probability and odds are two different ways of describing the same thing.
Probabilities are in the range \(0\) to \(1\).
Odds are transformed probabilities:

\[ Odds = \frac{p}{1-p} \]

If \(p = 0\), \(Odds = 0\).
If \(p\) approaches \(1\), \(Odds\) approaches infinity.
\(Odds\) has values in the range \(0\) to \(\infty\).
Reversing the relationship above, we get

\[p = \frac{Odds}{1+Odds}\]
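As a quick numerical check of the two formulas above, the following sketch converts a few probabilities to odds and back. The helper names `prob_to_odds` and `odds_to_prob` and the example values are hypothetical, chosen here only for illustration.

```r
# Hypothetical helpers (illustrative names, not from the original notes)
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (1 + odds)

# A few example probabilities (arbitrary values)
p_example <- c(0.1, 0.25, 0.5, 0.75, 0.9)
odds_example <- prob_to_odds(p_example)
odds_example
# p = 0.5 corresponds to odds of 1; p = 0.75 corresponds to odds of 3

# The round trip recovers the original probabilities (up to floating point)
all.equal(odds_to_prob(odds_example), p_example)
```

Because the two functions are inverses of each other, either scale can be used without losing information.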

Graphical View

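# Grid of probabilities; stop short of 1, where the odds are undefined
p <- seq(from = 0.01, to = 0.999, by = 0.001)
odds <- p / (1 - p)
df <- data.frame(p, odds)
df$logodds <- log(df$odds)  # log odds (logit) of each probability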

df %>% ggplot(aes(p,odds)) + geom_point()

df %>% ggplot(aes(odds,p)) + geom_point()

df %>% ggplot(aes(p,logodds)) + geom_point()

Model Relationships.

The linear probability model may be stated as

\[p = b_0+b_1*x \]

The linear probability model can produce predicted probabilities outside the range of zero to one, so we use the logistic regression model instead.
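To see how a straight line can leave the unit interval, here is a minimal sketch. The coefficients `b0` and `b1` and the `x` values are made up for illustration and are not estimated from any data.

```r
# Hypothetical coefficients, chosen only to illustrate the problem
b0 <- 0.2
b1 <- 0.1
x <- c(-5, 0, 5, 10)

# "Probabilities" implied by the linear probability model
p_linear <- b0 + b1 * x
p_linear
# approximately -0.3, 0.2, 0.7, 1.2

# Flag the values that are not valid probabilities
out_of_range <- p_linear < 0 | p_linear > 1
out_of_range
```

With these assumed values, the first and last predictions fall below zero and above one respectively.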

The logistic regression model places the log of the odds on the left-hand side.

\[\log\left(\frac{p}{1-p}\right)=b_0+b_1*x \]

If we solve this equation for the odds expression, we get

\[\frac{p}{1-p} = e^{b_0+b_1*x}\]

The next step is to solve this for the probability. Doing this we get

\[p = \frac{e^{b_0+b_1*x}}{1+e^{b_0+b_1*x}}\]
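The expression for \(p\) can be checked numerically. The sketch below uses hypothetical coefficients (`b0`, `b1`) and an arbitrary grid of `x` values. It compares the formula with base R's `plogis()`, the logistic distribution function in the `stats` package, and confirms that taking the log odds of the result returns the linear predictor.

```r
# Hypothetical coefficients and x values, for illustration only
b0 <- -1
b1 <- 0.5
x <- seq(-10, 10, by = 5)

eta <- b0 + b1 * x                       # linear predictor (the log odds)
p_logistic <- exp(eta) / (1 + exp(eta))  # probability from the formula above

range(p_logistic)                        # stays strictly between 0 and 1

# plogis() computes the same inverse-logit transformation
all.equal(p_logistic, plogis(eta))

# Taking the log odds of p recovers the linear predictor
all.equal(log(p_logistic / (1 - p_logistic)), eta)
```

Unlike the linear probability model, every value of the linear predictor maps to a valid probability.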

Probability, Odds & Logistic Regression
