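Rather than predicting \(p(y_i=1)\) directly, we transform it into an unbounded variable with a link function. The first step is to move from the probability to the odds of the outcome (written OR throughout this section):
\[ OR = \frac{p}{1-p} \]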
#scenario one
p = 0.9 #probability of YES or 1
q = 1-p #probability of NO or 0
OR_1 = p/q #YES
OR_1
## [1] 9
OR_0 = q/p #NO
OR_0
## [1] 0.1111111
When \(p(y_i = 1) = 0.9\), OR(1) = 9; for the complement, \(p(y_i = 0) = 0.1\), OR(0) = 0.11.
#scenario two
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0
OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667
When \(p(y_i = 1) = 0.6\), OR(1) = 1.5; for the complement, \(p(y_i = 0) = 0.4\), OR(0) = 0.67.
#scenario three
p = 0.3 #probability of YES or 1
q = 1-p #probability of NO or 0
OR_1 = p/q #YES
OR_1
## [1] 0.4285714
OR_0 = q/p #NO
OR_0
## [1] 2.333333
When \(p(y_i = 1) = 0.3\), OR(1) = 0.43; for the complement, \(p(y_i = 0) = 0.7\), OR(0) = 2.33.
Comments: the odds scale is skewed and asymmetric, and it ranges from 0 to \(+\infty\), so it is not helpful as the outcome of a linear model.
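The asymmetry can be seen by computing the odds over a small grid of probabilities. This is only an illustrative sketch: the grid values are chosen for convenience and the names `odds` and `odds_complement` are not from the original scenarios.

```r
# Odds for a grid of probabilities (illustrative values only)
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)
odds <- p / (1 - p)
round(odds, 2)
# 0.11 0.43 1.00 2.33 9.00

# Odds of the complementary event are reciprocals, not mirror images:
# probabilities below 0.5 are squeezed into (0, 1), those above 0.5 stretch to +Inf
odds_complement <- (1 - p) / p
all.equal(odds * odds_complement, rep(1, length(p)))
```

The product of the two sets of odds is always 1, which is why 0.9 maps to 9 while 0.1 maps to only 0.11.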
\[ e^{\ln x} = x \]
where \(\ln\) is the natural logarithm, \(e\) is the base of the natural logarithm (approximately 2.718), and \(x\) is a positive real number.
The natural log is the logarithm to base \(e\) and is the inverse of the exponential function \(e^x\). Natural logarithms are often used in solving time and growth problems. Here they matter because taking the log of the odds turns reciprocal odds into values of equal size and opposite sign.
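A quick check of these properties in R may help. It is a minimal sketch with arbitrarily chosen positive values of `x`; `log()` in R is the natural log by default.

```r
x <- c(0.5, 1, 1.5, 9)        # arbitrary positive values
log_x <- log(x)               # natural log

all.equal(exp(log_x), x)      # exp() undoes log()
log(1)                        # odds of 1 (p = 0.5) map to 0
all.equal(log(1 / x), -log_x) # reciprocal values differ only in sign
```

The last line is the property that makes the log of the odds symmetric about 0.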
#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0
OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667
logit_1 = log(p/(1-p)) #natural log
logit_1
## [1] 0.4054651
logit_0 = log(q/(1-q)) #natural log
logit_0
## [1] -0.4054651
If \(p(y_i=1)= 0.6\), then Logit(1) = 0.405 and Logit(0) = -0.405.
Comments: the logit scale is now symmetric about 0 and ranges from \(-\infty\) to \(+\infty\).
A Logit link is a nonlinear transformation of probability:
Equal intervals in logits are NOT equal intervals of probability
The logit ranges from \(-\infty\) to \(+\infty\) and is symmetric about prob = .5 (logit = 0)
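The two points above can be illustrated by converting equally spaced logits back to probabilities. This is a sketch with an arbitrary grid of logits; the conversion uses the standard inverse-logit formula.

```r
logit <- seq(-4, 4, by = 2)             # equally spaced logits
prob <- exp(logit) / (1 + exp(logit))   # convert logits back to probabilities
round(prob, 3)
# 0.018 0.119 0.500 0.881 0.982

round(diff(prob), 3)                    # steps in probability are not equal
# 0.101 0.381 0.381 0.101
```

Each step of 2 logits moves the probability a lot near 0.5 and very little near the extremes, and the values are mirrored around prob = .5.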
Now we can use a linear model. The model is linear with respect to the predicted logit, which translates into a nonlinear prediction with respect to probability: the conditional mean outcome levels off as it approaches 0 or 1, as needed.
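As a rough illustration, the sketch below builds a straight line on the logit scale and maps it to probabilities. The coefficients `b0` and `b1` and the predictor values are hypothetical, chosen only for the example; `plogis()` is R's built-in inverse logit.

```r
# Hypothetical coefficients, for illustration only
b0 <- -1
b1 <- 0.8
x <- seq(-6, 6, by = 3)

eta <- b0 + b1 * x     # linear predictor on the logit scale
p_hat <- plogis(eta)   # implied probabilities

range(eta)             # the line itself runs well outside [0, 1]
range(p_hat)           # the implied probabilities stay strictly inside (0, 1)
```

If the same straight line were read as a probability, several of its values would be impossible; on the logit scale they are all valid.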
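Written directly on the probability scale, the linear model is:
\[ p(y_i = 1) = \beta_0 + \beta_1X_i + \beta_2Z_i + e_i \]
If \(y_i\) is binary, \(e_i\) can take only two values, because \(e_i = y_i - \hat{y}_i\):
if \(y_i = 1\), \(e_i = (1 - \hat{p})\)
if \(y_i = 0\), \(e_i = (0 - \hat{p}) = -\hat{p}\)
The variance of a binary variable depends on its mean:
\[ Var(y_i) = p \times (1-p) \]

## Logistic Regression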
The link function \(g(\cdot)\):
\[ \log\frac{p(y_i =1)}{1-p(y_i = 1)} = \beta_0 + \beta_1X_i + \beta_2Z_i + e_i \]
Here \(\log\) is the natural log, so the identity
\[ e^{\ln x} = x \]
lets us exponentiate both sides to recover the odds:
\[ e^{\log\frac{p(y_i =1)}{1-p(y_i = 1)}} = \frac{p(y_i =1)}{1-p(y_i = 1)} = \exp\left(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i\right) \]
#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0
logit_1 = log(p/(1-p)) #natural log
logit_1
## [1] 0.4054651
logit_0 = log(q/(1-q)) #natural log
logit_0
## [1] -0.4054651
Comments: symmetric, unbounded, from \(-\infty\) to \(+\infty\).
\[ \frac{p(y_i =1)}{1-p(y_i = 1)} = \exp\left(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i\right) \]
#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0
OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667
Comments: non-symmetric, one-side bounded, 0 to \(\infty\).
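One consequence of the odds equation is that a one-unit increase in a predictor multiplies the odds by a constant factor. The sketch below assumes a single predictor and hypothetical coefficients `b0` and `b1`, and drops \(Z_i\) and \(e_i\) for simplicity.

```r
# Hypothetical coefficients, for illustration only
b0 <- -0.5
b1 <- 0.7
x <- 0:3

odds <- exp(b0 + b1 * x)                    # odds at each value of x
ratio <- odds[-1] / odds[-length(odds)]     # odds at x + 1 divided by odds at x

all.equal(ratio, rep(exp(b1), length(ratio)))  # every ratio equals exp(b1)
```

The effect is additive on the logit scale and multiplicative on the odds scale.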
The inverse link function \(g^{-1}(\cdot)\):
\[ p(y_i =1) = \frac{\exp\left(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i\right)}{1+\exp\left(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i\right)} \]
#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0
logit_1 = log(p/(1-p)) #natural log
logit_1
## [1] 0.4054651
logit_0 = log(q/(1-q)) #natural log
logit_0
## [1] -0.4054651
OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667
prob_1 = exp(logit_1)/(1+exp(logit_1))
prob_1
## [1] 0.6
prob_0 = exp(logit_0)/(1+exp(logit_0))
prob_0
## [1] 0.4
Comments: bounded between 0 and 1.
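Base R also provides these two transformations directly: `qlogis()` is the logit and `plogis()` is its inverse. The sketch below checks them against the manual formulas on a few arbitrary probabilities.

```r
p <- c(0.1, 0.4, 0.6, 0.9)   # arbitrary probabilities

# Link: probability -> logit
logit_manual <- log(p / (1 - p))
all.equal(logit_manual, qlogis(p))

# Inverse link: logit -> probability
prob_manual <- exp(logit_manual) / (1 + exp(logit_manual))
all.equal(prob_manual, plogis(logit_manual))

# The round trip returns the original probabilities
all.equal(prob_manual, p)
```

Using the built-ins avoids retyping the formulas, but the manual versions make the link and inverse link explicit.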
\[ \log\frac{p(y_i = 0)}{1-p(y_i = 0)} = \beta_{01}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]
\[ \log\frac{p(y_i = 1)}{1-p(y_i = 1)} = \beta_{02}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]
\[ \log\frac{p(y_i = 2)}{1-p(y_i = 2)} = \beta_{03}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]
\[ \log\frac{p(y_i > 0)}{1-p(y_i > 0)} = \beta_{01}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]
\[ \log\frac{p(y_i > 1)}{1-p(y_i > 1)} = \beta_{02}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]
\[ \log\frac{p(y_i > 2)}{1-p(y_i > 2)} = \beta_{03}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]
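For the cumulative form \(p(y_i > k)\), the three logits can be turned into category probabilities by differencing. The sketch below rests on several assumptions not stated in the text: the outcome has four ordered categories (0 to 3), the intercepts and slope are hypothetical, and \(Z_i\), the interaction, and \(e_i\) are dropped to keep it short.

```r
# Hypothetical values, for illustration only
b01 <- 1.5; b02 <- 0; b03 <- -1.5   # separate intercepts, decreasing
b1 <- 0.6                           # slope shared across the three equations
x <- 1

cum_logit <- c(b01, b02, b03) + b1 * x
p_gt <- plogis(cum_logit)           # P(y > 0), P(y > 1), P(y > 2)

# Category probabilities: P(y = k) = P(y > k - 1) - P(y > k),
# with P(y > -1) = 1 and P(y > 3) = 0
p_cat <- -diff(c(1, p_gt, 0))
names(p_cat) <- paste0("y=", 0:3)
round(p_cat, 3)
sum(p_cat)                          # category probabilities sum to 1
```

The intercepts must decrease from one equation to the next, otherwise the cumulative probabilities would not be ordered and some category probabilities would be negative.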