Generalized Models for Binary Outcomes

Rather than predicting \(p(y_i=1)\) directly, we must transform it into an unbounded variable with a link function. The first step is to express the probability as odds (labeled OR throughout):

\[ OR = \frac{p}{1-p} \]

#scenario one 
p = 0.9 #probability of YES or 1
q = 1-p #probability of NO or 0

OR_1 = p/q #YES
OR_1
## [1] 9
OR_0 = q/p #NO
OR_0
## [1] 0.1111111

When \(p(y_i = 1) = 0.9\), OR(1) = 9; correspondingly, \(p(y_i = 0) = 0.1\) and OR(0) = 0.11.

#scenario two
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0

OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667

When \(p(y_i = 1) = 0.6\), OR(1) = 1.5; correspondingly, \(p(y_i = 0) = 0.4\) and OR(0) = 0.67.

#scenario three
p = 0.3 #probability of YES or 1
q = 1-p #probability of NO or 0

OR_1 = p/q #YES
OR_1
## [1] 0.4285714
OR_0 = q/p #NO
OR_0
## [1] 2.333333

When \(p(y_i = 1) = 0.3\), OR(1) = 0.43; correspondingly, \(p(y_i = 0) = 0.7\) and OR(0) = 2.33.

Comments: the odds scale is skewed and asymmetric, and it ranges from 0 to \(+\infty\). Because it is still bounded below at 0, it is not yet a helpful outcome scale for a linear model.
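The asymmetry can be seen by computing the odds for probabilities placed symmetrically around 0.5. This is a minimal sketch; the grid of probabilities is chosen here only for illustration.

```r
# Probabilities placed symmetrically around 0.5 (illustrative values)
p <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# Odds of a 1 at each probability
odds <- p / (1 - p)
round(odds, 3)
## [1] 0.111 0.333 1.000 3.000 9.000

# Below p = 0.5 the odds are squeezed into (0, 1);
# above p = 0.5 they stretch out toward +Inf
```

Equal steps in probability on either side of 0.5 therefore land at very different distances from odds = 1.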

\[ e^{\ln(x)} = x \]

where \(\ln\) is the natural logarithm, \(e\) is the base of the natural exponential function, and \(x\) is a positive real number.

The natural log is the logarithm to base \(e\) and is the inverse of the exponential function \(e^x\). Natural logarithms are commonly used in problems involving time and growth. Because the logarithm and the exponential undo each other, we can move back and forth between a positive quantity, such as the odds, and its log.
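A short check of the inverse relationship in R, as a sketch with arbitrary positive values of `x`. It also shows that the log of a reciprocal is the negative of the log, which is why taking the log of the odds makes the scale symmetric.

```r
# Arbitrary positive values, for illustration only
x <- c(0.5, 1, 9)

# log() in R is the natural log by default; exp() undoes it
log_x <- log(x)
back  <- exp(log_x)
isTRUE(all.equal(back, x))
## [1] TRUE

# Reciprocal odds have logs of equal size and opposite sign
log(9)
## [1] 2.197225
log(1 / 9)
## [1] -2.197225
```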

#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0

OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667
logit_1 = log(p/(1-p)) #natural log
logit_1
## [1] 0.4054651
logit_0 = log(q/(1-q)) #natural log
logit_0
## [1] -0.4054651

If \(p(y_i=1)= 0.6\), then Logit(1) = 0.405 and Logit(0) = -0.405.

Comments: the logit scale is now symmetric about 0 and ranges from \(-\infty\) to \(+\infty\).
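The symmetry holds for any pair of complementary probabilities, not just 0.6 and 0.4. The sketch below uses base R's `qlogis()`, which computes the same quantity as `log(p / (1 - p))`; the grid of probabilities is an illustrative choice.

```r
# Probabilities placed symmetrically around 0.5 (illustrative values)
p <- c(0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9)

# qlogis() is the logit function: log(p / (1 - p))
logit <- qlogis(p)
round(logit, 3)
## [1] -2.197 -0.847 -0.405  0.000  0.405  0.847  2.197

# Mirror-image check: logit(p) equals -logit(1 - p)
isTRUE(all.equal(logit, -rev(logit)))
## [1] TRUE
```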

A logit link is a nonlinear transformation of probability:

\[ Logit(p) = \log\frac{p}{1-p} \]

Now we can use a linear model. The model is linear with respect to the predicted logit, which translates into a nonlinear prediction with respect to probability: the conditional mean of the outcome levels off as it approaches 0 or 1, as needed.
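To make the "linear in the logit, nonlinear in probability" idea concrete, the sketch below uses hypothetical coefficients `b0` and `b1` (not estimated from any data) and a single predictor `x`. Base R's `plogis()` converts a logit back into a probability.

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- -3
b1 <- 1
x  <- 0:6

# Equal steps of 1 on the logit scale
logit_hat <- b0 + b1 * x

# Back-transform to probability with the inverse logit
p_hat <- plogis(logit_hat)
round(p_hat, 3)
## [1] 0.047 0.119 0.269 0.500 0.731 0.881 0.953

# The same one-unit step in x changes the probability by different amounts
round(diff(p_hat), 3)
## [1] 0.072 0.150 0.231 0.231 0.150 0.072
```

The predicted probabilities stay strictly between 0 and 1, and the change per unit of `x` shrinks as the prediction nears either bound.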

Predicted Binary Outcomes

General Linear Model

\[ p(y_i = 1) = \beta_0 + \beta_1X_i + \beta_2Z_i + e_i \]

If \(y_i\) is binary, the residual \(e_i = y_i - \hat{y}_i\) can take only two values for a given predicted probability \(\hat{p}\):

  • if \(y_i = 0\), \(e_i = (0 - \hat{p})\)
  • if \(y_i = 1\), \(e_i = (1 - \hat{p})\)

  • variance of a binary variable: \[ Var(y_i) = p \times (1-p) \]

Logistic Regression
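Before moving on to the logit, the two residual values and the variance formula for a binary outcome under the general linear model can be checked numerically. This is a minimal sketch; the values of `p_hat` are illustrative choices, not estimates from data.

```r
# Illustrative predicted probabilities
p_hat <- c(0.1, 0.3, 0.5, 0.6, 0.9)

# The only two residuals possible at each predicted probability
resid_if_0 <- 0 - p_hat   # when y = 0
resid_if_1 <- 1 - p_hat   # when y = 1

# Variance of a binary variable depends on p, so it cannot be constant
var_y <- p_hat * (1 - p_hat)

data.frame(p_hat, resid_if_0, resid_if_1, var_y)
```

The variance is largest at \(p = 0.5\) and shrinks toward 0 as \(p\) approaches either bound, so the usual assumption of normally distributed residuals with constant variance cannot hold.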

1. Logit: Data to model

The link function \(g(\cdot)\):

\[ \log\frac{p(y_i =1)}{1-p(y_i = 1)} = \beta_0 + \beta_1X_i + \beta_2Z_i + e_i \]

Because

\[ e^{\ln(x)} = x \]

exponentiating both sides returns the odds:

\[ e^{\log\frac{p(y_i =1)}{1-p(y_i = 1)}} = \frac{p(y_i =1)}{1-p(y_i = 1)} = \exp(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i) \]

#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0

logit_1 = log(p/(1-p)) #natural log
logit_1
## [1] 0.4054651
logit_0 = log(q/(1-q)) #natural log
logit_0
## [1] -0.4054651

Comments: symmetric and unbounded, from \(-\infty\) to \(+\infty\).

2. Odds:

\[ \frac{p(y_i =1)}{1-p(y_i = 1)} = \exp(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i) \]

#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0

OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667

Comments: asymmetric and bounded on one side, from 0 to \(+\infty\).
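One consequence of the exponential form is that a one-unit increase in \(X_i\) multiplies the odds by \(\exp(\beta_1)\), whatever the starting value of \(X_i\). The sketch below illustrates this with hypothetical coefficients; `odds_at()` is a helper defined here for illustration, not part of any package.

```r
# Hypothetical coefficients and a fixed value of Z, for illustration only
b0 <- -1
b1 <- 0.5
b2 <- 0.25
z  <- 2

# Predicted odds at a given x (helper defined only for this example)
odds_at <- function(x) exp(b0 + b1 * x + b2 * z)

# Ratio of odds for a one-unit step in x, at two different starting points
odds_at(1) / odds_at(0)
## [1] 1.648721
odds_at(4) / odds_at(3)
## [1] 1.648721

# Both ratios equal exp(b1)
exp(b1)
## [1] 1.648721
```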

3. Probability: model to data

The inverse link function \(g^{-1}(\cdot)\):

\[ p(y_i =1) = \frac{\exp(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i)}{1+\exp(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i)} \]
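The inverse link follows from solving the odds equation for the probability. In the steps below, \(\eta_i\) is shorthand introduced here (it is not used elsewhere in these notes) for the right-hand side \(\beta_0 + \beta_1X_i + \beta_2Z_i + e_i\), and \(p\) stands for \(p(y_i = 1)\).

```latex
\begin{aligned}
\frac{p}{1-p} &= \exp(\eta_i) \\
p &= \exp(\eta_i)\,(1-p) \\
p\,\bigl(1+\exp(\eta_i)\bigr) &= \exp(\eta_i) \\
p &= \frac{\exp(\eta_i)}{1+\exp(\eta_i)}
\end{aligned}
```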

#scenario One
p = 0.6 #probability of YES or 1
q = 1-p #probability of NO or 0

logit_1 = log(p/(1-p)) #natural log
logit_1
## [1] 0.4054651
logit_0 = log(q/(1-q)) #natural log
logit_0
## [1] -0.4054651
OR_1 = p/q #YES
OR_1
## [1] 1.5
OR_0 = q/p #NO
OR_0
## [1] 0.6666667
prob_1 = exp(logit_1)/(1+exp(logit_1))
prob_1
## [1] 0.6
prob_0 = exp(logit_0)/(1+exp(logit_0))
prob_0
## [1] 0.4

Comments: bounded, from 0 to 1.
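The three scales can also be read off a fitted model. The following is a sketch on simulated data: the seed, sample size, and "true" coefficients are arbitrary assumptions made for illustration. It uses base R's `glm()` with a binomial family and logit link, and `predict()` with `type = "link"` and `type = "response"`.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 500
x <- rnorm(n)
z <- rnorm(n)

# Hypothetical "true" coefficients used only to simulate a binary outcome
p_true <- plogis(-0.5 + 1.0 * x - 0.7 * z)
y <- rbinom(n, size = 1, prob = p_true)

# Logistic regression: linear model for the logit of p(y = 1)
fit <- glm(y ~ x + z, family = binomial(link = "logit"))

logit_hat <- predict(fit, type = "link")      # 1. logit scale, unbounded
odds_hat  <- exp(logit_hat)                   # 2. odds scale, 0 to +Inf
prob_hat  <- predict(fit, type = "response")  # 3. probability scale, 0 to 1

range(logit_hat)
range(odds_hat)
range(prob_hat)

# The inverse link applied by hand matches predict(type = "response")
all.equal(prob_hat, odds_hat / (1 + odds_hat))
```

The last line confirms that the probability predictions are just the predicted logits passed through the inverse link.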

Logit-based Models for C Nominal Categories

\[ \log\frac{p(y_i = 0)}{1-p(y_i = 0)} = \beta_{01}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]

\[ \log\frac{p(y_i = 1)}{1-p(y_i = 1)} = \beta_{02}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]

\[ \log\frac{p(y_i = 2)}{1-p(y_i = 2)} = \beta_{03}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]

Logit-based Models for C Ordinal Categories

\[ \log\frac{p(y_i > 0)}{1-p(y_i > 0)} = \beta_{01}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]

\[ \log\frac{p(y_i > 1)}{1-p(y_i > 1)} = \beta_{02}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]

\[ \log\frac{p(y_i > 2)}{1-p(y_i > 2)} = \beta_{03}+ \beta_1X_i + \beta_2Z_i + \beta_3X_iZ_i + e_i \]
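The three cumulative submodels above share their slopes and differ only in the intercept, so the probability of each individual category can be recovered by differencing the cumulative probabilities. The sketch below assumes four ordered categories (0 to 3) and uses hypothetical intercepts, slopes, and predictor values. The intercepts are chosen to decrease so that \(p(y_i > k)\) decreases with \(k\).

```r
# Hypothetical values, chosen only for illustration
b0 <- c(1.5, 0.2, -1.0)   # beta_01, beta_02, beta_03 (decreasing)
b1 <- 0.8
b2 <- -0.4
b3 <- 0.1
x  <- 1
z  <- 2

# Slope part shared by all three submodels
slope_part <- b1 * x + b2 * z + b3 * x * z

# Cumulative probabilities p(y > 0), p(y > 1), p(y > 2) via the inverse logit
p_gt <- plogis(b0 + slope_part)

# Category probabilities: differences of adjacent cumulative probabilities,
# padded with p(y > -1) = 1 and p(y > 3) = 0
p_cat <- -diff(c(1, p_gt, 0))
names(p_cat) <- paste0("y=", 0:3)

round(p_cat, 3)
sum(p_cat)   # the four category probabilities sum to 1
```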