Linear regression predicts a continuous outcome, such as GPA, reaction time, or income. But what happens when the outcome is binary? Acquitted or not acquitted? Survived or did not survive? Voted or did not vote?
If you tried to use ordinary linear regression on a binary outcome, nothing would stop the model from predicting values like 1.4 or -0.3, which are nonsensical as probabilities. Probabilities must stay between 0 and 1.
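The problem can be seen in a minimal sketch. The data below are invented purely for illustration (they are not from any study), and the helper name `linear_prediction` is hypothetical. The code fits an ordinary least squares line to a 0/1 outcome and then asks for predictions at a few predictor values.

```python
# Toy data, invented purely for illustration: x is a predictor, y is a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares slope and intercept for a single predictor.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def linear_prediction(value):
    """Predicted 'probability' from the straight-line fit (not bounded to [0, 1])."""
    return intercept + slope * value

for value in (0, 4.5, 12):
    print(value, round(linear_prediction(value), 2))
```

With this toy data, the fitted line gives a prediction below 0 at the low end and above 1 at the high end, which is exactly the behavior that makes a straight line unsuitable for probabilities.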
Logistic regression solves this by modeling the outcome on a transformed scale: the log odds scale. The transformation that converts a probability to log odds is called the link function. In logistic regression it is the logit function, whose inverse is the logistic (sigmoid) function:
\[\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X\]
The left side, \(\log\left(\frac{p}{1-p}\right)\), is the log odds of the outcome. The right side is the familiar linear predictor from regression. The link function is what allows a linear equation (which can produce any value from \(-\infty\) to \(+\infty\)) to map onto probabilities, which are bounded between 0 and 1.
To recover a predicted probability from log odds, you use the inverse of the link function:
\[\hat{p} = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}\]
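The two transformations can be sketched in a few lines of Python. The function names `logit` and `inv_logit` are chosen here for illustration and are not part of the lesson. The sketch shows that converting a probability to log odds and back returns the original probability.

```python
import math

def logit(p):
    """Probability in (0, 1) -> log odds in (-inf, +inf)."""
    return math.log(p / (1 - p))

def inv_logit(log_odds):
    """Log odds -> probability in (0, 1), using the formula for p-hat above."""
    return math.exp(log_odds) / (1 + math.exp(log_odds))

for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p = {p:.1f} -> log odds = {z:+.3f} -> back to p = {inv_logit(z):.1f}")
```

A probability of .5 corresponds to log odds of 0. Probabilities below .5 give negative log odds, and probabilities above .5 give positive log odds.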
LR.1.1
Why can't ordinary linear regression be used directly to predict a binary outcome?
A. Because binary outcomes have no variance
B. Because predicted values can fall outside the range [0, 1]
C. Because logistic regression always produces better model fit
D. Because binary outcomes have too few observations for regression
LR.1.2
In logistic regression, what does the link function do?
A. It converts a probability into a test statistic
B. It transforms the bounded probability scale into an unbounded linear scale
C. It standardizes the predictor variable
D. It removes outliers from the model
LR.1.3
The logistic regression equation is:
\[\log\left(\frac{p}{1-p}\right) = -2.1 + 0.008X\]
What does the left-hand side of this equation represent?
A. The predicted probability of the outcome
B. The predicted value of \(X\)
C. The log odds of the outcome
D. The residual error
Suppose a logistic regression model predicting the probability of acquittal based on a defense attorney's hourly rate (in dollars) produces the following coefficients:
\[\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = -2.1 + 0.008 \cdot \text{HourlyRate}\]
To get a predicted probability for a specific case, you follow two steps.
Step 1: Plug in \(X\) to get the log odds.
For a defendant whose attorney charges $300/hr:
\[\text{log odds} = -2.1 + 0.008 \times 300 = -2.1 + 2.4 = 0.30\]
Step 2: Convert log odds to a predicted probability, using the inverse link in its equivalent form \(\hat{p} = 1 / (1 + \exp(-\text{log odds}))\).
\[\hat{p} = \frac{1}{1 + \exp(-0.30)} = \frac{1}{1 + 0.741} = \frac{1}{1.741} \approx .57\]
A defendant with an attorney charging $300/hr has a predicted probability of acquittal of approximately .57.
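The same two steps can be checked with a short Python sketch. The coefficients come from the example model above. The function names are hypothetical and chosen only for readability.

```python
import math

# Coefficients from the example model in the text.
B0 = -2.1
B1 = 0.008

def predicted_log_odds(hourly_rate):
    """Step 1: plug the predictor into the linear equation."""
    return B0 + B1 * hourly_rate

def predicted_probability(hourly_rate):
    """Step 2: convert the log odds to a probability."""
    z = predicted_log_odds(hourly_rate)
    return 1 / (1 + math.exp(-z))

rate = 300
print(round(predicted_log_odds(rate), 2))     # 0.3
print(round(predicted_probability(rate), 2))  # 0.57
```

Changing `rate` lets you repeat the calculation for any other hourly rate.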
A note on direction of effect
The sign on \(\beta_1\) tells you the direction of the relationship on the log odds scale. A positive coefficient means that as \(X\) increases, the log odds increase, which means the predicted probability of the outcome also increases. A negative coefficient means the reverse.
In the model above, \(\beta_1 = 0.008\) is positive. This means that higher hourly rates are associated with higher predicted probabilities of acquittal. The direction of the effect is the same whether you are thinking in log odds or in probabilities: a positive coefficient always moves \(\hat{p}\) upward, and a negative coefficient always moves it downward.
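A small sketch can make the direction of effect concrete. It evaluates the model from the text at several hourly rates. It then repeats the calculation with the sign of the slope flipped, which is a hypothetical variant used only for contrast and not a model from the lesson.

```python
import math

def inv_logit(z):
    """Convert log odds to a probability."""
    return 1 / (1 + math.exp(-z))

rates = [100, 200, 300, 400]

# Model from the text: positive slope.
pos = [inv_logit(-2.1 + 0.008 * x) for x in rates]
# Hypothetical variant with the slope's sign flipped, for contrast only.
neg = [inv_logit(-2.1 - 0.008 * x) for x in rates]

print([round(p, 2) for p in pos])  # rises as the hourly rate rises
print([round(p, 2) for p in neg])  # falls as the hourly rate rises
```

The probabilities do not change by a constant amount per dollar, because the conversion from log odds is nonlinear, but the direction always matches the sign of the coefficient.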
LR.2.1
Using the model below, compute the predicted log odds for a defendant whose attorney charges $500/hr.
\[\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = -2.1 + 0.008 \cdot \text{HourlyRate}\]
Round to two decimal places.
LR.2.2
Using your answer from LR.2.1, convert the log odds to a predicted probability. Round to two decimal places.
LR.2.3
A different model is estimated:
\[\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = 1.4 - 0.003 \cdot \text{HourlyRate}\]
Without doing any calculation, what does the negative sign on \(\beta_1\) tell you?
A. The model failed to converge
B. Higher hourly rates are associated with lower predicted probabilities of acquittal
C. Higher hourly rates are associated with higher predicted probabilities of acquittal
D. The intercept is meaningless
LR.2.4
Using the original model, a researcher plugs in \(X = 0\) (an attorney who charges $0/hr) and gets a log odds of \(-2.1\). They convert this to a predicted probability of approximately .11.
Which of the following is the most accurate interpretation?
A. The model predicts that 11% of all defendants are acquitted
B. A defendant with a $0/hr attorney has a predicted probability of acquittal of .11 according to this model
C. The intercept is statistically significant at \(p < .05\)
D. The model is invalid because no attorney charges $0/hr