Logistic Regression is a statistical method for modeling the probability of a binary outcome based on one or more predictor variables.
Mathematically: \[ P(y = 1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}} \]
2024-11-15
Logistic Regression is a statistical method for modeling the probability of a binary outcome based on one or more predictor variables.
Mathematically: \[ P(y = 1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}} \]
We will use the mtcars dataset and modify it for a binary classification problem.
# Create binary outcome: mpg greater than median mtcars$mpg_binary <- ifelse(mtcars$mpg > median(mtcars$mpg), 1, 0) summary(mtcars$mpg_binary)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.0000 0.0000 0.0000 0.4688 1.0000 1.0000
# Scatterplot with logistic fit
ggplot(mtcars, aes(x = wt, y = mpg_binary)) +
geom_jitter(width = 0.1, height = 0.1, color = "darkblue", size = 3) +
stat_smooth(method = "glm", method.args = list(family = "binomial"), color = "blue") +
labs(title = "Scatterplot of Weight vs Binary MPG",
x = "Weight of the car",
y = "Probability of High MPG") +
theme(plot.title = element_text(size = 16),
axis.title = element_text(size = 14),
axis.text = element_text(size = 12))
We fit a logistic regression model to predict the binary outcome (mpg_binary) using weight (wt).
The model is expressed as: \[ P(y = 1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}} \]
Where: - \(P(y = 1|X)\): Probability of a high MPG. - \(\beta_0\): Intercept term. - \(\beta_1\): Coefficient for wt (weight of the car). - \(X\): Predictor variable (weight of the car).
For every unit increase in weight, the odds ratio changes by: \[ e^{\beta_1} \]
## ## Call: ## glm(formula = mpg_binary ~ wt, family = "binomial", data = mtcars) ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 65.81 44.45 1.48 0.139 ## wt -20.35 13.85 -1.47 0.142 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 44.2363 on 31 degrees of freedom ## Residual deviance: 5.2953 on 30 degrees of freedom ## AIC: 9.2953 ## ## Number of Fisher Scoring iterations: 10
Key Insights:
Weight (wt) has a strong negative relationship with the probability of achieving high miles per gallon (mpg_binary). Logistic regression provides a robust method for modeling binary outcomes.
Next Steps:
Add more predictors (e.g., horsepower, number of cylinders) for a multiple logistic regression analysis. Validate the model using cross-validation to assess performance on unseen data.