Replace “Your Name” with your actual name.

Instructions

In this lab, you will apply logistic regression to a new dataset to predict binary outcomes. You will fit logistic regression models, interpret the coefficients and odds ratios, visualize the results, and assess the model’s fit. You will also explore how different predictors impact the probability of the outcome.

  1. Fit a Logistic Regression Model: Use the provided dataset to predict the binary outcome.
  2. Interpret the Coefficients: Interpret the logistic regression coefficients by exponentiating them to obtain odds ratios.
  3. Visualize the Results: Create visualizations to display the relationship between the predictors and the outcome.
  4. Assess Model Fit: Use the ROC curve to assess the fit of your model.
  5. Submit Your Work: Make sure to knit your .Rmd file to HTML before submitting.

Exercise 1: Fitting a Logistic Regression Model

Dataset: You will predict whether a customer will buy a product (Purchase, 0 = No, 1 = Yes) based on predictors Income, Age, and MaritalStatus.

# Load necessary packages
library(dplyr)
library(broom)
# Generate a new dataset
set.seed(123)
customer_data <- data.frame(
  Purchase = rbinom(1000, 1, prob = 0.5),
  Income = round(rnorm(1000, mean = 50000, sd = 20000)),
  Age = sample(18:70, 1000, replace = TRUE),
  MaritalStatus = factor(sample(c("Single", "Married"), 1000, replace = TRUE))
)

customer_data$Purchase[customer_data$Income > 60000] <- rbinom(sum(customer_data$Income > 60000), 1, prob = 0.85)
customer_data$Purchase[customer_data$Age < 40] <- rbinom(sum(customer_data$Age < 40), 1, prob = 0.88)
customer_data$Purchase[customer_data$MaritalStatus == "Married"] <- rbinom(sum(customer_data$MaritalStatus == "Married"), 1, prob = 0.8)
# Fit the logistic regression model
# Name it model.1
model.1 <- glm(Purchase ~ Income + Age + MaritalStatus, data = customer_data, family = "binomial")
summary(model.1)
## 
## Call:
## glm(formula = Purchase ~ Income + Age + MaritalStatus, family = "binomial", 
##     data = customer_data)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          1.458e+00  3.065e-01   4.756 1.97e-06 ***
## Income               1.136e-05  3.834e-06   2.963  0.00305 ** 
## Age                 -1.335e-02  5.072e-03  -2.633  0.00847 ** 
## MaritalStatusSingle -4.331e-01  1.520e-01  -2.849  0.00438 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1081.0  on 999  degrees of freedom
## Residual deviance: 1057.6  on 996  degrees of freedom
## AIC: 1065.6
## 
## Number of Fisher Scoring iterations: 4

Exercise 2: Interpreting the Coefficients and Odds Ratios

Task:

  1. Exponentiate the coefficients from the logistic regression model to obtain odds ratios.

  2. Interpret the odds ratios. What does an odds ratio greater than 1 signify for each predictor? What about less than 1?

# Exponentiate the coefficients to obtain odds ratios
odds_ratios <- exp(coef(model.1))
conf_int <- exp(confint(model.1))
## Waiting for profiling to be done...
# Display odds ratios and confidence intervals
tidy_model <- broom::tidy(model.1, exponentiate = TRUE, conf.int = TRUE)
tidy_model
## # A tibble: 4 × 7
##   term                estimate  std.error statistic   p.value conf.low conf.high
##   <chr>                  <dbl>      <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
## 1 (Intercept)            4.30  0.306           4.76   1.97e-6    2.37      7.89 
## 2 Income                 1.00  0.00000383      2.96   3.05e-3    1.00      1.00 
## 3 Age                    0.987 0.00507        -2.63   8.47e-3    0.977     0.997
## 4 MaritalStatusSingle    0.649 0.152          -2.85   4.38e-3    0.481     0.873
  • Income: A one-unit increase in Income slightly increases the odds of Purchase. If the odds ratio > 1, higher income means higher likelihood of purchase.
  • Age: A one-year increase in Age slightly changes the odds of purchasing. If odds ratio < 1, older customers are less likely to purchase.
  • MaritalStatus (Married vs. Single): Being married increases the odds of purchase compared to being single if odds ratio > 1.

Exercise 3: Visualizing the Logistic Regression Results

Task:

  1. Create a plot to visualize the predicted probability of Purchase based on Income.

  2. Use ggplot2 to generate the logistic regression curve.

# Load ggplot2
library(ggplot2)

# Generate predicted probabilities
customer_data$predicted <- predict(model.1, type = "response")

# Plot predicted probability vs Income
ggplot(customer_data, aes(x = Income, y = predicted)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE) +
  labs(title = "Predicted Probability of Purchase vs. Income",
       x = "Income",
       y = "Predicted Probability of Purchase")

Submission Instructions:

Ensure to knit your document to HTML format, checking that all content is correctly displayed before submission.