1 Introduction

This project uses logistic regression to predict customer subscription behavior based on age and gender. By quantifying how these demographics drive conversion, this analysis identifies high-value customer segments. The resulting evidence-based insights will guide managerial decisions on marketing investment and targeting strategy.

2 Define Outcome and Select Predictors

  • Outcome: Subscribe (Binary, 1=Yes)
  • Predictors: Age, Gender

3 Loading & Reading Data

# Load the data
library(readxl)
Q4 <- read_excel("Logitsubscribedata.xlsx", 
    sheet = "data")
# Clean column names for easier R handling
colnames(Q4) <- c("Age", "Gender", "Subscribe")
Q4$Gender <- factor(Q4$Gender, levels = c(0, 1), labels = c("Man", "Woman"))
Q4$Subscribe <- factor(Q4$Subscribe, levels = c(0, 1), labels = c("No", "Yes"))
str(Q4)
## tibble [1,345 Ă— 3] (S3: tbl_df/tbl/data.frame)
##  $ Age      : num [1:1345] 33 45 57 32 56 60 40 55 27 48 ...
##  $ Gender   : Factor w/ 2 levels "Man","Woman": 1 2 1 2 1 2 1 1 1 2 ...
##  $ Subscribe: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...

4 Descriptive Statistics & Summary

4.1 Summary

summary(Q4)
##       Age          Gender    Subscribe 
##  Min.   :20.00   Man  :662   No :1025  
##  1st Qu.:29.00   Woman:683   Yes: 320  
##  Median :40.00                         
##  Mean   :39.65                         
##  3rd Qu.:50.00                         
##  Max.   :60.00

4.2 Subscription Rate

prop.table(table(Q4$Subscribe))
## 
##        No       Yes 
## 0.7620818 0.2379182

4.3 Gender Distribution

prop.table(table(Q4$Gender))
## 
##       Man     Woman 
## 0.4921933 0.5078067

5 Estimate The Logistic Regression Model

logit_model <- glm(Subscribe ~ Age + Gender, 
                   data = Q4, 
                   family = binomial(link = "logit"))

# Display the summary output
summary(logit_model)
## 
## Call:
## glm(formula = Subscribe ~ Age + Gender, family = binomial(link = "logit"), 
##     data = Q4)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.597628   0.230870   2.589  0.00964 ** 
## Age         -0.052399   0.005895  -8.888  < 2e-16 ***
## GenderWoman  0.407014   0.133725   3.044  0.00234 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1475.9  on 1344  degrees of freedom
## Residual deviance: 1381.1  on 1342  degrees of freedom
## AIC: 1387.1
## 
## Number of Fisher Scoring iterations: 4
# 4. Alternative Specification (Age Squared for non-linear effects)
model_sq <- glm(Subscribe ~ Age + I(Age^2) + Gender, 
                data = Q4, 
                family = binomial(link = "logit"))

# Compare models using AIC (Lower is better)
cat("AIC Model 1 (Linear Age):", AIC(logit_model), "\n")
## AIC Model 1 (Linear Age): 1387.112
cat("AIC Model 2 (Age Squared):", AIC(model_sq), "\n")
## AIC Model 2 (Age Squared): 1385.573

6 Bonus Insights

6.1 Simple Visualization: Predicted Probabilities

library(ggplot2)

# Create a data frame for predictions
new_data <- expand.grid(
  Age = seq(min(Q4$Age), max(Q4$Age), length.out = 100),
  Gender = factor(c("Man", "Woman"), levels = c("Man", "Woman"))
)

# Generate predicted probabilities
new_data$prob <- predict(logit_model, newdata = new_data, type = "response")

# Plotting
ggplot(new_data, aes(x = Age, y = prob, color = Gender)) +
  geom_line(size = 1) +
  labs(title = "Probability of Subscription vs. Age",
       subtitle = "Higher subscription likelihood among younger females",
       x = "Age", y = "Predicted Probability") +
  theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

6.2 Segmentation Insight

# Assign predicted probabilities to the original dataset
Q4$pred_prob <- predict(logit_model, type = "response")

# Identify the 'Top Tier' Segment (Top 25% likely to subscribe)
top_tier <- Q4 %>% filter(pred_prob > quantile(pred_prob, 0.75))

# Summary of the Top Tier
summary(top_tier[, c("Age", "Gender")])
##       Age          Gender   
##  Min.   :20.00   Man  :100  
##  1st Qu.:21.25   Woman:226  
##  Median :24.00              
##  Mean   :24.71              
##  3rd Qu.:27.75              
##  Max.   :32.00

6.3 Model Limitations and Bias

While statistically significant, this model has limitations. The low Pseudo R-squared indicates omitted variable bias, as we lack data on key drivers like income or brand loyalty. There is also a risk of selection bias if this sample doesn’t represent the broader market. Finally, these results show correlation, not causality; for example, high engagement in younger women might be driven by the specific marketing channels they use rather than age itself.

LS0tCnRpdGxlOiAiTWlNIDgxMSAtIE1hcmtldGluZyBBbmFseXRpY3MgUXVpeiA0IgpzdWJ0aXRsZTogIkxvZ2lzdGljIFJlZ3Jlc3Npb24gLyBQcmVkaWN0aXZlIE1vZGVsaW5nIgphdXRob3I6ICJOdXIgRHVtYW4iCmRhdGU6ICJKYW51YXJ5IDE3LCAyMDI2IgpvdXRwdXQ6CiAgaHRtbF9kb2N1bWVudDoKICAgIHRoZW1lOiBmbGF0bHkKICAgIGhpZ2hsaWdodDogcHlnbWVudHMKICAgIHRvYzogdHJ1ZQogICAgdG9jX2RlcHRoOiAzCiAgICB0b2NfZmxvYXQ6CiAgICAgIGNvbGxhcHNlZDogdHJ1ZQogICAgICBzbW9vdGhfc2Nyb2xsOiB0cnVlCiAgICBudW1iZXJfc2VjdGlvbnM6IHRydWUKICAgIGNvZGVfZm9sZGluZzogaGlkZQogICAgY29kZV9kb3dubG9hZDogdHJ1ZQogICAgZGZfcHJpbnQ6IHBhZ2VkCiAgICBzZWxmX2NvbnRhaW5lZDogdHJ1ZQotLS0KCiMgSW50cm9kdWN0aW9uCgpUaGlzIHByb2plY3QgdXNlcyBsb2dpc3RpYyByZWdyZXNzaW9uIHRvIHByZWRpY3QgY3VzdG9tZXIgc3Vic2NyaXB0aW9uIGJlaGF2aW9yIGJhc2VkIG9uIGFnZSBhbmQgZ2VuZGVyLiBCeSBxdWFudGlmeWluZyBob3cgdGhlc2UgZGVtb2dyYXBoaWNzIGRyaXZlIGNvbnZlcnNpb24sIHRoaXMgYW5hbHlzaXMgaWRlbnRpZmllcyBoaWdoLXZhbHVlIGN1c3RvbWVyIHNlZ21lbnRzLiBUaGUgcmVzdWx0aW5nIGV2aWRlbmNlLWJhc2VkIGluc2lnaHRzIHdpbGwgZ3VpZGUgbWFuYWdlcmlhbCBkZWNpc2lvbnMgb24gbWFya2V0aW5nIGludmVzdG1lbnQgYW5kIHRhcmdldGluZyBzdHJhdGVneS4KCiMgRGVmaW5lIE91dGNvbWUgYW5kIFNlbGVjdCBQcmVkaWN0b3JzCgoqICoqT3V0Y29tZToqKiBTdWJzY3JpYmUgKEJpbmFyeSwgMT1ZZXMpICAKKiAqKlByZWRpY3RvcnM6KiogQWdlLCBHZW5kZXIKCiMgTG9hZGluZyAmIFJlYWRpbmcgRGF0YQoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CmtuaXRyOjpvcHRzX2NodW5rJHNldChlY2hvID0gVFJVRSkKCiMgTG9hZCBuZWNlc3NhcnkgbGlicmFyeQpsaWJyYXJ5KGRwbHlyKQpsaWJyYXJ5KHRpZHl2ZXJzZSkKYGBgCgpgYGB7cn0KIyBMb2FkIHRoZSBkYXRhCmxpYnJhcnkocmVhZHhsKQpRNCA8LSByZWFkX2V4Y2VsKCJMb2dpdHN1YnNjcmliZWRhdGEueGxzeCIsIAogICAgc2hlZXQgPSAiZGF0YSIpCiMgQ2xlYW4gY29sdW1uIG5hbWVzIGZvciBlYXNpZXIgUiBoYW5kbGluZwpjb2xuYW1lcyhRNCkgPC0gYygiQWdlIiwgIkdlbmRlciIsICJTdWJzY3JpYmUiKQpRNCRHZW5kZXIgPC0gZmFjdG9yKFE0JEdlbmRlciwgbGV2ZWxzID0gYygwLCAxKSwgbGFiZWxzID0gYygiTWFuIiwgIldvbWFuIikpClE0JFN1YnNjcmliZSA8LSBmYWN0b3IoUTQkU3Vic2NyaWJlLCBsZXZlbHMgPSBjKDAsIDEpLCBsYWJlbHMgPSBjKCJObyIsICJZZXMiKSkKc3RyKFE0KQpgYGAKIyBEZXNjcmlwdGl2ZSBTdGF0aXN0aWNzICYgU3VtbWFyeQojIyBTdW1tYXJ5CmBgYHtyfQpzdW1tYXJ5KFE0KQpgYGAKCiMjIFN1YnNjcmlwdGlvbiBSYXRlIApgYGB7cn0KcHJvcC50YWJsZSh0YWJsZShRNCRTdWJzY3JpYmUpKQpgYGAKCiMjIEdlbmRlciBEaXN0cmlidXRpb24KYGBge3J9CnByb3AudGFibGUodGFibGUoUTQkR2VuZGVyKSkKYGBgCgojIEVzdGltYXRlIFRoZSBMb2dpc3RpYyBSZWdyZXNzaW9uIE1vZGVsCmBgYHtyfQpsb2dpdF9tb2RlbCA8LSBnbG0oU3Vic2NyaWJlIH4gQWdlICsgR2VuZGVyLCAKICAgICAgICAgICAgICAgICAgIGRhdGEgPSBRNCwgCiAgICAgICAgICAgICAgICAgICBmYW1pbHkgPSBiaW5vbWlhbChsaW5rID0gImxvZ2l0IikpCgojIERpc3BsYXkgdGhlIHN1bW1hcnkgb3V0cHV0CnN1bW1hcnkobG9naXRfbW9kZWwpCgojIDQuIEFsdGVybmF0aXZlIFNwZWNpZmljYXRpb24gKEFnZSBTcXVhcmVkIGZvciBub24tbGluZWFyIGVmZmVjdHMpCm1vZGVsX3NxIDwtIGdsbShTdWJzY3JpYmUgfiBBZ2UgKyBJKEFnZV4yKSArIEdlbmRlciwgCiAgICAgICAgICAgICAgICBkYXRhID0gUTQsIAogICAgICAgICAgICAgICAgZmFtaWx5ID0gYmlub21pYWwobGluayA9ICJsb2dpdCIpKQoKIyBDb21wYXJlIG1vZGVscyB1c2luZyBBSUMgKExvd2VyIGlzIGJldHRlcikKY2F0KCJBSUMgTW9kZWwgMSAoTGluZWFyIEFnZSk6IiwgQUlDKGxvZ2l0X21vZGVsKSwgIlxuIikKY2F0KCJBSUMgTW9kZWwgMiAoQWdlIFNxdWFyZWQpOiIsIEFJQyhtb2RlbF9zcSksICJcbiIpCmBgYAoKIyBCb251cyBJbnNpZ2h0cwoKIyMgU2ltcGxlIFZpc3VhbGl6YXRpb246IFByZWRpY3RlZCBQcm9iYWJpbGl0aWVzCgpgYGB7cn0KbGlicmFyeShnZ3Bsb3QyKQoKIyBDcmVhdGUgYSBkYXRhIGZyYW1lIGZvciBwcmVkaWN0aW9ucwpuZXdfZGF0YSA8LSBleHBhbmQuZ3JpZCgKICBBZ2UgPSBzZXEobWluKFE0JEFnZSksIG1heChRNCRBZ2UpLCBsZW5ndGgub3V0ID0gMTAwKSwKICBHZW5kZXIgPSBmYWN0b3IoYygiTWFuIiwgIldvbWFuIiksIGxldmVscyA9IGMoIk1hbiIsICJXb21hbiIpKQopCgojIEdlbmVyYXRlIHByZWRpY3RlZCBwcm9iYWJpbGl0aWVzCm5ld19kYXRhJHByb2IgPC0gcHJlZGljdChsb2dpdF9tb2RlbCwgbmV3ZGF0YSA9IG5ld19kYXRhLCB0eXBlID0gInJlc3BvbnNlIikKCiMgUGxvdHRpbmcKZ2dwbG90KG5ld19kYXRhLCBhZXMoeCA9IEFnZSwgeSA9IHByb2IsIGNvbG9yID0gR2VuZGVyKSkgKwogIGdlb21fbGluZShzaXplID0gMSkgKwogIGxhYnModGl0bGUgPSAiUHJvYmFiaWxpdHkgb2YgU3Vic2NyaXB0aW9uIHZzLiBBZ2UiLAogICAgICAgc3VidGl0bGUgPSAiSGlnaGVyIHN1YnNjcmlwdGlvbiBsaWtlbGlob29kIGFtb25nIHlvdW5nZXIgZmVtYWxlcyIsCiAgICAgICB4ID0gIkFnZSIsIHkgPSAiUHJlZGljdGVkIFByb2JhYmlsaXR5IikgKwogIHRoZW1lX21pbmltYWwoKQpgYGAKCiMjIFNlZ21lbnRhdGlvbiBJbnNpZ2h0CgpgYGB7cn0KIyBBc3NpZ24gcHJlZGljdGVkIHByb2JhYmlsaXRpZXMgdG8gdGhlIG9yaWdpbmFsIGRhdGFzZXQKUTQkcHJlZF9wcm9iIDwtIHByZWRpY3QobG9naXRfbW9kZWwsIHR5cGUgPSAicmVzcG9uc2UiKQoKIyBJZGVudGlmeSB0aGUgJ1RvcCBUaWVyJyBTZWdtZW50IChUb3AgMjUlIGxpa2VseSB0byBzdWJzY3JpYmUpCnRvcF90aWVyIDwtIFE0ICU+JSBmaWx0ZXIocHJlZF9wcm9iID4gcXVhbnRpbGUocHJlZF9wcm9iLCAwLjc1KSkKCiMgU3VtbWFyeSBvZiB0aGUgVG9wIFRpZXIKc3VtbWFyeSh0b3BfdGllclssIGMoIkFnZSIsICJHZW5kZXIiKV0pCmBgYAoKIyMgTW9kZWwgTGltaXRhdGlvbnMgYW5kIEJpYXMKCldoaWxlIHN0YXRpc3RpY2FsbHkgc2lnbmlmaWNhbnQsIHRoaXMgbW9kZWwgaGFzIGxpbWl0YXRpb25zLiBUaGUgbG93IFBzZXVkbyBSLXNxdWFyZWQgaW5kaWNhdGVzIG9taXR0ZWQgdmFyaWFibGUgYmlhcywgYXMgd2UgbGFjayBkYXRhIG9uIGtleSBkcml2ZXJzIGxpa2UgaW5jb21lIG9yIGJyYW5kIGxveWFsdHkuIFRoZXJlIGlzIGFsc28gYSByaXNrIG9mIHNlbGVjdGlvbiBiaWFzIGlmIHRoaXMgc2FtcGxlIGRvZXNuJ3QgcmVwcmVzZW50IHRoZSBicm9hZGVyIG1hcmtldC4gRmluYWxseSwgdGhlc2UgcmVzdWx0cyBzaG93IGNvcnJlbGF0aW9uLCBub3QgY2F1c2FsaXR5OyBmb3IgZXhhbXBsZSwgaGlnaCBlbmdhZ2VtZW50IGluIHlvdW5nZXIgd29tZW4gbWlnaHQgYmUgZHJpdmVuIGJ5IHRoZSBzcGVjaWZpYyBtYXJrZXRpbmcgY2hhbm5lbHMgdGhleSB1c2UgcmF0aGVyIHRoYW4gYWdlIGl0c2VsZi4K