Introduction
This project uses logistic regression to predict customer
subscription behavior based on age and gender. By quantifying how these
demographics drive conversion, this analysis identifies high-value
customer segments. The resulting evidence-based insights will guide
managerial decisions on marketing investment and targeting strategy.
Define Outcome and
Select Predictors
- Outcome: Subscribe (Binary, 1=Yes)
- Predictors: Age, Gender
Loading & Reading
Data
# Load the data
library(readxl)
Q4 <- read_excel("Logitsubscribedata.xlsx",
sheet = "data")
# Clean column names for easier R handling
colnames(Q4) <- c("Age", "Gender", "Subscribe")
Q4$Gender <- factor(Q4$Gender, levels = c(0, 1), labels = c("Man", "Woman"))
Q4$Subscribe <- factor(Q4$Subscribe, levels = c(0, 1), labels = c("No", "Yes"))
str(Q4)
## tibble [1,345 Ă— 3] (S3: tbl_df/tbl/data.frame)
## $ Age : num [1:1345] 33 45 57 32 56 60 40 55 27 48 ...
## $ Gender : Factor w/ 2 levels "Man","Woman": 1 2 1 2 1 2 1 1 1 2 ...
## $ Subscribe: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 2 1 1 1 1 ...
Descriptive Statistics
& Summary
Summary
## Age Gender Subscribe
## Min. :20.00 Man :662 No :1025
## 1st Qu.:29.00 Woman:683 Yes: 320
## Median :40.00
## Mean :39.65
## 3rd Qu.:50.00
## Max. :60.00
Subscription
Rate
prop.table(table(Q4$Subscribe))
##
## No Yes
## 0.7620818 0.2379182
Gender
Distribution
prop.table(table(Q4$Gender))
##
## Man Woman
## 0.4921933 0.5078067
Estimate The Logistic
Regression Model
logit_model <- glm(Subscribe ~ Age + Gender,
data = Q4,
family = binomial(link = "logit"))
# Display the summary output
summary(logit_model)
##
## Call:
## glm(formula = Subscribe ~ Age + Gender, family = binomial(link = "logit"),
## data = Q4)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.597628 0.230870 2.589 0.00964 **
## Age -0.052399 0.005895 -8.888 < 2e-16 ***
## GenderWoman 0.407014 0.133725 3.044 0.00234 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1475.9 on 1344 degrees of freedom
## Residual deviance: 1381.1 on 1342 degrees of freedom
## AIC: 1387.1
##
## Number of Fisher Scoring iterations: 4
# 4. Alternative Specification (Age Squared for non-linear effects)
model_sq <- glm(Subscribe ~ Age + I(Age^2) + Gender,
data = Q4,
family = binomial(link = "logit"))
# Compare models using AIC (Lower is better)
cat("AIC Model 1 (Linear Age):", AIC(logit_model), "\n")
## AIC Model 1 (Linear Age): 1387.112
cat("AIC Model 2 (Age Squared):", AIC(model_sq), "\n")
## AIC Model 2 (Age Squared): 1385.573
Bonus Insights
Simple Visualization:
Predicted Probabilities
library(ggplot2)
# Create a data frame for predictions
new_data <- expand.grid(
Age = seq(min(Q4$Age), max(Q4$Age), length.out = 100),
Gender = factor(c("Man", "Woman"), levels = c("Man", "Woman"))
)
# Generate predicted probabilities
new_data$prob <- predict(logit_model, newdata = new_data, type = "response")
# Plotting
ggplot(new_data, aes(x = Age, y = prob, color = Gender)) +
geom_line(size = 1) +
labs(title = "Probability of Subscription vs. Age",
subtitle = "Higher subscription likelihood among younger females",
x = "Age", y = "Predicted Probability") +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Segmentation
Insight
# Assign predicted probabilities to the original dataset
Q4$pred_prob <- predict(logit_model, type = "response")
# Identify the 'Top Tier' Segment (Top 25% likely to subscribe)
top_tier <- Q4 %>% filter(pred_prob > quantile(pred_prob, 0.75))
# Summary of the Top Tier
summary(top_tier[, c("Age", "Gender")])
## Age Gender
## Min. :20.00 Man :100
## 1st Qu.:21.25 Woman:226
## Median :24.00
## Mean :24.71
## 3rd Qu.:27.75
## Max. :32.00
Model Limitations and
Bias
While statistically significant, this model has limitations. The low
Pseudo R-squared indicates omitted variable bias, as we lack data on key
drivers like income or brand loyalty. There is also a risk of selection
bias if this sample doesn’t represent the broader market. Finally, these
results show correlation, not causality; for example, high engagement in
younger women might be driven by the specific marketing channels they
use rather than age itself.
LS0tCnRpdGxlOiAiTWlNIDgxMSAtIE1hcmtldGluZyBBbmFseXRpY3MgUXVpeiA0IgpzdWJ0aXRsZTogIkxvZ2lzdGljIFJlZ3Jlc3Npb24gLyBQcmVkaWN0aXZlIE1vZGVsaW5nIgphdXRob3I6ICJOdXIgRHVtYW4iCmRhdGU6ICJKYW51YXJ5IDE3LCAyMDI2IgpvdXRwdXQ6CiAgaHRtbF9kb2N1bWVudDoKICAgIHRoZW1lOiBmbGF0bHkKICAgIGhpZ2hsaWdodDogcHlnbWVudHMKICAgIHRvYzogdHJ1ZQogICAgdG9jX2RlcHRoOiAzCiAgICB0b2NfZmxvYXQ6CiAgICAgIGNvbGxhcHNlZDogdHJ1ZQogICAgICBzbW9vdGhfc2Nyb2xsOiB0cnVlCiAgICBudW1iZXJfc2VjdGlvbnM6IHRydWUKICAgIGNvZGVfZm9sZGluZzogaGlkZQogICAgY29kZV9kb3dubG9hZDogdHJ1ZQogICAgZGZfcHJpbnQ6IHBhZ2VkCiAgICBzZWxmX2NvbnRhaW5lZDogdHJ1ZQotLS0KCiMgSW50cm9kdWN0aW9uCgpUaGlzIHByb2plY3QgdXNlcyBsb2dpc3RpYyByZWdyZXNzaW9uIHRvIHByZWRpY3QgY3VzdG9tZXIgc3Vic2NyaXB0aW9uIGJlaGF2aW9yIGJhc2VkIG9uIGFnZSBhbmQgZ2VuZGVyLiBCeSBxdWFudGlmeWluZyBob3cgdGhlc2UgZGVtb2dyYXBoaWNzIGRyaXZlIGNvbnZlcnNpb24sIHRoaXMgYW5hbHlzaXMgaWRlbnRpZmllcyBoaWdoLXZhbHVlIGN1c3RvbWVyIHNlZ21lbnRzLiBUaGUgcmVzdWx0aW5nIGV2aWRlbmNlLWJhc2VkIGluc2lnaHRzIHdpbGwgZ3VpZGUgbWFuYWdlcmlhbCBkZWNpc2lvbnMgb24gbWFya2V0aW5nIGludmVzdG1lbnQgYW5kIHRhcmdldGluZyBzdHJhdGVneS4KCiMgRGVmaW5lIE91dGNvbWUgYW5kIFNlbGVjdCBQcmVkaWN0b3JzCgoqICoqT3V0Y29tZToqKiBTdWJzY3JpYmUgKEJpbmFyeSwgMT1ZZXMpICAKKiAqKlByZWRpY3RvcnM6KiogQWdlLCBHZW5kZXIKCiMgTG9hZGluZyAmIFJlYWRpbmcgRGF0YQoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9CmtuaXRyOjpvcHRzX2NodW5rJHNldChlY2hvID0gVFJVRSkKCiMgTG9hZCBuZWNlc3NhcnkgbGlicmFyeQpsaWJyYXJ5KGRwbHlyKQpsaWJyYXJ5KHRpZHl2ZXJzZSkKYGBgCgpgYGB7cn0KIyBMb2FkIHRoZSBkYXRhCmxpYnJhcnkocmVhZHhsKQpRNCA8LSByZWFkX2V4Y2VsKCJMb2dpdHN1YnNjcmliZWRhdGEueGxzeCIsIAogICAgc2hlZXQgPSAiZGF0YSIpCiMgQ2xlYW4gY29sdW1uIG5hbWVzIGZvciBlYXNpZXIgUiBoYW5kbGluZwpjb2xuYW1lcyhRNCkgPC0gYygiQWdlIiwgIkdlbmRlciIsICJTdWJzY3JpYmUiKQpRNCRHZW5kZXIgPC0gZmFjdG9yKFE0JEdlbmRlciwgbGV2ZWxzID0gYygwLCAxKSwgbGFiZWxzID0gYygiTWFuIiwgIldvbWFuIikpClE0JFN1YnNjcmliZSA8LSBmYWN0b3IoUTQkU3Vic2NyaWJlLCBsZXZlbHMgPSBjKDAsIDEpLCBsYWJlbHMgPSBjKCJObyIsICJZZXMiKSkKc3RyKFE0KQpgYGAKIyBEZXNjcmlwdGl2ZSBTdGF0aXN0aWNzICYgU3VtbWFyeQojIyBTdW1tYXJ5CmBgYHtyfQpzdW1tYXJ5KFE0KQpgYGAKCiMjIFN1YnNjcmlwdGlvbiBSYXRlIApgYGB7cn0KcHJvcC50YWJsZSh0YWJsZShRNCRTdWJzY3JpYmUpKQpgYGAKCiMjIEdlbmRlciBEaXN0cmlidXRpb24KYGBge3J9CnByb3AudGFibGUodGFibGUoUTQkR2VuZGVyKSkKYGBgCgojIEVzdGltYXRlIFRoZSBMb2dpc3RpYyBSZWdyZXNzaW9uIE1vZGVsCmBgYHtyfQpsb2dpdF9tb2RlbCA8LSBnbG0oU3Vic2NyaWJlIH4gQWdlICsgR2VuZGVyLCAKICAgICAgICAgICAgICAgICAgIGRhdGEgPSBRNCwgCiAgICAgICAgICAgICAgICAgICBmYW1pbHkgPSBiaW5vbWlhbChsaW5rID0gImxvZ2l0IikpCgojIERpc3BsYXkgdGhlIHN1bW1hcnkgb3V0cHV0CnN1bW1hcnkobG9naXRfbW9kZWwpCgojIDQuIEFsdGVybmF0aXZlIFNwZWNpZmljYXRpb24gKEFnZSBTcXVhcmVkIGZvciBub24tbGluZWFyIGVmZmVjdHMpCm1vZGVsX3NxIDwtIGdsbShTdWJzY3JpYmUgfiBBZ2UgKyBJKEFnZV4yKSArIEdlbmRlciwgCiAgICAgICAgICAgICAgICBkYXRhID0gUTQsIAogICAgICAgICAgICAgICAgZmFtaWx5ID0gYmlub21pYWwobGluayA9ICJsb2dpdCIpKQoKIyBDb21wYXJlIG1vZGVscyB1c2luZyBBSUMgKExvd2VyIGlzIGJldHRlcikKY2F0KCJBSUMgTW9kZWwgMSAoTGluZWFyIEFnZSk6IiwgQUlDKGxvZ2l0X21vZGVsKSwgIlxuIikKY2F0KCJBSUMgTW9kZWwgMiAoQWdlIFNxdWFyZWQpOiIsIEFJQyhtb2RlbF9zcSksICJcbiIpCmBgYAoKIyBCb251cyBJbnNpZ2h0cwoKIyMgU2ltcGxlIFZpc3VhbGl6YXRpb246IFByZWRpY3RlZCBQcm9iYWJpbGl0aWVzCgpgYGB7cn0KbGlicmFyeShnZ3Bsb3QyKQoKIyBDcmVhdGUgYSBkYXRhIGZyYW1lIGZvciBwcmVkaWN0aW9ucwpuZXdfZGF0YSA8LSBleHBhbmQuZ3JpZCgKICBBZ2UgPSBzZXEobWluKFE0JEFnZSksIG1heChRNCRBZ2UpLCBsZW5ndGgub3V0ID0gMTAwKSwKICBHZW5kZXIgPSBmYWN0b3IoYygiTWFuIiwgIldvbWFuIiksIGxldmVscyA9IGMoIk1hbiIsICJXb21hbiIpKQopCgojIEdlbmVyYXRlIHByZWRpY3RlZCBwcm9iYWJpbGl0aWVzCm5ld19kYXRhJHByb2IgPC0gcHJlZGljdChsb2dpdF9tb2RlbCwgbmV3ZGF0YSA9IG5ld19kYXRhLCB0eXBlID0gInJlc3BvbnNlIikKCiMgUGxvdHRpbmcKZ2dwbG90KG5ld19kYXRhLCBhZXMoeCA9IEFnZSwgeSA9IHByb2IsIGNvbG9yID0gR2VuZGVyKSkgKwogIGdlb21fbGluZShzaXplID0gMSkgKwogIGxhYnModGl0bGUgPSAiUHJvYmFiaWxpdHkgb2YgU3Vic2NyaXB0aW9uIHZzLiBBZ2UiLAogICAgICAgc3VidGl0bGUgPSAiSGlnaGVyIHN1YnNjcmlwdGlvbiBsaWtlbGlob29kIGFtb25nIHlvdW5nZXIgZmVtYWxlcyIsCiAgICAgICB4ID0gIkFnZSIsIHkgPSAiUHJlZGljdGVkIFByb2JhYmlsaXR5IikgKwogIHRoZW1lX21pbmltYWwoKQpgYGAKCiMjIFNlZ21lbnRhdGlvbiBJbnNpZ2h0CgpgYGB7cn0KIyBBc3NpZ24gcHJlZGljdGVkIHByb2JhYmlsaXRpZXMgdG8gdGhlIG9yaWdpbmFsIGRhdGFzZXQKUTQkcHJlZF9wcm9iIDwtIHByZWRpY3QobG9naXRfbW9kZWwsIHR5cGUgPSAicmVzcG9uc2UiKQoKIyBJZGVudGlmeSB0aGUgJ1RvcCBUaWVyJyBTZWdtZW50IChUb3AgMjUlIGxpa2VseSB0byBzdWJzY3JpYmUpCnRvcF90aWVyIDwtIFE0ICU+JSBmaWx0ZXIocHJlZF9wcm9iID4gcXVhbnRpbGUocHJlZF9wcm9iLCAwLjc1KSkKCiMgU3VtbWFyeSBvZiB0aGUgVG9wIFRpZXIKc3VtbWFyeSh0b3BfdGllclssIGMoIkFnZSIsICJHZW5kZXIiKV0pCmBgYAoKIyMgTW9kZWwgTGltaXRhdGlvbnMgYW5kIEJpYXMKCldoaWxlIHN0YXRpc3RpY2FsbHkgc2lnbmlmaWNhbnQsIHRoaXMgbW9kZWwgaGFzIGxpbWl0YXRpb25zLiBUaGUgbG93IFBzZXVkbyBSLXNxdWFyZWQgaW5kaWNhdGVzIG9taXR0ZWQgdmFyaWFibGUgYmlhcywgYXMgd2UgbGFjayBkYXRhIG9uIGtleSBkcml2ZXJzIGxpa2UgaW5jb21lIG9yIGJyYW5kIGxveWFsdHkuIFRoZXJlIGlzIGFsc28gYSByaXNrIG9mIHNlbGVjdGlvbiBiaWFzIGlmIHRoaXMgc2FtcGxlIGRvZXNuJ3QgcmVwcmVzZW50IHRoZSBicm9hZGVyIG1hcmtldC4gRmluYWxseSwgdGhlc2UgcmVzdWx0cyBzaG93IGNvcnJlbGF0aW9uLCBub3QgY2F1c2FsaXR5OyBmb3IgZXhhbXBsZSwgaGlnaCBlbmdhZ2VtZW50IGluIHlvdW5nZXIgd29tZW4gbWlnaHQgYmUgZHJpdmVuIGJ5IHRoZSBzcGVjaWZpYyBtYXJrZXRpbmcgY2hhbm5lbHMgdGhleSB1c2UgcmF0aGVyIHRoYW4gYWdlIGl0c2VsZi4K