Business Problem

Companies often spend significant resources on marketing campaigns without fully understanding which customers are most likely to respond. Sending the same promotions to all customers can result in low response rates and inefficient use of marketing budgets.

This project seeks to identify customer segments based on demographics and purchasing behavior and determine which characteristics are associated with successful campaign responses.

Objective

The goal of this project is to use customer analytics to better understand customer behavior, specifically spending patterns, in order to improve marketing decision-making.

Research Questions

  1. What demographic characteristics are associated with higher customer spending?

  2. Which products do customers typically buy?

  3. Can customers be segmented into meaningful groups based on purchasing behavior?

  4. Which customer characteristics are associated with positive campaign responses?

  5. How can marketers improve campaign targeting using customer segmentation?

Managerial Questions

  1. Which customers should receive future marketing promotions?

  2. Tailoring Marketing Campaigns

    1. Is there a way to tailor marketing to customer needs/spending behavior?
    2. How can we allocate marketing resources efficiently?

  3. Customer Groups

    1. Which customer groups generate the highest revenue?
    2. Which customer group has the highest potential for growth?

Analytics Methods

Descriptive Analytics

  • Summary Statistics
  • Customer Demographics
  • Spending Behavior

Predictive Analytics

  • Predict campaign response likelihood

Introduction

Customer relationship management and targeted marketing have become increasingly important in today’s competitive business environment. Organizations that understand customer behavior can better allocate marketing resources and improve campaign effectiveness. The Customer Personality Analysis dataset provides demographic, purchasing, and campaign response information that can be used to better understand customer preferences and marketing outcomes.

The purpose of this project is to analyze customer characteristics and purchasing behavior to identify meaningful customer segments and evaluate factors associated with marketing campaign success. Using customer segmentation and predictive analytics techniques, this study seeks to provide actionable recommendations that improve marketing targeting and increase customer engagement.

Exploratory Data Analysis

The goal of this exploratory data analysis is to understand customer demographics, income, spending behavior, purchase channels, and marketing campaign responses.

The EDA reports summary statistics and visualizations, and uses them to surface trends, patterns, anomalies, and business insights in the Customer Personality Analysis dataset.

Pre-Processing

Packages loaded: tidyverse 2.0.0 (dplyr 1.2.1, readr 2.2.0, forcats 1.0.1, stringr 1.6.0, ggplot2 4.0.3, tibble 3.3.1, lubridate 1.9.5, tidyr 1.3.2, purrr 1.2.2), janitor, skimr, naniar, corrplot 0.95, kableExtra, and scales.

Data Cleanup and Exploratory Analysis

Dataset Overview
Item Value
Rows 2240
Columns 29
Variable Summary
Variable               Type       Missing_Values
id                     integer    0
year_birth             integer    0
education              character  0
marital_status         character  0
income                 integer    24
kidhome                integer    0
teenhome               integer    0
dt_customer            character  0
recency                integer    0
mnt_wines              integer    0
mnt_fruits             integer    0
mnt_meat_products      integer    0
mnt_fish_products      integer    0
mnt_sweet_products     integer    0
mnt_gold_prods         integer    0
num_deals_purchases    integer    0
num_web_purchases      integer    0
num_catalog_purchases  integer    0
num_store_purchases    integer    0
num_web_visits_month   integer    0
accepted_cmp3          integer    0
accepted_cmp4          integer    0
accepted_cmp5          integer    0
accepted_cmp1          integer    0
accepted_cmp2          integer    0
complain               integer    0
z_cost_contact         integer    0
z_revenue              integer    0
response               integer    0
Preview of First 10 Rows
id year_birth education marital_status income kidhome teenhome dt_customer recency mnt_wines mnt_fruits mnt_meat_products mnt_fish_products mnt_sweet_products mnt_gold_prods num_deals_purchases num_web_purchases num_catalog_purchases num_store_purchases num_web_visits_month accepted_cmp3 accepted_cmp4 accepted_cmp5 accepted_cmp1 accepted_cmp2 complain z_cost_contact z_revenue response
5524 1957 Graduation Single 58138 0 0 04-09-2012 58 635 88 546 172 88 88 3 8 10 4 7 0 0 0 0 0 0 3 11 1
2174 1954 Graduation Single 46344 1 1 08-03-2014 38 11 1 6 2 1 6 2 1 1 2 5 0 0 0 0 0 0 3 11 0
4141 1965 Graduation Together 71613 0 0 21-08-2013 26 426 49 127 111 21 42 1 8 2 10 4 0 0 0 0 0 0 3 11 0
6182 1984 Graduation Together 26646 1 0 10-02-2014 26 11 4 20 10 3 5 2 2 0 4 6 0 0 0 0 0 0 3 11 0
5324 1981 PhD Married 58293 1 0 19-01-2014 94 173 43 118 46 27 15 5 5 3 6 5 0 0 0 0 0 0 3 11 0
7446 1967 Master Together 62513 0 1 09-09-2013 16 520 42 98 0 42 14 2 6 4 10 6 0 0 0 0 0 0 3 11 0
965 1971 Graduation Divorced 55635 0 1 13-11-2012 34 235 65 164 50 49 27 4 7 3 7 6 0 0 0 0 0 0 3 11 0
6177 1985 PhD Married 33454 1 0 08-05-2013 32 76 10 56 3 1 23 2 4 0 4 8 0 0 0 0 0 0 3 11 0
4855 1974 PhD Together 30351 1 0 06-06-2013 19 14 0 24 3 3 2 1 3 0 2 9 0 0 0 0 0 0 3 11 1
5899 1950 PhD Together 5648 1 1 13-03-2014 68 28 0 6 1 1 13 1 1 0 0 20 1 0 0 0 0 0 3 11 0

Missing Values Analysis

Missing Values by Variable
Variable Missing_Values
income 24

Income is the only variable with missing values: 24 of 2,240 rows, or about 1.1%. Because income is important for understanding spending behavior, these missing values must be handled before modeling.

Data Cleaning

Duplicate Row Check
Cleaning_Check Count
Duplicate Rows 0
Income Outliers Identified Using IQR
id income education marital_status
8475 157243 PhD Married
1503 162397 PhD Together
5555 153924 Graduation Divorced
1501 160803 PhD Married
5336 157733 Master Together
4931 157146 Graduation Together
11181 156924 PhD Married
9432 666666 Graduation Together

The dataset was cleaned in four steps: checking for duplicate rows (none were found), converting the customer enrollment date (dt_customer) into date format, replacing the 24 missing income values with the median income, and identifying income outliers using the IQR method. The eight outliers listed above, all with incomes above 150,000, were removed to prevent them from distorting the analysis, leaving 2,232 customers.
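
The cleaning steps described above can be sketched as follows. This is a minimal Python illustration, not the report's own R code. The toy `incomes` list (the ten preview incomes, one missing value, and the 666666 outlier), the helper names, the 1.5 x IQR fences, the quartile method, and the order of imputing before flagging outliers are all assumptions made for the example.

```python
import statistics
from datetime import date, datetime

def parse_enrollment(text):
    """Parse a dt_customer string; the preview rows look like DD-MM-YYYY."""
    return datetime.strptime(text, "%d-%m-%Y").date()

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

# Toy sample (assumption): preview incomes, one missing value, one extreme value.
incomes = [58138, 46344, 71613, 26646, 58293, 62513,
           55635, 33454, 30351, 5648, None, 666666]

filled = impute_median(incomes)
low, high = iqr_fences(filled)
outliers = [v for v in filled if v < low or v > high]
kept = [v for v in filled if low <= v <= high]

print("outliers:", outliers)
print("first enrollment date:", parse_enrollment("04-09-2012"))
```

On the full dataset the fences would be computed from all 2,240 incomes, so the cut-off would differ from the one this small sample produces.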

Feature Engineering for EDA
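
The derived variables summarized in the next section (age, children, total_spent, total_purchases, total_campaigns_accepted) are not defined in the text. The Python sketch below shows one set of definitions that is consistent with the published totals. These definitions are assumptions, including the reference year of 2026, which is inferred from the Oldest Customers table (birth year 1893 gives age 133). The helper name `engineer` is hypothetical.

```python
REFERENCE_YEAR = 2026  # assumption: inferred from year_birth 1893 -> age 133

SPEND_COLS = ["mnt_wines", "mnt_fruits", "mnt_meat_products",
              "mnt_fish_products", "mnt_sweet_products", "mnt_gold_prods"]
CHANNEL_COLS = ["num_web_purchases", "num_catalog_purchases",
                "num_store_purchases"]
CAMPAIGN_COLS = ["accepted_cmp1", "accepted_cmp2", "accepted_cmp3",
                 "accepted_cmp4", "accepted_cmp5"]

def engineer(row):
    """Derive the EDA features for one customer record (a dict)."""
    return {
        "age": REFERENCE_YEAR - row["year_birth"],
        "children": row["kidhome"] + row["teenhome"],
        "total_spent": sum(row[c] for c in SPEND_COLS),
        "total_purchases": sum(row[c] for c in CHANNEL_COLS),
        "total_campaigns_accepted": sum(row[c] for c in CAMPAIGN_COLS),
    }

# First preview row (id 5524), restricted to the columns used above.
first_row = {
    "year_birth": 1957, "kidhome": 0, "teenhome": 0,
    "mnt_wines": 635, "mnt_fruits": 88, "mnt_meat_products": 546,
    "mnt_fish_products": 172, "mnt_sweet_products": 88, "mnt_gold_prods": 88,
    "num_web_purchases": 8, "num_catalog_purchases": 10,
    "num_store_purchases": 4,
    "accepted_cmp1": 0, "accepted_cmp2": 0, "accepted_cmp3": 0,
    "accepted_cmp4": 0, "accepted_cmp5": 0,
}

features = engineer(first_row)
print(features)
```

Under these definitions, deal purchases (num_deals_purchases) are not counted in total_purchases, and the most recent campaign (response) is not counted in total_campaigns_accepted.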

Summary Statistics

Summary Statistics for Key Variables
Statistic Value
age_Mean 57.21
age_Median 56.00
age_SD 11.99
age_Min 30.00
age_Max 133.00
income_Mean 51630.93
income_Median 51381.50
income_SD 20601.68
income_Min 1730.00
income_Max 113734.00
recency_Mean 49.11
recency_Median 49.00
recency_SD 28.95
recency_Min 0.00
recency_Max 99.00
children_Mean 0.95
children_Median 1.00
children_SD 0.75
children_Min 0.00
children_Max 3.00
total_spent_Mean 605.60
total_spent_Median 396.50
total_spent_SD 601.44
total_spent_Min 5.00
total_spent_Max 2525.00
total_purchases_Mean 12.54
total_purchases_Median 12.00
total_purchases_SD 7.18
total_purchases_Min 0.00
total_purchases_Max 32.00
total_campaigns_accepted_Mean 0.30
total_campaigns_accepted_Median 0.00
total_campaigns_accepted_SD 0.68
total_campaigns_accepted_Min 0.00
total_campaigns_accepted_Max 4.00

Customer Age Distribution

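The age distribution helps identify the main age groups in the customer base. Ages range from 30 to 133, with a mean of 57.21 and a median of 56; the maximum comes from three records with birth years between 1893 and 1900 (see Anomaly Detection), which are likely data-entry errors.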

Income Distribution

Income Boxplot

The income distribution and boxplot help identify variation and possible high-income outliers.

Total Spending Distribution

Total Spending Boxplot

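Total spending helps identify high-value customers. The distribution is right-skewed: the mean (605.60) is well above the median (396.50), which suggests that a relatively small group of customers accounts for a disproportionate share of spending.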

Product Category Spending

Total Spending by Product Category
Product Spending
Wine 680604
Fruits 58881
Meat 368993
Fish 84023
Sweets 60611
Gold 98579

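This chart shows which product categories generate the most spending. Wine (680,604) and meat (368,993) together account for more than three quarters of total product spending, while each of the other four categories contributes less than 10%.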

Purchase Channel Analysis

Purchases by Channel
Channel Purchases
Web 9146
Catalog 5884
Store 12964

This chart compares customer purchasing activity across web, catalog, and store channels. Store purchases are the most common (12,964), followed by web (9,146) and catalog (5,884).
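
To put the product and channel tables on a common scale, the totals can be converted to percentage shares. The short Python sketch below uses the figures from the two tables above; the helper name `shares` is hypothetical.

```python
def shares(totals):
    """Convert a dict of totals into percentage shares, rounded to 1 decimal."""
    grand_total = sum(totals.values())
    return {name: round(100 * value / grand_total, 1)
            for name, value in totals.items()}

# Figures copied from the Total Spending by Product Category table.
product_spend = {"Wine": 680604, "Fruits": 58881, "Meat": 368993,
                 "Fish": 84023, "Sweets": 60611, "Gold": 98579}

# Figures copied from the Purchases by Channel table.
channel_purchases = {"Web": 9146, "Catalog": 5884, "Store": 12964}

product_share = shares(product_spend)
channel_share = shares(channel_purchases)

for name, pct in sorted(product_share.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {pct:5.1f}%")
for name, pct in sorted(channel_share.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {pct:5.1f}%")
```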

Education Distribution

Customer Education Distribution
education n Percentage
2n Cycle 203 9.09
Basic 54 2.42
Graduation 1124 50.36
Master 369 16.53
PhD 482 21.59

Marital Status Distribution

Customer Marital Status Distribution
marital_status n Percentage
Absurd 2 0.09
Alone 3 0.13
Divorced 231 10.35
Married 861 38.58
Single 480 21.51
Together 576 25.81
Widow 77 3.45
YOLO 2 0.09

Campaign Acceptance

Campaign Acceptance by Campaign
Campaign Accepted
Campaign 1 144
Campaign 2 30
Campaign 3 163
Campaign 4 167
Campaign 5 163

Response Rate

Response Rate for Most Recent Campaign
response n Percentage
0 1898 85.04
1 334 14.96

Recency Distribution

Income and Total Spending Relationship

Spending by Campaign Response

Correlation Analysis

Anomaly Detection

Oldest Customers

id year_birth age income total_spent
11004 1893 133 60182 22
1150 1899 127 83532 1853
7829 1900 126 36640 65
6663 1940 86 51141 157
6932 1941 85 93027 2119
2968 1943 83 48948 902
6142 1943 83 65073 900
7106 1943 83 75865 1242
8800 1943 83 48948 902
1453 1943 83 57513 1060

Highest-Income Customers

id income age total_spent
4619 113734 81 277
4611 105471 56 1724
10089 102692 52 1112
2798 102160 49 1240
7215 101970 43 1135
4248 98777 66 2008
7451 98777 66 2008
500 96876 49 1941
2109 96843 36 1544
6815 96547 46 809

Highest-Spending Customers

id income age total_spent
5735 90638 35 2525
5350 90638 35 2525
1763 87679 38 2524
4580 75759 57 2486
4475 69098 77 2440
5453 90226 70 2352
10133 93790 56 2349
9010 83151 54 2346
5386 94384 73 2302
6024 94384 73 2302

Export Clean Dataset

EDA Findings

The exploratory analysis shows that customer income and total spending vary substantially across the dataset: after cleaning, income ranges from 1,730 to 113,734 and total spending from 5 to 2,525. This suggests that the customer base contains different customer segments, including lower-spending customers and high-value customers.

Product spending is not evenly distributed across categories. Wine (680,604) and meat (368,993) generate far more total spending than fruits, fish, sweets, or gold products, which may help identify important product preferences.

Purchase channel analysis shows that store purchases are the most common (12,964), followed by web (9,146) and catalog (5,884) purchases.

Campaign acceptance is relatively low. Only 14.96% of customers responded to the most recent campaign, and each of the five earlier campaigns was accepted by between 30 and 167 of the 2,232 customers, so most customers do not accept marketing campaigns.

Business Recommendation

Based on the EDA, the modeling team should pay close attention to income, total spending, purchase channels, recency, and previous campaign acceptance.

High-spending customers and recently active customers may represent valuable segments for future marketing campaigns.

Research Questions

Can customers be segmented into meaningful groups based on purchasing behavior?

We used k-means clustering to identify customer segments based on age, income, number of children, recency, total spending, and purchase activity.

The elbow plot helps justify how many customer segments to use. In this analysis, the curve begins to bend at about 2 clusters and levels off between 3 and 4, so we use 4 clusters for the analysis.
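
The elbow method can be illustrated with a small pure-Python k-means. This is only a sketch under stated assumptions, not the report's R workflow. The nine toy (income, total spent) points are invented for the example, features are standardized with z-scores, and the starting centroids are chosen by a deterministic farthest-point rule rather than the random starts that k-means implementations usually use.

```python
import math
import statistics

def standardize(rows):
    """Z-score each column so that income does not dominate the distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds))
            for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    # Deterministic farthest-point initialisation (a simplification).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(statistics.fmean(d) for d in zip(*members))
    wcss = sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels))
    return labels, wcss

# Toy data (assumption): three loose groups of (income, total_spent).
raw = [(30000, 100), (32000, 120), (34000, 90),
       (58000, 700), (60000, 750), (56000, 720),
       (80000, 1600), (82000, 1500), (78000, 1650)]
points = standardize(raw)

wcss_by_k = {k: kmeans(points, k)[1] for k in range(1, 5)}
labels3, _ = kmeans(points, 3)

for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

Plotting the within-cluster sum of squares against k gives the elbow curve; the number of clusters is chosen where additional clusters stop producing a large drop.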



What demographic characteristics are associated with higher customer spending? Which customer characteristics are associated with positive campaign responses?
Customer Segment Summary by Cluster
Cluster  Avg. Age  Avg. Income  Avg. Children  Avg. Total Spent  Avg. Total Purchases  Avg. Campaigns Accepted
1        55.47     81175.71     0.26           1612.07           19.94                 2.53
2        59.03     74210.67     0.21           1323.66           19.56                 0.29
3        54.68     34417.29     1.25           98.86             5.85                  0.08
4        60.48     58306.43     1.12           736.39            17.22                 0.22

What are they buying?



How can marketers improve campaign targeting using customer segmentation?

Customer segmentation helps marketers improve campaign targeting by grouping customers with similar behaviors, preferences, and spending patterns. Instead of sending the same campaign to everyone, marketers can tailor promotions to the segments most likely to respond. For example, high-spending customers may receive premium product offers, while lower-spending or less-engaged customers may receive discounts, reactivation campaigns, or introductory offers. This makes campaigns more relevant, improves response rates, and reduces wasted marketing spend.
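
As a simple illustration of segment-based targeting, the Python sketch below maps each cluster to an offer type using the average total spending from the cluster summary table. The cutoff of 1,000, the two offer labels, and the function name `suggest_offer` are hypothetical choices for the example, not rules derived in this analysis.

```python
# Average total spending per cluster, copied from the cluster summary table.
segment_avg_spent = {1: 1612.07, 2: 1323.66, 3: 98.86, 4: 736.39}

def suggest_offer(avg_spent, high_spend_cutoff=1000.0):
    """Pick an offer type from average spending (cutoff is an assumption)."""
    if avg_spent >= high_spend_cutoff:
        return "premium offer"
    return "discount or reactivation offer"

plan = {cluster: suggest_offer(avg)
        for cluster, avg in segment_avg_spent.items()}

for cluster, offer in plan.items():
    print(f"cluster {cluster}: {offer}")
```

In practice the rule could also use recency or past campaign acceptance, and the cutoff would need to be validated against actual response data.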


Managerial Conclusion

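The high-spending segments (clusters 1 and 2, with average total spending of about 1,612 and 1,324) should be prioritized for retention through loyalty offers, premium promotions, and personalized recommendations. Cluster 1 also accepted the most past campaigns on average (2.53), which makes it the most responsive segment.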

Product spend patterns also show that customers prefer different categories, such as wine, meat, fruits, fish, sweets, or gold products. These differences allow marketers to create more targeted campaigns instead of using the same message for every customer.

Overall, segmentation can improve campaign effectiveness by helping marketers send the right offer to the right customer group. This can increase response rates, improve customer lifetime value, and reduce wasted marketing spend.