Companies often spend significant resources on marketing campaigns without fully understanding which customers are most likely to respond. Sending the same promotions to all customers can result in low response rates and inefficient use of marketing budgets.
This project seeks to identify customer segments based on demographics and purchasing behavior and determine which characteristics are associated with successful campaign responses.
The goal of this project is to use customer analytics to better understand customer behavior, specifically spending patterns, in order to improve marketing decision-making.
What demographic characteristics are associated with higher customer spending?
Which products do customers typically buy?
Can customers be segmented into meaningful groups based on purchasing behavior?
Which customer characteristics are associated with positive campaign responses?
How can marketers improve campaign targeting using customer segmentation?
Which customers should receive future marketing promotions?
Tailoring Marketing Campaigns
Customer Groups
Customer relationship management and targeted marketing have become increasingly important in today’s competitive business environment. Organizations that understand customer behavior can better allocate marketing resources and improve campaign effectiveness. The Customer Personality Analysis dataset provides demographic, purchasing, and campaign response information that can be used to better understand customer preferences and marketing outcomes.
The purpose of this project is to analyze customer characteristics and purchasing behavior to identify meaningful customer segments and evaluate factors associated with marketing campaign success. Using customer segmentation and predictive analytics techniques, this study seeks to provide actionable recommendations that improve marketing targeting and increase customer engagement.
The goal of this exploratory data analysis is to understand customer demographics, income, spending behavior, purchase channels, and marketing campaign responses.
This EDA produces summary statistics and visualizations for the Customer Personality Analysis dataset and uses them to identify trends, patterns, anomalies, and business insights.
Packages loaded for this analysis: tidyverse 2.0.0 (dplyr 1.2.1, forcats 1.0.1, ggplot2 4.0.3, lubridate 1.9.5, purrr 1.2.2, readr 2.2.0, stringr 1.6.0, tibble 3.3.1, tidyr 1.3.2), janitor, skimr, naniar, corrplot 0.95, kableExtra, and scales.
| Item | Value |
|---|---|
| Rows | 2240 |
| Columns | 29 |
| Variable | Type | Missing_Values |
|---|---|---|
| id | integer | 0 |
| year_birth | integer | 0 |
| education | character | 0 |
| marital_status | character | 0 |
| income | integer | 24 |
| kidhome | integer | 0 |
| teenhome | integer | 0 |
| dt_customer | character | 0 |
| recency | integer | 0 |
| mnt_wines | integer | 0 |
| mnt_fruits | integer | 0 |
| mnt_meat_products | integer | 0 |
| mnt_fish_products | integer | 0 |
| mnt_sweet_products | integer | 0 |
| mnt_gold_prods | integer | 0 |
| num_deals_purchases | integer | 0 |
| num_web_purchases | integer | 0 |
| num_catalog_purchases | integer | 0 |
| num_store_purchases | integer | 0 |
| num_web_visits_month | integer | 0 |
| accepted_cmp3 | integer | 0 |
| accepted_cmp4 | integer | 0 |
| accepted_cmp5 | integer | 0 |
| accepted_cmp1 | integer | 0 |
| accepted_cmp2 | integer | 0 |
| complain | integer | 0 |
| z_cost_contact | integer | 0 |
| z_revenue | integer | 0 |
| response | integer | 0 |
| id | year_birth | education | marital_status | income | kidhome | teenhome | dt_customer | recency | mnt_wines | mnt_fruits | mnt_meat_products | mnt_fish_products | mnt_sweet_products | mnt_gold_prods | num_deals_purchases | num_web_purchases | num_catalog_purchases | num_store_purchases | num_web_visits_month | accepted_cmp3 | accepted_cmp4 | accepted_cmp5 | accepted_cmp1 | accepted_cmp2 | complain | z_cost_contact | z_revenue | response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5524 | 1957 | Graduation | Single | 58138 | 0 | 0 | 04-09-2012 | 58 | 635 | 88 | 546 | 172 | 88 | 88 | 3 | 8 | 10 | 4 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 2174 | 1954 | Graduation | Single | 46344 | 1 | 1 | 08-03-2014 | 38 | 11 | 1 | 6 | 2 | 1 | 6 | 2 | 1 | 1 | 2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4141 | 1965 | Graduation | Together | 71613 | 0 | 0 | 21-08-2013 | 26 | 426 | 49 | 127 | 111 | 21 | 42 | 1 | 8 | 2 | 10 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 6182 | 1984 | Graduation | Together | 26646 | 1 | 0 | 10-02-2014 | 26 | 11 | 4 | 20 | 10 | 3 | 5 | 2 | 2 | 0 | 4 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 5324 | 1981 | PhD | Married | 58293 | 1 | 0 | 19-01-2014 | 94 | 173 | 43 | 118 | 46 | 27 | 15 | 5 | 5 | 3 | 6 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 7446 | 1967 | Master | Together | 62513 | 0 | 1 | 09-09-2013 | 16 | 520 | 42 | 98 | 0 | 42 | 14 | 2 | 6 | 4 | 10 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 965 | 1971 | Graduation | Divorced | 55635 | 0 | 1 | 13-11-2012 | 34 | 235 | 65 | 164 | 50 | 49 | 27 | 4 | 7 | 3 | 7 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 6177 | 1985 | PhD | Married | 33454 | 1 | 0 | 08-05-2013 | 32 | 76 | 10 | 56 | 3 | 1 | 23 | 2 | 4 | 0 | 4 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4855 | 1974 | PhD | Together | 30351 | 1 | 0 | 06-06-2013 | 19 | 14 | 0 | 24 | 3 | 3 | 2 | 1 | 3 | 0 | 2 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 5899 | 1950 | PhD | Together | 5648 | 1 | 1 | 13-03-2014 | 68 | 28 | 0 | 6 | 1 | 1 | 13 | 1 | 1 | 0 | 0 | 20 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| Variable | Missing_Values |
|---|---|
| income | 24 |
Income is the only variable with missing values (24 of 2,240 rows). Since income is important for understanding spending behavior, these missing values must be handled before modeling.
| Cleaning_Check | Count |
|---|---|
| Duplicate Rows | 0 |
| id | income | education | marital_status |
|---|---|---|---|
| 8475 | 157243 | PhD | Married |
| 1503 | 162397 | PhD | Together |
| 5555 | 153924 | Graduation | Divorced |
| 1501 | 160803 | PhD | Married |
| 5336 | 157733 | Master | Together |
| 4931 | 157146 | Graduation | Together |
| 11181 | 156924 | PhD | Married |
| 9432 | 666666 | Graduation | Together |
The dataset was cleaned by checking for duplicate rows, converting the customer enrollment date into date format, replacing missing income values with the median income, and identifying income outliers using the IQR method. The eight extreme income outliers listed above were removed to prevent them from distorting the analysis, leaving 2,232 customers.
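The cleaning steps described above can be sketched in base R as follows. This is a minimal illustration, not the code used for the report: the toy data reuses ids and incomes from the tables above, the `NA` for id 5324 is introduced here purely for demonstration, and the 1.5 × IQR fence is the conventional choice and an assumption about the threshold that was actually used.

```r
# Toy data: ids and incomes from the tables in this report. The NA for
# id 5324 is introduced here for illustration only (hypothetical).
customers <- data.frame(
  id     = c(5524, 2174, 4141, 6182, 5324, 7446, 965, 6177, 4855, 5899, 9432),
  income = c(58138, 46344, 71613, 26646, NA, 62513, 55635, 33454, 30351,
             5648, 666666)
)

# 1. Count fully duplicated rows.
n_duplicates <- sum(duplicated(customers))

# 2. Parse enrollment dates, assuming day-month-year strings
#    (a value such as "21-08-2013" rules out month-day-year).
enrolled <- as.Date(c("04-09-2012", "21-08-2013"), format = "%d-%m-%Y")

# 3. Replace missing incomes with the median of the observed incomes.
income_median <- median(customers$income, na.rm = TRUE)
customers$income[is.na(customers$income)] <- income_median

# 4. Flag incomes outside the 1.5 * IQR fences and drop them.
q1 <- quantile(customers$income, 0.25, names = FALSE)
q3 <- quantile(customers$income, 0.75, names = FALSE)
fence <- 1.5 * (q3 - q1)
is_outlier <- customers$income < q1 - fence | customers$income > q3 + fence
cleaned <- customers[!is_outlier, ]
```

On this toy sample only the income of 666,666 falls outside the fences, so `cleaned` keeps the other ten rows. Imputing before flagging outliers follows the order in which the steps are listed above; the opposite order would give a slightly different median.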
| Statistic | Value |
|---|---|
| age_Mean | 57.21 |
| age_Median | 56.00 |
| age_SD | 11.99 |
| age_Min | 30.00 |
| age_Max | 133.00 |
| income_Mean | 51630.93 |
| income_Median | 51381.50 |
| income_SD | 20601.68 |
| income_Min | 1730.00 |
| income_Max | 113734.00 |
| recency_Mean | 49.11 |
| recency_Median | 49.00 |
| recency_SD | 28.95 |
| recency_Min | 0.00 |
| recency_Max | 99.00 |
| children_Mean | 0.95 |
| children_Median | 1.00 |
| children_SD | 0.75 |
| children_Min | 0.00 |
| children_Max | 3.00 |
| total_spent_Mean | 605.60 |
| total_spent_Median | 396.50 |
| total_spent_SD | 601.44 |
| total_spent_Min | 5.00 |
| total_spent_Max | 2525.00 |
| total_purchases_Mean | 12.54 |
| total_purchases_Median | 12.00 |
| total_purchases_SD | 7.18 |
| total_purchases_Min | 0.00 |
| total_purchases_Max | 32.00 |
| total_campaigns_accepted_Mean | 0.30 |
| total_campaigns_accepted_Median | 0.00 |
| total_campaigns_accepted_SD | 0.68 |
| total_campaigns_accepted_Min | 0.00 |
| total_campaigns_accepted_Max | 4.00 |
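The summary table above reports several derived columns (age, children, total_spent, total_purchases, total_campaigns_accepted) whose definitions are not stated in this report. The sketch below shows one plausible construction in base R, using three rows from the data preview. The definitions are assumptions: the reference year 2026 is inferred from the pairing of birth year 1893 with age 133, and total_purchases is assumed to exclude deal purchases because that is the reading consistent with the channel totals.

```r
# Three rows copied from the data preview (ids 5524, 2174, 5899).
customers <- data.frame(
  id                    = c(5524, 2174, 5899),
  year_birth            = c(1957, 1954, 1950),
  kidhome               = c(0, 1, 1),
  teenhome              = c(0, 1, 1),
  mnt_wines             = c(635, 11, 28),
  mnt_fruits            = c(88, 1, 0),
  mnt_meat_products     = c(546, 6, 6),
  mnt_fish_products     = c(172, 2, 1),
  mnt_sweet_products    = c(88, 1, 1),
  mnt_gold_prods        = c(88, 6, 13),
  num_web_purchases     = c(8, 1, 1),
  num_catalog_purchases = c(10, 1, 0),
  num_store_purchases   = c(4, 2, 0),
  accepted_cmp1         = c(0, 0, 0),
  accepted_cmp2         = c(0, 0, 0),
  accepted_cmp3         = c(0, 0, 1),
  accepted_cmp4         = c(0, 0, 0),
  accepted_cmp5         = c(0, 0, 0)
)

reference_year <- 2026  # assumption, inferred from the age table

# Column groups, selected by name before any derived columns are added.
spend_cols    <- grep("^mnt_", names(customers), value = TRUE)
purchase_cols <- c("num_web_purchases", "num_catalog_purchases",
                   "num_store_purchases")
campaign_cols <- grep("^accepted_cmp", names(customers), value = TRUE)

# Derived features (assumed definitions).
customers$age                      <- reference_year - customers$year_birth
customers$children                 <- customers$kidhome + customers$teenhome
customers$total_spent              <- rowSums(customers[spend_cols])
customers$total_purchases          <- rowSums(customers[purchase_cols])
customers$total_campaigns_accepted <- rowSums(customers[campaign_cols])
```

For the first preview row (id 5524) this gives a total_spent of 1,617 across the six product categories and 22 purchases across the three channels.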
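The age distribution shows the main age groups in the customer base: the mean age is 57.21 and the median is 56, with recorded ages from 30 to 133. Ages above 120 are implausible and most likely reflect errors in year_birth.
The income distribution and boxplot show the variation in income and flag possible high-income outliers. After cleaning, income ranges from 1,730 to 113,734 with a median of 51,381.50.
Total spending helps identify high-value customers. The mean (605.60) is well above the median (396.50), which indicates a right-skewed distribution in which a minority of customers accounts for a large share of spending.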
| Product | Spending |
|---|---|
| Wine | 680604 |
| Fruits | 58881 |
| Meat | 368993 |
| Fish | 84023 |
| Sweets | 60611 |
| Gold | 98579 |
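This chart shows which product categories generate the most spending. Wine (680,604) and meat (368,993) dominate, together accounting for roughly 78% of total spending; fruits, sweets, fish, and gold products each contribute far less.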
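To express the product totals as shares of overall spending, the figures in the table above can be normalized directly. The sketch below only uses the six totals reported in the table; the variable names are illustrative.

```r
# Total spending per product category, copied from the table above.
spending <- c(Wine = 680604, Fruits = 58881, Meat = 368993,
              Fish = 84023, Sweets = 60611, Gold = 98579)

# Percentage share of each category, rounded to one decimal place.
share_pct <- round(100 * spending / sum(spending), 1)

# Categories ordered from largest to smallest share.
ranked <- sort(share_pct, decreasing = TRUE)
ranked
```

The same normalization applies to any of the count tables in this report, such as purchases per channel or acceptances per campaign.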
| Channel | Purchases |
|---|---|
| Web | 9146 |
| Catalog | 5884 |
| Store | 12964 |
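This chart compares purchasing activity across channels. Store purchases are the most common (12,964, about 46% of the total), followed by web (9,146, about 33%) and catalog (5,884, about 21%).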
| education | n | Percentage |
|---|---|---|
| 2n Cycle | 203 | 9.09 |
| Basic | 54 | 2.42 |
| Graduation | 1124 | 50.36 |
| Master | 369 | 16.53 |
| PhD | 482 | 21.59 |
| marital_status | n | Percentage |
|---|---|---|
| Absurd | 2 | 0.09 |
| Alone | 3 | 0.13 |
| Divorced | 231 | 10.35 |
| Married | 861 | 38.58 |
| Single | 480 | 21.51 |
| Together | 576 | 25.81 |
| Widow | 77 | 3.45 |
| YOLO | 2 | 0.09 |
| Campaign | Accepted |
|---|---|
| Campaign 1 | 144 |
| Campaign 2 | 30 |
| Campaign 3 | 163 |
| Campaign 4 | 167 |
| Campaign 5 | 163 |
| response | n | Percentage |
|---|---|---|
| 0 | 1898 | 85.04 |
| 1 | 334 | 14.96 |
| id | year_birth | age | income | total_spent |
|---|---|---|---|---|
| 11004 | 1893 | 133 | 60182 | 22 |
| 1150 | 1899 | 127 | 83532 | 1853 |
| 7829 | 1900 | 126 | 36640 | 65 |
| 6663 | 1940 | 86 | 51141 | 157 |
| 6932 | 1941 | 85 | 93027 | 2119 |
| 2968 | 1943 | 83 | 48948 | 902 |
| 6142 | 1943 | 83 | 65073 | 900 |
| 7106 | 1943 | 83 | 75865 | 1242 |
| 8800 | 1943 | 83 | 48948 | 902 |
| 1453 | 1943 | 83 | 57513 | 1060 |
| id | income | age | total_spent |
|---|---|---|---|
| 4619 | 113734 | 81 | 277 |
| 4611 | 105471 | 56 | 1724 |
| 10089 | 102692 | 52 | 1112 |
| 2798 | 102160 | 49 | 1240 |
| 7215 | 101970 | 43 | 1135 |
| 4248 | 98777 | 66 | 2008 |
| 7451 | 98777 | 66 | 2008 |
| 500 | 96876 | 49 | 1941 |
| 2109 | 96843 | 36 | 1544 |
| 6815 | 96547 | 46 | 809 |
| id | income | age | total_spent |
|---|---|---|---|
| 5735 | 90638 | 35 | 2525 |
| 5350 | 90638 | 35 | 2525 |
| 1763 | 87679 | 38 | 2524 |
| 4580 | 75759 | 57 | 2486 |
| 4475 | 69098 | 77 | 2440 |
| 5453 | 90226 | 70 | 2352 |
| 10133 | 93790 | 56 | 2349 |
| 9010 | 83151 | 54 | 2346 |
| 5386 | 94384 | 73 | 2302 |
| 6024 | 94384 | 73 | 2302 |
The exploratory analysis shows that customer income and total spending vary substantially across the dataset: total spending ranges from 5 to 2,525 with a median of 396.50, and income ranges from 1,730 to 113,734. This suggests that the customer base contains different customer segments, including lower-spending customers and high-value customers.
Product spending is not evenly distributed across categories. Wine and meat together make up more than three quarters of total spending, which points to these categories as the most important product preferences.
Purchase channel analysis shows that customers buy most often in store, with web purchases second and catalog purchases third.
Campaign acceptance is low: only 14.96% of customers (334 of 2,232) have a positive response, and no individual campaign was accepted by more than 167 customers. Campaign 2 performed worst, with 30 acceptances.
Based on the EDA, the modeling team should pay close attention to income, total spending, purchase channels, recency, and previous campaign acceptance.
High-spending customers and recently active customers may represent valuable segments for future marketing campaigns.
We used k-means clustering to identify customer segments based on age, income, number of children, recency, total spending, and purchase activity.
The elbow plot helps justify how many customer segments to use. In this analysis, the curve starts to bend at about 2 clusters and the elbow forms between 3 and 4 clusters, so we use 4 clusters for the analysis.
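For readers unfamiliar with the elbow method, the sketch below shows how such a curve is typically produced with base R's `kmeans()`. It runs on synthetic stand-in data with two features, not on the customer dataset, and the choices of standardizing with `scale()`, `nstart = 25`, and the range of k are assumptions rather than the settings used for this report.

```r
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability

# Synthetic stand-in for the customer features (NOT the real data):
# three groups that differ in income and total spending.
features <- data.frame(
  income      = c(rnorm(50, 30000, 4000), rnorm(50, 60000, 4000),
                  rnorm(50, 85000, 4000)),
  total_spent = c(rnorm(50, 100, 40), rnorm(50, 700, 100),
                  rnorm(50, 1500, 150))
)

# Standardize so that income (tens of thousands) does not dominate spending.
scaled <- scale(features)

# Total within-cluster sum of squares for k = 1..8.
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out.
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Because the within-cluster sum of squares always shrinks as k grows, the elbow is a judgment call about diminishing returns rather than a strict optimum, which is why a range such as "between 3 and 4" is a common outcome.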
| cluster | Average Age | Average Income | Average Children | Average Total Spent | Average Total Purchases | Average Campaigns Accepted |
|---|---|---|---|---|---|---|
| 1 | 55.47 | 81175.71 | 0.26 | 1612.07 | 19.94 | 2.53 |
| 2 | 59.03 | 74210.67 | 0.21 | 1323.66 | 19.56 | 0.29 |
| 3 | 54.68 | 34417.29 | 1.25 | 98.86 | 5.85 | 0.08 |
| 4 | 60.48 | 58306.43 | 1.12 | 736.39 | 17.22 | 0.22 |
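A cluster profile table like the one above is usually built by averaging each feature within each cluster. The sketch below illustrates this with base R on synthetic data with three built-in groups (so it uses 3 clusters, not the 4 chosen in this report); the feature set and all names are illustrative assumptions.

```r
set.seed(42)  # fix the random starting centers for repeatability

# Synthetic stand-in data (NOT the real customers): three groups.
features <- data.frame(
  income      = c(rnorm(50, 30000, 4000), rnorm(50, 60000, 4000),
                  rnorm(50, 85000, 4000)),
  total_spent = c(rnorm(50, 100, 40), rnorm(50, 700, 100),
                  rnorm(50, 1500, 150))
)

# Fit k-means on standardized features and attach the cluster labels.
fit <- kmeans(scale(features), centers = 3, nstart = 25)
features$cluster <- fit$cluster

# Average of each (unscaled) feature per cluster, plus cluster sizes.
profile <- aggregate(cbind(income, total_spent) ~ cluster,
                     data = features, FUN = mean)
profile$n <- as.vector(table(features$cluster))
profile
```

Reporting cluster sizes alongside the averages helps judge how much of the customer base each segment represents. Note that k-means cluster numbers are arbitrary labels and can change between runs unless the seed is fixed.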
Customer segmentation helps marketers improve campaign targeting by grouping customers with similar behaviors, preferences, and spending patterns. Instead of sending the same campaign to everyone, marketers can tailor promotions to the segments most likely to respond. For example, high-spending customers may receive premium product offers, while lower-spending or less-engaged customers may receive discounts, reactivation campaigns, or introductory offers. This makes campaigns more relevant, improves response rates, and reduces wasted marketing spend.
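Segments 1 and 2, the two high-spending segments (average total spending of 1,612.07 and 1,323.66), should be prioritized for retention through loyalty offers, premium promotions, and personalized recommendations. Segment 1 also stands out for campaign responsiveness, with an average of 2.53 campaigns accepted compared with 0.29 or fewer in the other segments.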
Product spend patterns also show that customers prefer different categories, such as wine, meat, fruits, fish, sweets, or gold products. These differences allow marketers to create more targeted campaigns instead of using the same message for every customer.
Overall, segmentation can improve campaign effectiveness by helping marketers send the right offer to the right customer group, which in turn can raise customer lifetime value.