This dataset provides a comprehensive profile of customers, including demographic information, purchasing behavior, and responses to marketing campaigns. Using clustering analysis, we can group customers into distinct segments based on shared characteristics such as income, education, and spending habits. This can help us understand the different types of customers in our market and tailor our marketing strategies to each segment.
Data source: https://www.kaggle.com/datasets/vishakhdapat/customer-segmentation-clustering
#Importing dataset
library(readr)
mydata <- read_csv("~/R data/MVA Homework/customer_segmentation.csv")
## Rows: 2240 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Education, Marital_Status, Dt_Customer
## dbl (26): ID, Year_Birth, Income, Kidhome, Teenhome, Recency, MntWines, MntFruits, MntMeatProducts, MntFishProducts,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(mydata)
## # A tibble: 6 × 29
## ID Year_Birth Education Marital_Status Income Kidhome Teenhome Dt_Customer Recency MntWines MntFruits
## <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 5524 1957 Graduation Single 58138 0 0 04-09-2012 58 635 88
## 2 2174 1954 Graduation Single 46344 1 1 08-03-2014 38 11 1
## 3 4141 1965 Graduation Together 71613 0 0 21-08-2013 26 426 49
## 4 6182 1984 Graduation Together 26646 1 0 10-02-2014 26 11 4
## 5 5324 1981 PhD Married 58293 1 0 19-01-2014 94 173 43
## 6 7446 1967 Master Together 62513 0 1 09-09-2013 16 520 42
## # ℹ 18 more variables: MntMeatProducts <dbl>, MntFishProducts <dbl>, MntSweetProducts <dbl>, MntGoldProds <dbl>,
## # NumDealsPurchases <dbl>, NumWebPurchases <dbl>, NumCatalogPurchases <dbl>, NumStorePurchases <dbl>,
## # NumWebVisitsMonth <dbl>, AcceptedCmp3 <dbl>, AcceptedCmp4 <dbl>, AcceptedCmp5 <dbl>, AcceptedCmp1 <dbl>,
## # AcceptedCmp2 <dbl>, Complain <dbl>, Z_CostContact <dbl>, Z_Revenue <dbl>, Response <dbl>
#Randomly selecting 200 of the 2,240 customers to keep the analysis manageable
set.seed(200)
mydata <- mydata[sample(nrow(mydata), 200), ]
#Replacing the original customer ID with a sequential ID from 1 to 200
mydata$ID <- seq(1, nrow(mydata))
#Creating columns with customer age (as of 2024) and total number of children at home
mydata$Age <- 2024 - mydata$Year_Birth
mydata$ChildHome <- mydata$Kidhome + mydata$Teenhome
# Selecting the relevant columns
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mydata <- select(mydata, ID, Income, MntWines, MntFruits, MntMeatProducts, MntFishProducts, Age, ChildHome, NumWebPurchases, NumStorePurchases, Recency, NumWebVisitsMonth)
head(mydata)
## # A tibble: 6 × 12
## ID Income MntWines MntFruits MntMeatProducts MntFishProducts Age ChildHome NumWebPurchases NumStorePurchases
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 80617 594 51 631 72 29 0 4 8
## 2 2 41658 8 4 12 15 49 2 1 2
## 3 3 42664 21 0 3 0 57 1 1 3
## 4 4 81975 983 76 184 180 62 1 6 4
## 5 5 16014 3 9 4 7 45 2 1 4
## 6 6 60482 255 43 134 37 48 1 7 7
## # ℹ 2 more variables: Recency <dbl>, NumWebVisitsMonth <dbl>
Unit of observation: Individual customer
Sample size: 200 customers (197 after removing customers with missing income, 194 after removing three outliers)
Variables:
++ ID: Unique identifier for each individual in the dataset.
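++ Income: The annual income of the individual (money units).
++ MntWines, MntFruits, MntMeatProducts, MntFishProducts: Amount spent on wine, fruit, meat and fish products, respectively (money units).
++ Age: Age of the customer in 2024, computed as 2024 minus the year of birth (years).
++ ChildHome: Number of children and teenagers living in the customer's household (Kidhome + Teenhome).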
++ NumWebPurchases: Number of purchases a customer has done on the webpage per month.
++ NumStorePurchases: Number of purchases a customer has done in store per month.
++ Recency: The number of days since the last purchase or interaction.
++ NumWebVisitsMonth: Number of webpage visits per month.
Research question:
How can we segment our customer base into distinct groups based on their income and spending habits on various products (wines, fruits, meat, and fish products)?
# Removing rows where units have NA in column Income
mydata <- mydata[!is.na(mydata$Income), ]
summary(mydata)
## ID Income MntWines MntFruits MntMeatProducts MntFishProducts Age
## Min. : 1.0 Min. : 4023 Min. : 0.0 Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. :29.0
## 1st Qu.: 51.0 1st Qu.:31385 1st Qu.: 16.0 1st Qu.: 2.00 1st Qu.: 12.0 1st Qu.: 3.00 1st Qu.:46.0
## Median :101.0 Median :48070 Median : 123.0 Median : 7.00 Median : 46.0 Median : 11.00 Median :53.0
## Mean :100.9 Mean :48962 Mean : 286.1 Mean : 22.92 Mean :157.1 Mean : 31.58 Mean :54.3
## 3rd Qu.:151.0 3rd Qu.:65316 3rd Qu.: 493.0 3rd Qu.: 26.00 3rd Qu.:207.0 3rd Qu.: 32.00 3rd Qu.:65.0
## Max. :200.0 Max. :95529 Max. :1285.0 Max. :199.00 Max. :890.0 Max. :250.00 Max. :84.0
## ChildHome NumWebPurchases NumStorePurchases Recency NumWebVisitsMonth
## Min. :0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.: 2.000 1st Qu.: 3.000 1st Qu.:26.00 1st Qu.: 4.000
## Median :1.0000 Median : 3.000 Median : 4.000 Median :51.00 Median : 6.000
## Mean :0.9442 Mean : 3.888 Mean : 5.584 Mean :48.85 Mean : 5.548
## 3rd Qu.:1.0000 3rd Qu.: 6.000 3rd Qu.: 8.000 3rd Qu.:74.00 3rd Qu.: 7.000
## Max. :3.0000 Max. :11.000 Max. :13.000 Max. :99.00 Max. :20.000
Interpretation of selected descriptive statistics (after removing customers with missing income):
Income: The annual income of customers ranges from 4,023 to 95,529 money units. The average income is 48,962 money units; 25% of customers earn 31,385 money units or less, and 75% earn 65,316 money units or less.
Age: The age of customers ranges from 29 to 84 years. The average age is 54 years, with 25% of customers being 46 years or younger, and 75% of customers being 65 years or younger.
ChildHome: The number of children at home ranges from 0 to 3. On average, customers have approximately 1 child at home.
NumWebPurchases: The number of purchases made on the webpage per month ranges from 0 to 11. On average, customers make about 4 web purchases per month.
NumStorePurchases: The number of purchases made in store per month ranges from 0 to 13. On average, customers make about 6 store purchases per month.
Recency: The number of days since the last purchase or interaction ranges from 0 to 99 days. On average, the last purchase or interaction was about 49 days ago.
NumWebVisitsMonth: The number of webpage visits per month ranges from 0 to 20. On average, customers visit the webpage about 5.5 times per month.
#Standardizing
mydata$Income_z <- scale(mydata$Income)
mydata$MntWines_z <- scale(mydata$MntWines)
mydata$MntFruits_z <- scale(mydata$MntFruits)
mydata$MntMeatProducts_z <- scale(mydata$MntMeatProducts)
mydata$MntFishProducts_z <- scale(mydata$MntFishProducts)
mydata$ChildHome_z <- scale(mydata$ChildHome)
mydata$NumWebP_z <- scale(mydata$NumWebPurchases)
mydata$NumStoreP_z <- scale(mydata$NumStorePurchases)
mydata$Recency_z <- scale(mydata$Recency)
mydata$NumWebVisitsMonth_z <- scale(mydata$NumWebVisitsMonth)
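For readers unfamiliar with `scale()`: it centres each variable at its mean and divides by its standard deviation (computed with the n - 1 denominator). The sketch below checks this on a small hypothetical income vector, not on the customer data. It also shows why the printed tibbles label columns as `Income_z[,1]`: `scale()` returns a one-column matrix rather than a plain vector.

```r
# Hypothetical income values, used only to illustrate z-standardization
income <- c(30000, 45000, 60000, 75000)

# Manual z-scores: (x - mean) / sd, where sd() uses the n - 1 denominator
z_manual <- (income - mean(income)) / sd(income)

# scale() returns a one-column matrix with "scaled:center" and "scaled:scale" attributes
z_matrix <- scale(income)
z_scale <- as.numeric(z_matrix)

print(all.equal(z_manual, z_scale))  # TRUE
print(dim(z_matrix))                 # 4 1
```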
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
rcorr(as.matrix(mydata[, c("Income_z", "ChildHome_z", "NumWebP_z", "Recency_z", "NumWebVisitsMonth_z")]),
type = "pearson")
## Income_z ChildHome_z NumWebP_z Recency_z NumWebVisitsMonth_z
## Income_z 1.00 -0.34 0.54 -0.02 -0.66
## ChildHome_z -0.34 1.00 -0.16 0.13 0.35
## NumWebP_z 0.54 -0.16 1.00 -0.08 -0.05
## Recency_z -0.02 0.13 -0.08 1.00 -0.04
## NumWebVisitsMonth_z -0.66 0.35 -0.05 -0.04 1.00
##
## n= 197
##
##
## P
## Income_z ChildHome_z NumWebP_z Recency_z NumWebVisitsMonth_z
## Income_z 0.0000 0.0000 0.8227 0.0000
## ChildHome_z 0.0000 0.0257 0.0626 0.0000
## NumWebP_z 0.0000 0.0257 0.2708 0.4457
## Recency_z 0.8227 0.0626 0.2708 0.5626
## NumWebVisitsMonth_z 0.0000 0.0000 0.4457 0.5626
#Finding outliers: distance of each unit from the centroid of the standardized data (all z-scores equal to 0)
mydata$Dissimilarity_z <- sqrt(mydata$Income_z^2 + mydata$NumWebP_z^2 + mydata$Recency_z^2 + mydata$NumWebVisitsMonth_z^2 + mydata$ChildHome_z^2)
#Showing 10 units with the highest value of dissimilarity
head(mydata[order(-mydata$Dissimilarity_z), c("ID", "Dissimilarity_z")], 10)
## # A tibble: 10 × 2
## ID Dissimilarity_z[,1]
## <int> <dbl>
## 1 90 5.93
## 2 109 5.72
## 3 30 4.28
## 4 168 3.37
## 5 179 3.28
## 6 101 3.24
## 7 126 3.21
## 8 156 3.19
## 9 172 3.15
## 10 163 3.09
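The dissimilarity computed above is the Euclidean distance of each unit from the centroid of the standardized data, where every z-score equals 0. A minimal sketch on a hypothetical 3 × 2 matrix of z-scores confirms that `rowSums()` and `dist()` give the same values when the origin is added as a reference point.

```r
# Hypothetical z-scores for three units (rows) and two variables (columns)
Z <- matrix(c(1,    2,
              -2,   0,
              0.5, -1), ncol = 2, byrow = TRUE)

# Distance from the origin (the centroid of standardized data)
dissimilarity <- sqrt(rowSums(Z^2))

# Same values via dist(): the first row is the origin, so column 1 holds distances to it
via_dist <- as.matrix(dist(rbind(c(0, 0), Z)))[-1, 1]

print(round(dissimilarity, 3))                     # 2.236 2.000 1.118
print(all.equal(dissimilarity, unname(via_dist)))  # TRUE
print(order(-dissimilarity))                       # 1 2 3 (most dissimilar first)
```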
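#Removing the three clear outliers (IDs 90, 109 and 30), whose dissimilarity (4.28 to 5.93) is well above that of the other units (3.37 or less)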
library(dplyr)
mydata <- mydata %>%
filter(!ID %in% c(90, 109, 30))
#Standardizing one more time after removing units
mydata$Income_z <- scale(mydata$Income)
mydata$ChildHome_z <- scale(mydata$ChildHome)
mydata$NumWebP_z <- scale(mydata$NumWebPurchases)
mydata$Recency_z <- scale(mydata$Recency)
mydata$NumWebVisitsMonth_z <- scale(mydata$NumWebVisitsMonth)
round(head(mydata[c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z")], 3), 2)
## # A tibble: 3 × 5
## Income_z[,1] ChildHome_z[,1] NumWebP_z[,1] NumWebVisitsMonth_z[,1] Recency_z[,1]
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1.49 -1.22 0.02 -1.46 -0.24
## 2 -0.38 1.39 -1.11 -0.59 -0.66
## 3 -0.34 0.08 -1.11 0.28 -0.17
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
Distance_euclidian <- get_dist(mydata[c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z")],
method = "euclidean")
Distance_euclidian2 <- Distance_euclidian^2
#Showing dissimilarity matrix
fviz_dist(Distance_euclidian2)
Based on the dissimilarity matrix, the customers appear to form about four groups. Before clustering, however, we check the Hopkins statistic to see whether the data is clusterable at all.
#Checking whether the data is clusterable with the Hopkins statistic
#install.packages("factoextra")
library(factoextra)
get_clust_tendency(mydata[c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z")],
n = nrow(mydata) - 1,
graph = FALSE)
## $hopkins_stat
## [1] 0.6484489
##
## $plot
## NULL
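The Hopkins statistic is 0.65, which is above 0.5, the value expected for uniformly distributed data. This indicates that the data has some tendency to cluster, although the value is not far from 0.5, so the cluster structure is only moderate.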
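The Hopkins statistic compares how far randomly generated points lie from the real data with how far real points lie from each other. The sketch below implements a simplified textbook version based on nearest-neighbour distances, with m = 20 sampled points (an arbitrary choice). The function name `hopkins_sketch` is hypothetical, and factoextra's `get_clust_tendency()` may differ in sampling details or exponents, so its values are not directly comparable to the 0.65 above. It is applied to simulated data only.

```r
# Simplified, textbook Hopkins statistic (assumption: factoextra's
# get_clust_tendency() may use a different sampling scheme or exponent)
hopkins_sketch <- function(X, m = 20) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  # m artificial points drawn uniformly within the range of each variable
  U <- sapply(seq_len(p), function(j) runif(m, min(X[, j]), max(X[, j])))
  # u: distance from each artificial point to its nearest real point
  u <- apply(U, 1, function(pt) min(sqrt(colSums((t(X) - pt)^2))))
  # w: distance from m sampled real points to their nearest other real point
  idx <- sample(n, m)
  w <- sapply(idx, function(i) {
    d <- sqrt(colSums((t(X) - X[i, ])^2))
    min(d[-i])
  })
  # Near 0.5 for uniform data, close to 1 for clustered data
  sum(u) / (sum(u) + sum(w))
}

set.seed(1)
# Two tight hypothetical clusters versus uniform noise, 100 points each
clustered <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
                   matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
uniform <- matrix(runif(200, min = 0, max = 5), ncol = 2)

h_clustered <- hopkins_sketch(clustered)
h_uniform <- hopkins_sketch(uniform)
print(c(clustered = h_clustered, uniform = h_uniform))
```

The clustered data should give a value close to 1 and the uniform data a value near 0.5; a value such as 0.65 sits between these two reference cases.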
library(dplyr)
library(factoextra)
WARD <- mydata[c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z")] %>%
get_dist(method = "euclidean") %>%
hclust(method = "ward.D2")
WARD
##
## Call:
## hclust(d = ., method = "ward.D2")
##
## Cluster method : ward.D2
## Distance : euclidean
## Number of objects: 194
With 194 units, the agglomerative procedure needs 193 merging steps to join all units into a single cluster (group).
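A small simulated example (hypothetical data, 10 units) shows that agglomerative clustering of n units takes n - 1 merging steps, recorded as the rows of the `merge` matrix returned by `hclust()`. It also checks that `method = "ward.D2"` on Euclidean distances gives the same merges as `"ward.D"` on squared Euclidean distances, with heights that are the square roots of the latter.

```r
set.seed(42)
X <- matrix(rnorm(20), ncol = 2)  # 10 hypothetical units, 2 variables

hc_d2 <- hclust(dist(X), method = "ward.D2")

# n units are joined into one cluster in n - 1 merging steps
print(nrow(hc_d2$merge))  # 9

# ward.D2 on Euclidean distances matches ward.D on squared Euclidean distances
hc_d <- hclust(dist(X)^2, method = "ward.D")
print(all.equal(hc_d$height, hc_d2$height^2))  # TRUE
print(identical(hc_d$merge, hc_d2$merge))      # TRUE
```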
library(factoextra)
fviz_dend(WARD,
k = 4,
cex = 0.5,
palette = "jama",
color_labels_by_k = TRUE,
rect = TRUE)
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the factoextra package.
## Please report the issue at <https://github.com/kassambara/factoextra/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Based on the dendrogram, cutting the tree into four groups appears to be the best choice.
Next, we compute the initial leaders (centroids) of the four Ward groups, which serve as the starting points for the K-means method.
mydata$ClassificationWARD <- cutree(WARD,
k = 4)
head(mydata[c("ID", "Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z", "Dissimilarity_z", "ClassificationWARD")])
## # A tibble: 6 × 8
## ID Income_z[,1] ChildHome_z[,1] NumWebP_z[,1] NumWebVisitsMonth_z[,1] Recency_z[,1] Dissimilarity_z[,1]
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1.49 -1.22 0.0213 -1.46 -0.241 2.33
## 2 2 -0.383 1.39 -1.11 -0.592 -0.655 1.97
## 3 3 -0.335 0.0807 -1.11 0.278 -0.172 1.15
## 4 4 1.55 0.0807 0.774 0.713 -1.62 2.43
## 5 5 -1.61 1.39 -1.11 -1.03 -0.241 2.52
## 6 6 0.520 0.0807 1.15 0.713 1.10 1.78
## # ℹ 1 more variable: ClassificationWARD <int>
#Calculating the positions of initial leaders
InitialLeaders <- aggregate(mydata[,c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z")],
by = list(mydata$ClassificationWARD),
FUN = mean)
round(InitialLeaders, 3)
## Group.1 Income_z ChildHome_z NumWebP_z NumWebVisitsMonth_z Recency_z
## 1 1 1.164 -0.947 0.157 -1.342 0.189
## 2 2 -0.682 0.046 -0.530 0.423 -0.595
## 3 3 0.609 0.382 1.391 0.301 -0.013
## 4 4 -0.827 0.793 -0.663 0.595 1.100
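Conceptually, the hierarchical K-means step that follows starts K-means from these Ward centroids instead of from random points. To our understanding this is what `hkmeans()` does internally; the sketch below reproduces the idea with base R `hclust()`, `cutree()`, `aggregate()` and `kmeans()` on simulated data (the data and object names are hypothetical).

```r
set.seed(7)
# Two hypothetical groups of 20 units each, 2 variables
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2))

# Step 1: Ward clustering and initial group memberships
ward_groups <- cutree(hclust(dist(X), method = "ward.D2"), k = 2)

# Step 2: initial leaders = group means of the Ward solution
leaders <- aggregate(as.data.frame(X), by = list(ward_groups), FUN = mean)
start_centers <- as.matrix(leaders[, -1])  # drop the Group.1 column

# Step 3: K-means started from the Ward leaders
km <- kmeans(X, centers = start_centers)

# Cross-table of initial (Ward) and final (K-means) memberships
print(table(Ward = ward_groups, KMeans = km$cluster))
```

Because `kmeans()` numbers its clusters in the order of the supplied centres, K-means group k corresponds to Ward group k, which is what makes the cross-table of the two classifications easy to read.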
library(factoextra)
#Performing hierarchical K-means clustering (K-means started from the Ward group centroids)
K_MEANS <- hkmeans(mydata[c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z")],
k = 4,
hc.metric = "euclidean",
hc.method = "ward.D2")
K_MEANS
## Hierarchical K-means clustering with 4 clusters of sizes 44, 56, 43, 51
##
## Cluster means:
## Income_z ChildHome_z NumWebP_z NumWebVisitsMonth_z Recency_z
## 1 1.2059384 -1.01697895 0.0726130 -1.4232561 0.04390027
## 2 -0.7537547 0.03412505 -0.6435780 0.4958348 -0.77547806
## 3 0.5556536 0.14146303 1.4732741 0.3591842 -0.14037876
## 4 -0.6812575 0.72065020 -0.5981449 0.3806196 0.93198914
##
## Clustering vector:
## [1] 1 2 2 3 4 3 3 1 1 4 1 1 3 4 4 3 3 4 2 3 4 3 2 2 2 4 1 4 1 2 2 1 4 1 2 3 4 4 2 4 2 3 4 2 4 4 4 1 2 3 3 4 4 1 2 4 2
## [58] 1 4 2 1 4 2 3 1 2 2 2 4 1 1 3 1 3 3 2 1 3 2 4 1 1 4 4 1 1 2 4 1 2 1 1 4 2 2 2 4 3 2 3 4 4 3 2 1 3 4 2 3 1 2 2 4 4
## [115] 1 3 3 4 2 4 3 2 2 2 2 4 3 2 1 3 2 2 2 4 2 4 2 1 4 1 2 3 3 4 4 4 3 1 2 4 3 4 3 3 1 2 1 2 2 2 1 3 1 1 1 1 1 2 4 2 2
## [172] 3 4 3 3 3 3 3 4 4 1 1 2 2 1 4 2 3 4 4 2 3 3 1
##
## Within cluster sum of squares by cluster:
## [1] 88.84582 113.99444 102.72623 124.01733
## (between_SS / total_SS = 55.5 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size"
## [8] "iter" "ifault" "data" "hclust"
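The “Cluster means” table shows the final leaders (centroids) of the groups. The ratio between_SS / total_SS = 55.5% means that differences between the groups account for 55.5% of the total variability of the cluster variables.
The clustering vector shows the group of each unit: the first unit belongs to group 1, the second to group 2, and so on.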
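The reported ratio between_SS / total_SS = 55.5% can be reproduced by hand. Each of the 5 cluster variables was standardized on the 194 remaining units, so each has variance 1 and a total sum of squares of 194 - 1 = 193. The within-cluster sums of squares below are copied from the output above.

```r
# Within-cluster sums of squares reported by hkmeans() above
withinss <- c(88.84582, 113.99444, 102.72623, 124.01733)
n <- 194  # units after removing outliers
p <- 5    # standardized cluster variables

totss <- p * (n - 1)               # each z-variable contributes n - 1 = 193
betweenss <- totss - sum(withinss)

print(totss)                              # 965
print(round(betweenss, 2))                # 535.42
print(round(100 * betweenss / totss, 1))  # 55.5
```

The same between-group total reappears as the sum of the five group "Sum Sq" values in the ANOVA tables further below (132.751 + 72.919 + 135.007 + 115.833 + 78.907 = 535.417).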
fviz_cluster(K_MEANS,
palette = "jama",
repel = FALSE,
ggtheme = theme_classic())
We will now check if any unit has been reclassified.
mydata$ClassificationK_MEANS <- K_MEANS$cluster
head(mydata[, c("ID", "ClassificationWARD", "ClassificationK_MEANS")])
## # A tibble: 6 × 3
## ID ClassificationWARD ClassificationK_MEANS
## <int> <int> <int>
## 1 1 1 1
## 2 2 2 2
## 3 3 2 2
## 4 4 3 3
## 5 5 2 4
## 6 6 3 3
table(mydata$ClassificationWARD)
##
## 1 2 3 4
## 47 75 39 33
table(mydata$ClassificationK_MEANS)
##
## 1 2 3 4
## 44 56 43 51
table(mydata$ClassificationWARD, mydata$ClassificationK_MEANS)
##
## 1 2 3 4
## 1 40 0 3 4
## 2 4 55 5 11
## 3 0 0 35 4
## 4 0 1 0 32
There have been many reclassifications: 32 of the 194 units (16.5%) changed groups.
Group 1 initially had 47 units, of which 40 stayed, 3 were reclassified into group 3 and 4 into group 4. It now has 44 units, 4 of which came from group 2.
Group 2 initially had 75 units, of which 55 stayed, 4 were reclassified into group 1, 5 into group 3 and 11 into group 4. It now has 56 units, 1 of which came from group 4.
Group 3 initially had 39 units, of which 35 stayed and 4 were reclassified into group 4. It now has 43 units, 3 of which came from group 1 and 5 from group 2.
Group 4 initially had 33 units, of which 32 stayed and 1 was reclassified into group 2. It now has 51 units, 4 of which came from group 1, 11 from group 2 and 4 from group 3.
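The reclassification counts can be read directly from the cross-table: units on the diagonal kept their group, all others were reclassified. This works because K-means group k corresponds to Ward group k, as the diagonal-heavy table suggests. The sketch re-enters the table from the output above; with the live objects, `table(mydata$ClassificationWARD, mydata$ClassificationK_MEANS)` would replace the hand-typed matrix.

```r
# Ward (rows) vs K-means (columns) cross-table, copied from the output above
tab <- matrix(c(40,  0,  3,  4,
                 4, 55,  5, 11,
                 0,  0, 35,  4,
                 0,  1,  0, 32),
              nrow = 4, byrow = TRUE,
              dimnames = list(Ward = 1:4, KMeans = 1:4))

stayed <- sum(diag(tab))    # units that kept their group
moved <- sum(tab) - stayed  # reclassified units

print(c(stayed = stayed, moved = moved))  # 162 and 32
print(round(100 * moved / sum(tab), 1))   # 16.5
print(rowSums(tab))  # Ward group sizes:    47 75 39 33
print(colSums(tab))  # K-means group sizes: 44 56 43 51
```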
Averages <- K_MEANS$centers
round(Averages, 3)
## Income_z ChildHome_z NumWebP_z NumWebVisitsMonth_z Recency_z
## 1 1.206 -1.017 0.073 -1.423 0.044
## 2 -0.754 0.034 -0.644 0.496 -0.775
## 3 0.556 0.141 1.473 0.359 -0.140
## 4 -0.681 0.721 -0.598 0.381 0.932
library(ggplot2)
library(tidyr)
Picture <- as.data.frame(Averages)
Picture$id <- 1:nrow(Picture)
Picture <- pivot_longer(Picture, cols = c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z"))
Picture$Group <- factor(Picture$id,
levels = 1:4,
labels = c("1", "2", "3", "4"))
Picture$nameFactor <- factor(Picture$name,
levels = c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z"),
labels = c("Income_z", "ChildHome_z", "NumWebP_z", "NumWebVisitsMonth_z", "Recency_z"))
ggplot(Picture, aes(x = nameFactor, y = value)) +
geom_hline(yintercept = 0) +
theme_bw() +
geom_point(aes(shape = Group, col = Group), size = 2) +
geom_line(aes(group = id, linetype = Group, col = Group), linewidth = 1) +
ylab("Average") +
xlab("Cluster variables")
We will now check the appropriateness of the cluster variables used.
#Are all the cluster variables successful at classifying units into groups? Performing ANOVAs.
fit <- aov(cbind(Income_z, ChildHome_z, NumWebP_z, NumWebVisitsMonth_z, Recency_z) ~ as.factor(ClassificationK_MEANS),
data = mydata)
summary(fit)
## Response 1 :
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(ClassificationK_MEANS) 3 132.751 44.250 139.55 < 2.2e-16 ***
## Residuals 190 60.249 0.317
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response 2 :
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(ClassificationK_MEANS) 3 72.919 24.306 38.459 < 2.2e-16 ***
## Residuals 190 120.081 0.632
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response 3 :
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(ClassificationK_MEANS) 3 135.007 45.002 147.44 < 2.2e-16 ***
## Residuals 190 57.993 0.305
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response 4 :
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(ClassificationK_MEANS) 3 115.833 38.611 95.067 < 2.2e-16 ***
## Residuals 190 77.167 0.406
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response 5 :
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(ClassificationK_MEANS) 3 78.907 26.3025 43.802 < 2.2e-16 ***
## Residuals 190 114.093 0.6005
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
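For every cluster variable we reject the null hypothesis of equal group means (p < 0.001), so each variable differs significantly between the groups and contributes to the classification. These tests should be read descriptively, however: K-means forms the groups precisely to maximize differences on these variables, so significant F values are expected.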
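Each F value in the ANOVA tables is the ratio of the between-group mean square to the within-group mean square, F = (SS_between / (k - 1)) / (SS_within / (n - k)), with k = 4 groups and n = 194 units. A short check for the first response (Income_z), using the sums of squares from the output:

```r
ss_between <- 132.751  # "as.factor(ClassificationK_MEANS)" row, Response 1
ss_within <- 60.249    # "Residuals" row, Response 1
k <- 4                 # number of groups
n <- 194               # number of units

F_value <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(F_value, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

print(round(F_value, 2))  # 139.55
print(p_value < 2.2e-16)  # TRUE, which R reports as "< 2.2e-16"
```

The same formula with the NumStorePurchases sums of squares reported later (1170.2 and 864.7) gives about 85.71, matching that table.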
aggregate(mydata$NumStorePurchases,
by = list(mydata$ClassificationK_MEANS),
FUN = "mean")
## Group.1 x
## 1 1 8.750000
## 2 2 3.428571
## 3 3 8.000000
## 4 4 3.509804
fit <- aov(NumStorePurchases ~ as.factor(ClassificationK_MEANS),
data = mydata)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(ClassificationK_MEANS) 3 1170.2 390.1 85.71 <2e-16 ***
## Residuals 190 864.7 4.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
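H0: average(NumStorePurchases, G1) = average(NumStorePurchases, G2) = average(NumStorePurchases, G3) = average(NumStorePurchases, G4)
H1: at least one group average of NumStorePurchases differs from the others
We reject the null hypothesis (p < 0.001). Groups 1 and 3 make about 8 to 9 store purchases on average, compared with about 3.5 in groups 2 and 4. Because NumStorePurchases was not used to form the clusters, this difference supports the usefulness (criterion validity) of the classification.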
On the basis of 5 standardized variables (income, number of children, number of web purchases, number of website visits per month and recency), we clustered 194 customers into 4 groups by hierarchical clustering (Ward's algorithm, squared Euclidean distance). The number of groups was determined from the dendrogram. We then optimized the classification with the K-means method, which reclassified 32 of the 194 units (16.5%).
Group 1 (44 customers): Customers in this group have an income well above average, fewer children at home than average, an about average number of web purchases, far fewer website visits per month than average, and about average recency. These are high-income customers who buy online about as often as the average customer but rarely visit the website. From a marketing perspective, tailored email campaigns highlighting premium products or exclusive online offers could help increase their purchases. Since they visit the website infrequently, making it more engaging and user-friendly could encourage more frequent visits.
Group 2 (56 customers): This group consists of customers with an income well below average, an average number of children at home, fewer web purchases than average, somewhat more website visits per month than average, and low recency, meaning their last purchase was more recent than average. These are lower-income customers who visit the website fairly often but buy online less often, and who have purchased recently. Marketing strategies could include promoting budget-friendly and family-oriented products. Since they have purchased recently, follow-up communications about their recent purchase could help maintain engagement.
Group 3 (43 customers): This group includes customers with an above-average income, slightly more children at home than average, far more web purchases than average, somewhat more website visits per month than average, and about average recency. These customers are highly engaged with the website and make the most online purchases of all groups. From a marketing point of view, they could be targeted with frequent updates about new products or promotions, and a loyalty program could encourage repeat purchases.
Group 4 (51 customers): This group has customers with an income below average, noticeably more children at home than average, fewer web purchases than average, more website visits per month than average, and the highest recency of all groups. Since recency measures the number of days since the last purchase, these are lower-income customers with more children who visit the website fairly often but buy little online, and who have gone the longest without a purchase. Marketing strategies could include improving the online shopping experience, offering online-exclusive discounts, or providing personalized product recommendations based on their browsing history.