Introduction

Customer Personality analysis is a detailed analysis done by a brand or company to find an ideal set of customers. This helps the company to make changes in their product and marketing in a more efficient way. Promoting a product to a customer who is more likely to buy the product is better than promoting it to a random person.

Customer Personality analysis is done by clustering as it is a un-supervised problem.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(cluster)
library(gower)
library(Rtsne)
library(stringr)
library(corrgram)

Data

data <- read.csv2("/Users/aashaysharma/Desktop/DS/Customer Personality Analysis/marketing_campaign.csv", sep = "\t")

sum(is.na(data))

## [1] 24

data <- na.omit(data)

As there are very few NA items we will just omit them.

str(data)

## 'data.frame':    2216 obs. of  29 variables:
##  $ ID                 : int  5524 2174 4141 6182 5324 7446 965 6177 4855 5899 ...
##  $ Year_Birth         : int  1957 1954 1965 1984 1981 1967 1971 1985 1974 1950 ...
##  $ Education          : chr  "Graduation" "Graduation" "Graduation" "Graduation" ...
##  $ Marital_Status     : chr  "Single" "Single" "Together" "Together" ...
##  $ Income             : int  58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
##  $ Kidhome            : int  0 1 0 1 1 0 0 1 1 1 ...
##  $ Teenhome           : int  0 1 0 0 0 1 1 0 0 1 ...
##  $ Dt_Customer        : chr  "04-09-2012" "08-03-2014" "21-08-2013" "10-02-2014" ...
##  $ Recency            : int  58 38 26 26 94 16 34 32 19 68 ...
##  $ MntWines           : int  635 11 426 11 173 520 235 76 14 28 ...
##  $ MntFruits          : int  88 1 49 4 43 42 65 10 0 0 ...
##  $ MntMeatProducts    : int  546 6 127 20 118 98 164 56 24 6 ...
##  $ MntFishProducts    : int  172 2 111 10 46 0 50 3 3 1 ...
##  $ MntSweetProducts   : int  88 1 21 3 27 42 49 1 3 1 ...
##  $ MntGoldProds       : int  88 6 42 5 15 14 27 23 2 13 ...
##  $ NumDealsPurchases  : int  3 2 1 2 5 2 4 2 1 1 ...
##  $ NumWebPurchases    : int  8 1 8 2 5 6 7 4 3 1 ...
##  $ NumCatalogPurchases: int  10 1 2 0 3 4 3 0 0 0 ...
##  $ NumStorePurchases  : int  4 2 10 4 6 10 7 4 2 0 ...
##  $ NumWebVisitsMonth  : int  7 5 4 6 5 6 6 8 9 20 ...
##  $ AcceptedCmp3       : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ AcceptedCmp4       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp5       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Complain           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_CostContact      : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Z_Revenue          : int  11 11 11 11 11 11 11 11 11 11 ...
##  $ Response           : int  1 0 0 0 0 0 0 0 1 0 ...
##  - attr(*, "na.action")= 'omit' Named int [1:24] 11 28 44 49 59 72 91 92 93 129 ...
##   ..- attr(*, "names")= chr [1:24] "11" "28" "44" "49" ...

The data is mixed type data (ie both Categorical and Continous values). Euclidean distance is commonly used for clustering algorithms like KMeans and hierarchical clustering but it is useful when we only have to deal with continous values. Thus here we will use gower distance as the distance metric.

Gower distance is computed as the average of partial dissimilarities across individuals. Each partial dissimilarity (and thus Gower distance) ranges in [0 1].

Data Cleaning

To work with the data we need to alter some variables and add a few new variables based on the existing ones.

Year_Birth Changing it to the Age variable

data$Age <- 2014 - data$Year_Birth

Education It has too many categories this can be converted to UG and PG categories.

data$Education[data$Education == "2n Cycle"] = "UG"
data$Education[data$Education == "Basic"] = "UG"
data$Education[data$Education == "Graduation"] = "PG"
data$Education[data$Education == "Master"] = "PG"
data$Education[data$Education == "PhD"] = "PG"

Marital_Status It also has many categories which can be converted to Single and Couple categories.

data$Marital_Status[data$Marital_Status == "Divorced"] = "Single"
data$Marital_Status[data$Marital_Status == "Absurd"] = "Single"
data$Marital_Status[data$Marital_Status == "YOLO"] = "Single"
data$Marital_Status[data$Marital_Status == "Widow"] = "Single"
data$Marital_Status[data$Marital_Status == "Together"] = "Couple"
data$Marital_Status[data$Marital_Status == "Married"] = "Couple"
data$Marital_Status[data$Marital_Status == "Alone"] = "Single"

Dt_Customer The customer association date with the company can be changed to a seniority level.

data$Customer_year <- str_sub(data$Dt_Customer,-4)
data$Customer_year <- as.numeric(data$Customer_year)
data$Customer_Seniority <- 2014 - data$Customer_year

Kid home and Teen Home I feel that the keed home and teen home variables can collectively define the child variable.

data$Child <- data$Kidhome + data$Teenhome

Mnt products The amount spend of products can be added together to get the total amount spent which will be a more useful variable.

data$Amt_Spent <- data$MntWines + data$MntFishProducts + data$MntFruits + data$MntGoldProds + data$MntMeatProducts + data$MntSweetProducts

Purchases made We can get the total number of purchases by adding the different purchases variables.

data$Num_Purchases_made <- data$NumWebPurchases + data$NumCatalogPurchases + data$NumStorePurchases

Thus we are left with 12 variables that can be used to create the distance matrix.

data <- data[c(1,30,3,4,5,33,32,9,34,35,16,20)]

EDA

Corrgram :

data2 <- data
data2$Education <- unclass(as.factor(data2$Education))
data2$Marital_Status <- unclass(as.factor(data2$Marital_Status))
data2$Education <- as.numeric(data2$Education)
data2$Marital_Status <- as.numeric(data2$Marital_Status)
corrgram(data2[-c(1)], order=TRUE, lower.panel=panel.shade, upper.panel=NULL, text.panel=panel.txt, main="Customer Data")

Strong Correlation between Income, Amount Spent and Purchase, As well as there is slight trend between Education and Income(PG:1, UG:2) so this tells us that people with higher studies earn more. 1. Age

age_plot <- ggplot(data = data, aes(Age))
age_plot + geom_density()

Education

edu_plot1 <- ggplot(data = data, aes(Education))
edu_plot1 + geom_histogram(stat = "count")

edu_plot2<- ggplot(data = data, aes(Age, fill = Education))
edu_plot2 + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Marital_Status

marital_plot <- ggplot(data = data, aes(Marital_Status, fill = Marital_Status))
marital_plot + geom_histogram(stat = "count")

Income & Amount Spent

require(scales)

## Loading required package: scales

inc_plt <- ggplot(data = data, aes(Income, Amt_Spent))
inc_plt + geom_point(alpha = 0.5, color = "blue") + scale_x_continuous(labels = comma)

Child

a <- ggplot(data = data,  aes(Child, fill = Marital_Status))
a + geom_histogram(position = "dodge")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Customer Seniority & Amount Spent

cust_plt <- ggplot(data = data, aes(as.factor(Customer_Seniority), Amt_Spent, fill = Customer_Seniority))

cust_plt + geom_boxplot(color = "black")

Customer Analysis :

Analysing customers based on their income, seniority, age and amount spent.

gower_dist2 <- daisy(data[c("Income", "Amt_Spent", "Customer_Seniority", "Age")],
                    metric = "gower")

summary(gower_dist2)

## 2454220 dissimilarities, summarized :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1234  0.1902  0.1973  0.2686  0.7065 
## Metric :  mixed ;  Types = I, I, I, I 
## Number of objects : 2216

gower_mat2 <- as.matrix(gower_dist2)

MOST SIMILAR PAIR

data[
  which(gower_mat2 == min(gower_mat2[gower_mat2 != min(gower_mat2)]),
        arr.ind = TRUE)[1, ], ]

##         ID Age Education Marital_Status Income Child Customer_Seniority Recency
## 1754  5092  65        PG         Single  51569     1                  1      39
## 438  10664  65        PG         Single  51529     1                  1      14
##      Amt_Spent Num_Purchases_made NumDealsPurchases NumWebVisitsMonth
## 1754       467                 15                 4                 8
## 438        467                 15                 2                 8

MOST DISSIMILAR PAIR

data[
  which(gower_mat2 == max(gower_mat2[gower_mat2 != max(gower_mat2)]),
        arr.ind = TRUE)[1, ], ]

##        ID Age Education Marital_Status Income Child Customer_Seniority Recency
## 915 10619  20        PG         Single  95529     0                  2      29
## 240 11004 121        UG         Single  60182     1                  0      23
##     Amt_Spent Num_Purchases_made NumDealsPurchases NumWebVisitsMonth
## 915      1990                 17                 1                 3
## 240        22                  3                 1                 4

FINDING NUMBER OF CLUSTERS

sil_width2 <- c(NA)

for(i in 2:10){
  
  pam_fit2 <- pam(gower_dist2,
                 diss = TRUE,
                 k = i)
  
  sil_width2[i] <- pam_fit2$silinfo$avg.width
  
}

plot(1:10, sil_width2,
     xlab = "Number of clusters",
     ylab = "Silhouette Width")
lines(1:10, sil_width2)

By silhouette method we can choose 4 clusters.

CLUSTER SUMMARY

pam_fit2 <- pam(gower_dist2, diss = TRUE, k = 4)

pam_results2 <- data %>%
  dplyr::select(-c(ID, Child, NumWebVisitsMonth, Num_Purchases_made, NumDealsPurchases, Education, Marital_Status)) %>%
  mutate(cluster = pam_fit2$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.))

pam_results2$the_summary

## [[1]]
##       Age            Income       Customer_Seniority    Recency     
##  Min.   :18.00   Min.   :  6835   Min.   :2          Min.   : 0.00  
##  1st Qu.:36.00   1st Qu.: 34217   1st Qu.:2          1st Qu.:26.00  
##  Median :43.00   Median : 52050   Median :2          Median :50.50  
##  Mean   :44.17   Mean   : 51150   Mean   :2          Mean   :50.11  
##  3rd Qu.:54.00   3rd Qu.: 67441   3rd Qu.:2          3rd Qu.:76.00  
##  Max.   :69.00   Max.   :160803   Max.   :2          Max.   :99.00  
##    Amt_Spent         cluster 
##  Min.   :  13.0   Min.   :1  
##  1st Qu.: 132.2   1st Qu.:1  
##  Median : 578.5   Median :1  
##  Mean   : 746.6   Mean   :1  
##  3rd Qu.:1240.8   3rd Qu.:1  
##  Max.   :2352.0   Max.   :1  
## 
## [[2]]
##       Age             Income       Customer_Seniority    Recency     
##  Min.   : 18.00   Min.   :  1730   Min.   :0          Min.   : 0.00  
##  1st Qu.: 37.00   1st Qu.: 35246   1st Qu.:0          1st Qu.:23.00  
##  Median : 44.00   Median : 51287   Median :0          Median :47.00  
##  Mean   : 45.18   Mean   : 52827   Mean   :0          Mean   :47.83  
##  3rd Qu.: 55.00   3rd Qu.: 70545   3rd Qu.:0          3rd Qu.:73.00  
##  Max.   :121.00   Max.   :157243   Max.   :0          Max.   :99.00  
##    Amt_Spent         cluster 
##  Min.   :   6.0   Min.   :2  
##  1st Qu.:  49.0   1st Qu.:2  
##  Median : 165.0   Median :2  
##  Mean   : 493.8   Mean   :2  
##  3rd Qu.: 889.0   3rd Qu.:2  
##  Max.   :2525.0   Max.   :2  
## 
## [[3]]
##       Age             Income       Customer_Seniority    Recency     
##  Min.   : 21.00   Min.   :  2447   Min.   :1          Min.   : 0.00  
##  1st Qu.: 39.00   1st Qu.: 62745   1st Qu.:1          1st Qu.:24.00  
##  Median : 49.00   Median : 70379   Median :1          Median :51.00  
##  Mean   : 48.25   Mean   : 70493   Mean   :1          Mean   :49.63  
##  3rd Qu.: 59.00   3rd Qu.: 78353   3rd Qu.:1          3rd Qu.:74.00  
##  Max.   :115.00   Max.   :157146   Max.   :1          Max.   :99.00  
##    Amt_Spent       cluster 
##  Min.   : 507   Min.   :3  
##  1st Qu.: 882   1st Qu.:3  
##  Median :1128   Median :3  
##  Mean   :1200   Mean   :3  
##  3rd Qu.:1493   3rd Qu.:3  
##  Max.   :2524   Max.   :3  
## 
## [[4]]
##       Age             Income       Customer_Seniority    Recency     
##  Min.   : 22.00   Min.   :  3502   Min.   :1          Min.   : 0.00  
##  1st Qu.: 36.00   1st Qu.: 28437   1st Qu.:1          1st Qu.:25.00  
##  Median : 42.00   Median : 38199   Median :1          Median :49.00  
##  Mean   : 43.69   Mean   : 39338   Mean   :1          Mean   :48.74  
##  3rd Qu.: 51.00   3rd Qu.: 47496   3rd Qu.:1          3rd Qu.:74.00  
##  Max.   :114.00   Max.   :666666   Max.   :1          Max.   :99.00  
##    Amt_Spent        cluster 
##  Min.   :  5.0   Min.   :4  
##  1st Qu.: 46.0   1st Qu.:4  
##  Median : 91.0   Median :4  
##  Mean   :168.4   Mean   :4  
##  3rd Qu.:268.2   3rd Qu.:4  
##  Max.   :662.0   Max.   :4

We can interpret from the cluster summary that 4 types of customers here are :

1. Cluster 1 : Avg Income | Senior Customers | Spending Avg Amount

2. Cluster 2 : Avg Income | New Customers | Avg spending less than cluster 1 customers

3. Cluster 3 : High Income | Fairly Senior Customers | High Avg Spending

4. Cluster 4 : Below Avg Income | Fairly Senior Customers | Low Average Spending

VISUALISING CLUSTERS

tsne_obj2 <- Rtsne(gower_dist2, is_distance = TRUE)

tsne_data2 <- tsne_obj2$Y %>%
  data.frame() %>%
  setNames(c("X", "Y")) %>%
  mutate(cluster = factor(pam_fit2$clustering),
         name = data$ID)

ggplot(aes(x = X, y = Y), data = tsne_data2) +
  geom_point(aes(color = cluster))

Analysing customers based on Item Purchased, Amount Spent, Number of Child and Marital Status.

data2 <- data

data2$Education <- unclass(as.factor(data2$Education))

data2$Marital_Status <- unclass(as.factor(data2$Marital_Status))

data2$Education <- as.numeric(data2$Education)

data2$Marital_Status <- as.numeric(data2$Marital_Status)

data_new <- read.csv2("/Users/aashaysharma/Desktop/DS/Customer Personality Analysis/marketing_campaign.csv", sep = "\t")
data_new <- na.omit(data_new)

data3 <- data2
data3$AmtWines <- data_new$MntWines
data3$AmtFruits <- data_new$MntFruits
data3$AmtFish <- data_new$MntFishProducts
data3$AmtMeat <- data_new$MntMeatProducts
data3$AmtSweet <- data_new$MntSweetProducts
data3$AmtGold <- data_new$MntGoldProds
data3$Education <- as.factor(data3$Education)
data3$Marital_Status <- as.factor(data3$Marital_Status)

gower_dist3 <- daisy(data3[c("Income", "Child", "Num_Purchases_made", "Marital_Status", "AmtWines", "AmtFruits", "AmtFish", "AmtMeat", "AmtSweet", "AmtGold", "NumDealsPurchases", "Amt_Spent", "Recency")],
                    metric = "gower")

summary(gower_dist3)

## 2454220 dissimilarities, summarized :
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1422  0.2075  0.2135  0.2796  0.6256 
## Metric :  mixed ;  Types = I, I, I, N, I, I, I, I, I, I, I, I, I 
## Number of objects : 2216

gower_mat3 <- as.matrix(gower_dist3)

MOST SIMILAR PAIR

data3[
  which(gower_mat3 == min(gower_mat3[gower_mat3 != min(gower_mat3)]),
        arr.ind = TRUE)[1, ], ]

##        ID Age Education Marital_Status Income Child Customer_Seniority Recency
## 1469 7196  64         1              1  41145     2                  0      20
## 570  3525  44         1              1  38200     2                  2      19
##      Amt_Spent Num_Purchases_made NumDealsPurchases NumWebVisitsMonth AmtWines
## 1469        13                  3                 1                 3        9
## 570         17                  3                 1                 7       12
##      AmtFruits AmtFish AmtMeat AmtSweet AmtGold
## 1469         0       0       3        0       1
## 570          0       0       4        0       1

MOST DISSIMILAR PAIR

data3[
  which(gower_mat3 == max(gower_mat3[gower_mat3 != max(gower_mat3)]),
        arr.ind = TRUE)[1, ], ]

##        ID Age Education Marital_Status Income Child Customer_Seniority Recency
## 1445 1553  68         1              1  82657     0                  1      71
## 161  2795  56         1              2  30523     3                  1       0
##      Amt_Spent Num_Purchases_made NumDealsPurchases NumWebVisitsMonth AmtWines
## 1445      2283                 22                 1                 4      966
## 161         13                  3                 1                 7        5
##      AmtFruits AmtFish AmtMeat AmtSweet AmtGold
## 1445       168     246     672      105     126
## 161          0       0       3        0       5

FINDING NUMBER OF CLUSTERS

sil_width3 <- c(NA)

for(i in 2:10){
  
  pam_fit3 <- pam(gower_dist3,
                 diss = TRUE,
                 k = i)
  
  sil_width3[i] <- pam_fit3$silinfo$avg.width
  
}

plot(1:10, sil_width3,
     xlab = "Number of clusters",
     ylab = "Silhouette Width")
lines(1:10, sil_width3)

By silhouette method we can choose 4 clusters.

CLUSTER SUMMARY

pam_fit3 <- pam(gower_dist3, diss = TRUE, k = 3)

pam_results3 <- data3 %>%
  dplyr::select(-c(ID, NumWebVisitsMonth, Age, Customer_Seniority)) %>%
  mutate(cluster = pam_fit3$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.))

pam_results3$the_summary

## [[1]]
##  Education Marital_Status     Income           Child           Recency     
##  1:563     1:433          Min.   : 32632   Min.   :0.0000   Min.   : 0.00  
##  2: 60     2:190          1st Qu.: 68109   1st Qu.:0.0000   1st Qu.:25.00  
##                           Median : 75012   Median :0.0000   Median :52.00  
##                           Mean   : 74363   Mean   :0.3451   Mean   :49.67  
##                           3rd Qu.: 81187   3rd Qu.:1.0000   3rd Qu.:74.00  
##                           Max.   :105471   Max.   :3.0000   Max.   :99.00  
##    Amt_Spent    Num_Purchases_made NumDealsPurchases    AmtWines     
##  Min.   : 461   Min.   : 9.00      Min.   : 0.00     Min.   :  47.0  
##  1st Qu.:1072   1st Qu.:17.00      1st Qu.: 1.00     1st Qu.: 397.5  
##  Median :1338   Median :20.00      Median : 1.00     Median : 587.0  
##  Mean   :1381   Mean   :19.98      Mean   : 1.66     Mean   : 639.3  
##  3rd Qu.:1653   3rd Qu.:23.00      3rd Qu.: 2.00     3rd Qu.: 854.0  
##  Max.   :2525   Max.   :32.00      Max.   :15.00     Max.   :1493.0  
##    AmtFruits        AmtFish          AmtMeat         AmtSweet     
##  Min.   :  0.0   Min.   :  0.00   Min.   : 48.0   Min.   :  0.00  
##  1st Qu.: 27.0   1st Qu.: 45.00   1st Qu.:242.0   1st Qu.: 30.00  
##  Median : 58.0   Median : 86.00   Median :397.0   Median : 56.00  
##  Mean   : 68.3   Mean   : 98.18   Mean   :425.5   Mean   : 69.32  
##  3rd Qu.:102.0   3rd Qu.:145.00   3rd Qu.:568.5   3rd Qu.:102.00  
##  Max.   :199.0   Max.   :259.00   Max.   :984.0   Max.   :198.00  
##     AmtGold          cluster 
##  Min.   :  0.00   Min.   :1  
##  1st Qu.: 33.00   1st Qu.:1  
##  Median : 61.00   Median :1  
##  Mean   : 80.41   Mean   :1  
##  3rd Qu.:118.50   3rd Qu.:1  
##  Max.   :249.00   Max.   :1  
## 
## [[2]]
##  Education Marital_Status     Income           Child          Recency     
##  1:529     1:  0          Min.   :  1730   Min.   :0.000   Min.   : 0.00  
##  2: 67     2:596          1st Qu.: 31630   1st Qu.:1.000   1st Qu.:25.00  
##                           Median : 43102   Median :1.000   Median :49.00  
##                           Mean   : 44460   Mean   :1.117   Mean   :48.13  
##                           3rd Qu.: 57102   3rd Qu.:2.000   3rd Qu.:73.00  
##                           Max.   :153924   Max.   :3.000   Max.   :99.00  
##    Amt_Spent      Num_Purchases_made NumDealsPurchases    AmtWines     
##  Min.   :   5.0   Min.   : 0.00      Min.   : 0.00     Min.   :   0.0  
##  1st Qu.:  54.0   1st Qu.: 5.00      1st Qu.: 1.00     1st Qu.:  15.0  
##  Median : 166.0   Median : 8.00      Median : 2.00     Median :  68.0  
##  Mean   : 351.8   Mean   :10.31      Mean   : 2.49     Mean   : 198.3  
##  3rd Qu.: 577.2   3rd Qu.:15.00      3rd Qu.: 3.00     3rd Qu.: 273.0  
##  Max.   :1766.0   Max.   :31.00      Max.   :15.00     Max.   :1462.0  
##    AmtFruits        AmtFish          AmtMeat          AmtSweet     
##  Min.   :  0.0   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  1.0   1st Qu.:  2.00   1st Qu.: 11.00   1st Qu.:  1.00  
##  Median :  5.0   Median :  7.00   Median : 32.00   Median :  4.00  
##  Mean   : 12.9   Mean   : 17.28   Mean   : 77.25   Mean   : 12.56  
##  3rd Qu.: 15.0   3rd Qu.: 19.00   3rd Qu.:100.00   3rd Qu.: 14.00  
##  Max.   :174.0   Max.   :201.00   Max.   :873.00   Max.   :262.00  
##     AmtGold          cluster 
##  Min.   :  0.00   Min.   :2  
##  1st Qu.:  7.00   1st Qu.:2  
##  Median : 18.00   Median :2  
##  Mean   : 33.53   Mean   :2  
##  3rd Qu.: 40.00   3rd Qu.:2  
##  Max.   :291.00   Max.   :2  
## 
## [[3]]
##  Education Marital_Status     Income           Child          Recency     
##  1:870     1:997          Min.   :  2447   Min.   :0.000   Min.   : 0.00  
##  2:127     2:  0          1st Qu.: 30261   1st Qu.:1.000   1st Qu.:24.00  
##                           Median : 41145   Median :1.000   Median :49.00  
##                           Mean   : 43083   Mean   :1.222   Mean   :49.13  
##                           3rd Qu.: 54198   3rd Qu.:2.000   3rd Qu.:74.00  
##                           Max.   :666666   Max.   :3.000   Max.   :99.00  
##    Amt_Spent    Num_Purchases_made NumDealsPurchases    AmtWines     
##  Min.   :   8   Min.   : 0.000     Min.   : 0.000    Min.   :   0.0  
##  1st Qu.:  50   1st Qu.: 5.000     1st Qu.: 1.000    1st Qu.:  13.0  
##  Median : 127   Median : 7.000     Median : 2.000    Median :  48.0  
##  Mean   : 276   Mean   : 9.266     Mean   : 2.639    Mean   : 160.1  
##  3rd Qu.: 428   3rd Qu.:13.000     3rd Qu.: 3.000    3rd Qu.: 216.0  
##  Max.   :1730   Max.   :29.000     Max.   :15.000    Max.   :1279.0  
##    AmtFruits          AmtFish          AmtMeat           AmtSweet     
##  Min.   :  0.000   Min.   :  0.00   Min.   :   1.00   Min.   :  0.00  
##  1st Qu.:  1.000   1st Qu.:  2.00   1st Qu.:  11.00   1st Qu.:  1.00  
##  Median :  4.000   Median :  6.00   Median :  25.00   Median :  4.00  
##  Mean   :  8.186   Mean   : 11.98   Mean   :  59.11   Mean   :  9.25  
##  3rd Qu.: 10.000   3rd Qu.: 15.00   3rd Qu.:  73.00   3rd Qu.: 11.00  
##  Max.   :123.000   Max.   :179.00   Max.   :1725.00   Max.   :195.00  
##     AmtGold          cluster 
##  Min.   :  0.00   Min.   :3  
##  1st Qu.:  6.00   1st Qu.:3  
##  Median : 14.00   Median :3  
##  Mean   : 27.43   Mean   :3  
##  3rd Qu.: 32.00   3rd Qu.:3  
##  Max.   :321.00   Max.   :3

We can interpret from the cluster summary that 3 types of customers here are :

1. Cluster 1 : Couple or Not Single people | More post graduates | Higher than average income | Highest Buyer of Wine & Meat | Gold purchase also above avg | Overall Purchase and Amount Spent also Highest

2. Cluster 2 : Singles | Post Graduates | Average Income | Below Avg Amount Spent | Greater Deal Purchases than cluster one customers

3. Cluster 3 : Couples | Mostly Post Graduates (But also includes relatively higher number of Undergrads compared to other clusters) | Below Avg Income | Low Amount Spent | More Deal focused Purchases | Would prefer essentials over lesuire (Gold and Sweets)

VISUALISING CLUSTERS

tsne_obj3 <- Rtsne(gower_dist3, is_distance = TRUE)

tsne_data3 <- tsne_obj3$Y %>%
  data.frame() %>%
  setNames(c("X", "Y")) %>%
  mutate(cluster = factor(pam_fit3$clustering),
         name = data3$ID)

ggplot(aes(x = X, y = Y), data = tsne_data3) +
  geom_point(aes(color = cluster))

# General Conclusions :

1. New & Old Customers with Avg Income and fairly avg amount spent should be focused on more, Better advertising and deals should be provided to them.

2. If a new Non discounted expensive item will be up for sale ads should be targeted better to customers with high spending nature & income.

3. Customers with low spending natures and low incomes should be targeted with flash deals and discounts on essentials like fish, fruits & meat to keep them connected with the company.

4. Customers tend to spend & purchase more on wine and meat so this products can be worked upon (Like more deals, better more variable products etc) to benefit the company.

5. Senior Customers should be connected with company in some or other ways like memberships or specific deals to them this will increase their relationship with company.

Customer Personality Analysis

Aashay Sharma

16/11/2021