Market Segmentation

Market segmentation is one of the most valued marketing techniques. It can help define a corporate strategy and can support top-level business decisions. However, carrying out an adequate market segmentation has proven difficult for small and medium-sized companies, both because of limited data and because of limited access to the necessary techniques and local know-how.

With the advent of a society and business environment that thrives on data and information, the first of these two challenges, the availability of data, is slowly being eroded. Today, organizations of any size have the technology and capacity to gather, classify and store immense amounts of data about their customers, their interactions with the company and their historical transactions. While the data barrier has been eroding, the capacity to use and benefit from that data is still scarce. The reasons are multiple and fall beyond the scope of this article, but it is fair to say that competition for talented data scientists and information analysts is fierce, and the best of them usually go to companies with greater financial resources, making it difficult for smaller companies to compete for that talent.

This paper presents a strategy for segmenting the customers in a publicly available data set.

Segmenting a customer base into multiple categories can contribute to a better understanding of the market and can support several aspects of market strategy: relationship marketing, the design of distinct communication and promotional strategies, and product and sales differentiation. Ultimately, a better understanding of the market can increase customer loyalty and satisfaction, enabling market growth and better long-term results.

The “bank data set” is a publicly available collection of records for a hypothetical sample drawn from a larger customer data set of an unidentified large Japanese bank.

The data set contains 600 records (one per customer), each with a customer identifier and 11 attributes. The attributes range from demographic data to indicators of whether the customer uses different bank products. The general characteristics of the data set are described in the table below.

Descriptive Summary of the Bank Data

##        id           age            sex             region   
##  ID12101:  1   Min.   :18.00   FEMALE:300   INNER_CITY:269  
##  ID12102:  1   1st Qu.:30.00   MALE  :300   RURAL     : 96  
##  ID12103:  1   Median :42.00                SUBURBAN  : 62  
##  ID12104:  1   Mean   :42.40                TOWN      :173  
##  ID12105:  1   3rd Qu.:55.25                                
##  ID12106:  1   Max.   :67.00                                
##  (Other):594                                                
##      income      married      children      car      save_act  current_act
##  Min.   : 5014   NO :204   Min.   :0.000   NO :304   NO :186   NO :145    
##  1st Qu.:17265   YES:396   1st Qu.:0.000   YES:296   YES:414   YES:455    
##  Median :24925             Median :1.000                                  
##  Mean   :27524             Mean   :1.012                                  
##  3rd Qu.:36173             3rd Qu.:2.000                                  
##  Max.   :63130             Max.   :3.000                                  
##                                                                           
##  mortgage   pep     
##  NO :391   NO :326  
##  YES:209   YES:274  
##                     
##                     
##                     
##                     
##
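The quartiles in this summary follow R's default quantile rule (type 7), which interpolates linearly between sorted values. The following Python sketch, which uses only the standard library, illustrates the arithmetic behind the numeric columns. `six_number_summary` is a hypothetical helper name. The sketch uses `statistics.quantiles(..., method="inclusive")` because that method applies the same interpolation rule. The sample ages are the first ten values of `age` as printed by `str()` further down.

```python
import statistics


def six_number_summary(values):
    """Min, 1st quartile, median, mean, 3rd quartile and max, as in R's summary()."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics,
    # which is the same rule as R's default quantile type 7.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Min.": data[0],
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(data),
        "3rd Qu.": q3,
        "Max.": data[-1],
    }


# First ten customer ages from the bank data set.
ages = [48, 40, 51, 23, 57, 57, 22, 58, 37, 54]
print(six_number_summary(ages))
```

Applied to the full `age` column, the function should give quartiles consistent with those in the table above, up to R's display rounding.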

Summary MultiPlot of the Bank data

For this paper, R, an open-source environment for statistical computing, is used to carry out the data processing and marketing analytics.

R, like almost all current statistical software for machine learning and data mining, includes several algorithms that can be used for market segmentation. The general technique of separating a whole market into groups whose members share a common subset of attributes and characteristics is known as clustering. Among the most popular algorithms are k-means clustering, k-medoids clustering, hierarchical clustering and density-based clustering. The specific characteristics and technicalities of these algorithms won't be discussed here, but an extensive literature about them is available online.
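To make the k-means idea concrete, here is a minimal Python sketch of Lloyd's algorithm, the textbook form of k-means. It works in three steps:

1. Assign every point to its nearest centre by squared Euclidean distance.
2. Move each centre to the mean of its points.
3. Repeat until no assignment changes.

The sketch assumes purely numeric features and, for reproducibility, uses the first k points as starting centres. R's `kmeans()` instead picks random starting centres and uses the Hartigan-Wong algorithm by default, so its results can differ. The Python function names are illustrative only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns 0-based cluster labels and final centres."""
    # Deterministic start (an assumption for this sketch): first k points.
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for each point.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Two obvious groups of toy 2-D points.
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centres = kmeans(points, 2)
print(labels, centres)
```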

The following part of the document presents the use of the k-means and k-medoids algorithms to cluster the bank data set.

## 'data.frame':    600 obs. of  10 variables:
##  $ age        : int  48 40 51 23 57 57 22 58 37 54 ...
##  $ sex        : Factor w/ 2 levels "FEMALE","MALE": 1 2 1 1 1 1 2 2 1 2 ...
##  $ region     : Factor w/ 4 levels "INNER_CITY","RURAL",..: 1 4 1 4 2 4 2 4 3 4 ...
##  $ income     : num  17546 30085 16575 20375 50576 ...
##  $ married    : Factor w/ 2 levels "NO","YES": 1 2 2 2 2 2 1 2 2 2 ...
##  $ children   : int  1 3 0 3 0 2 0 0 2 2 ...
##  $ car        : Factor w/ 2 levels "NO","YES": 1 2 2 1 1 1 1 2 2 2 ...
##  $ save_act   : Factor w/ 2 levels "NO","YES": 1 1 2 1 2 2 1 2 1 2 ...
##  $ current_act: Factor w/ 2 levels "NO","YES": 1 2 2 2 1 2 2 2 1 2 ...
##  $ mortgage   : Factor w/ 2 levels "NO","YES": 1 2 1 1 1 1 1 1 1 1 ...
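R stores a factor as integer codes that index its levels, and by default the levels are sorted alphabetically. This is why FEMALE/MALE and NO/YES become 1/2 once the data are converted to numbers, as in the medoid table below. The following Python sketch mimics that encoding with a hypothetical helper, `factor_codes`. The sample is the first ten `region` values, reconstructed from the codes and levels shown by `str()` above.

```python
def factor_codes(values):
    """Mimic R's factor(): sorted levels, each value replaced by its 1-based level index."""
    # sorted() uses code-point order, which gives the same order as R for these labels.
    levels = sorted(set(values))
    index = {level: i + 1 for i, level in enumerate(levels)}
    return [index[v] for v in values], levels


region = ["INNER_CITY", "TOWN", "INNER_CITY", "TOWN", "RURAL",
          "TOWN", "RURAL", "TOWN", "SUBURBAN", "TOWN"]
codes, levels = factor_codes(region)
print(codes, levels)
```

The resulting codes match the integers listed for `region` by `str()`.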


## Medoids:
##       ID age sex region  income married children car save_act current_act
## [1,] 376  41   2      1 20866.3       2        0   2        2           2
## [2,] 501  39   1      1 27765.8       2        3   2        2           1
## [3,] 457  32   1      4 13267.6       2        0   2        2           2
## [4,]  75  64   1      1 52674.0       1        2   2        2           2
## [5,] 443  43   1      2 36281.0       2        0   2        2           2
## [6,] 447  52   1      4 43719.5       2        0   1        2           2
##      mortgage
## [1,]        1
## [2,]        1
## [3,]        2
## [4,]        1
## [5,]        1
## [6,]        1
## Clustering vector:
##   [1] 1 2 3 1 4 5 3 2 2 1 4 2 3 4 1 1 1 6 2 1 4 3 5 3 3 6 1 1 2 1 1 3 1 2 5
##  [36] 1 3 3 1 2 2 2 4 2 3 4 3 3 5 1 3 1 3 2 4 6 2 1 3 2 2 5 2 3 3 1 3 5 1 3
##  [71] 3 3 2 1 4 5 2 4 3 5 1 2 3 3 1 2 3 1 3 2 2 2 5 6 6 3 3 2 3 3 2 3 3 5 5
## [106] 5 6 3 3 1 5 2 5 5 6 6 1 3 1 6 2 4 1 5 6 5 2 6 3 1 1 6 3 5 4 2 3 5 6 4
## [141] 5 1 1 4 1 4 1 1 2 2 2 5 4 2 1 3 2 1 1 3 1 3 1 3 2 1 3 3 2 5 4 1 2 2 6
## [176] 4 5 2 3 5 1 4 5 1 2 5 1 5 5 1 4 3 4 3 5 1 1 4 3 1 2 6 3 1 3 3 4 3 5 1
## [211] 2 5 3 3 3 1 3 2 2 2 2 6 5 6 4 2 4 2 2 3 2 2 3 2 6 3 4 1 1 3 3 4 5 1 2
## [246] 5 3 1 1 1 4 5 3 1 3 1 2 1 1 3 1 2 1 6 1 1 5 2 5 3 4 4 3 2 2 3 2 1 2 5
## [281] 1 3 4 5 3 3 2 2 3 5 3 5 6 1 1 5 2 3 1 3 3 3 2 6 4 3 4 2 2 3 2 5 2 3 1
## [316] 6 1 5 1 1 5 1 3 1 2 3 5 1 4 1 4 3 2 1 1 2 2 5 5 1 2 4 4 1 1 6 5 6 5 1
## [351] 2 2 2 6 2 4 1 3 5 1 4 1 2 3 4 1 2 6 3 5 6 5 3 2 6 1 5 2 1 3 2 5 3 5 2
## [386] 5 3 2 3 3 6 1 5 1 2 1 2 5 4 5 1 1 5 1 3 5 6 1 1 3 1 3 4 3 6 1 3 1 2 1
## [421] 4 6 3 6 1 2 2 5 1 2 2 4 3 2 1 3 3 2 3 3 2 6 5 5 3 2 6 2 4 5 2 5 2 4 3
## [456] 3 3 4 3 6 1 2 1 3 1 2 1 3 3 5 6 2 1 2 1 3 2 2 1 6 3 6 5 3 1 4 1 3 3 1
## [491] 2 1 2 6 6 5 6 6 6 1 2 5 2 3 4 2 1 3 4 3 3 3 6 3 5 3 1 5 1 2 3 1 3 3 1
## [526] 1 3 3 2 6 4 2 2 2 3 3 5 4 5 3 1 5 3 4 1 5 3 3 1 4 3 5 6 5 5 3 1 2 2 2
## [561] 3 1 4 1 1 2 1 1 2 1 5 5 2 1 2 2 1 6 3 2 4 3 1 2 1 3 1 2 4 3 1 2 1 4 2
## [596] 6 3 3 3 2
## Objective function:
##    build     swap 
## 2244.399 2128.571 
## 
## Available components:
##  [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
##  [6] "clusinfo"   "silinfo"    "diss"       "call"       "data"

Clustering Results

The output of the clustering algorithm (k-medoids clustering with the Partitioning Around Medoids (PAM) algorithm, as implemented by the pam() function of R's cluster package) has three parts:

- the medoids, i.e. the actual customer record that best represents each cluster, which are used to describe the clusters;
- the cluster assignment of every customer;
- the value of the objective function after the build and swap phases.

When pam() was run in R, the number of clusters was set to 6.
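The two figures under "Objective function" correspond to PAM's two phases. The following simplified Python sketch illustrates the idea under stated assumptions:

- The build phase greedily adds the observation that most reduces the total distance to the nearest medoid.
- The swap phase exchanges a medoid for a non-medoid whenever that lowers the total, until no swap helps.
- It assumes numeric vectors and Euclidean distance.
- It accepts the first improving swap, whereas the original PAM searches for the best swap.
- It reports the objective as the mean distance to the nearest medoid.

It is much slower than `pam()` in the cluster package and need not reproduce its output exactly. All function names are hypothetical.

```python
import math


def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the closest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=math.dist):
    n = len(points)
    # BUILD: add medoids one at a time, each time picking the point
    # that gives the lowest total cost together with those already chosen.
    medoids = []
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i], dist))
        medoids.append(best)
    build_obj = total_cost(points, medoids, dist) / n
    # SWAP: replace a medoid by a non-medoid whenever that lowers the cost.
    current = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:mi] + [h] + medoids[mi + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < current - 1e-12:  # tolerance avoids swapping on ties
                    medoids, current, improved = candidate, cost, True
    swap_obj = current / n
    # 1-based cluster labels, as in R's clustering vector.
    clustering = [min(range(k), key=lambda c: dist(p, points[medoids[c]])) + 1
                  for p in points]
    return medoids, clustering, (build_obj, swap_obj)


points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
medoids, clustering, (build_obj, swap_obj) = pam(points, 2)
print(medoids, clustering, build_obj, swap_obj)
```

As in the R output, the swap objective is never larger than the build objective, because a swap is only accepted when it lowers the cost.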

The medoid of cluster 1 represents this market segment with a middle-aged (41-year-old) married male customer who lives in the inner city. He has an income of approx. $20,866 and no children, owns a car, uses both current (chequing) and savings accounts, and does not carry a mortgage.

In the medoid table, factors are shown as integer level codes in alphabetical level order:

- sex: 1 = FEMALE, 2 = MALE;
- region: 1 = INNER_CITY, 2 = RURAL, 3 = SUBURBAN, 4 = TOWN;
- yes/no variables: 1 = NO, 2 = YES.

The other five segments calculated by the model can be described in the same manner.
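Reading a medoid row means mapping the codes back to labels. The following Python sketch does this for medoid [1,], with the ID column dropped. `decode_medoid`, `LEVELS` and `COLUMNS` are hypothetical names, and the level lists are copied from the summary table at the top of the document.

```python
# Factor levels in R's order (code 1 = first level), from the summary table.
LEVELS = {
    "sex": ["FEMALE", "MALE"],
    "region": ["INNER_CITY", "RURAL", "SUBURBAN", "TOWN"],
    "married": ["NO", "YES"],
    "car": ["NO", "YES"],
    "save_act": ["NO", "YES"],
    "current_act": ["NO", "YES"],
    "mortgage": ["NO", "YES"],
}
COLUMNS = ["age", "sex", "region", "income", "married", "children",
           "car", "save_act", "current_act", "mortgage"]


def decode_medoid(row):
    """Turn a numeric medoid row (without the ID column) into readable labels."""
    profile = {}
    for name, value in zip(COLUMNS, row):
        if name in LEVELS:
            profile[name] = LEVELS[name][int(value) - 1]  # R codes are 1-based
        else:
            profile[name] = value  # numeric attributes are kept as-is
    return profile


# Medoid [1,] from the PAM output above, without its ID column.
profile = decode_medoid([41, 2, 1, 20866.3, 2, 0, 2, 2, 2, 1])
print(profile)
```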

The last part of the output, the clustering vector, lists the cluster to which each of the 600 customers has been assigned. For example, customer 1 has been assigned to cluster 1, the segment whose medoid is a married, childless, car-owning male living in the inner city. This vector will be used to present a graphical analysis of the cluster separation.

** http://rpubs.com/MauVas **
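Before plotting, a natural first look at the clustering vector is the size of each segment. The following Python sketch uses `collections.Counter`; `cluster_sizes` is a hypothetical helper. The example uses only the first 35 entries of the vector above, so the counts are not the segment sizes for all 600 customers. In R, the `size` column of the `clusinfo` component of the `pam()` result reports the full sizes.

```python
from collections import Counter


def cluster_sizes(clustering):
    """Number of customers assigned to each cluster label, in label order."""
    counts = Counter(clustering)
    return {label: counts[label] for label in sorted(counts)}


# First 35 entries of the clustering vector printed above.
first_row = [1, 2, 3, 1, 4, 5, 3, 2, 2, 1, 4, 2, 3, 4, 1, 1, 1, 6, 2, 1,
             4, 3, 5, 3, 3, 6, 1, 1, 2, 1, 1, 3, 1, 2, 5]
print(cluster_sizes(first_row))
```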

Mauricio Vasquez

August 24, 2016