Market Segmentation: Number of Segments = Number Yielding Value to Product Management

# attach faithful data set
data(faithful)
plot(faithful, pch = "+")

[Figure: scatterplot of the faithful data, eruptions vs. waiting]


# run mclust on faithful data
require(mclust)
## Loading required package: mclust
## Package 'mclust' version 4.2
faithfulMclust <- Mclust(faithful, G = 2)
summary(faithfulMclust, parameters = TRUE)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm 
## ----------------------------------------------------
## 
## Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 2 components:
## 
##  log.likelihood   n df   BIC   ICL
##           -1130 272 11 -2322 -2323
## 
## Clustering table:
##   1   2 
## 175  97 
## 
## Mixing probabilities:
##      1      2 
## 0.6441 0.3559 
## 
## Means:
##            [,1]   [,2]
## eruptions  4.29  2.037
## waiting   79.97 54.480
## 
## Variances:
## [,,1]
##           eruptions waiting
## eruptions    0.1698   0.938
## waiting      0.9380  36.017
## [,,2]
##           eruptions waiting
## eruptions   0.06931  0.4367
## waiting     0.43669 33.7078
plot(faithfulMclust)

[Figures: the four plots produced by plot(faithfulMclust): BIC, classification, uncertainty, and density]

Unlike the Old Faithful data, with its well-separated groups of data points, product quality-price trade-offs look more like the following plot of 300 consumers indicating how much they are willing to spend and what they expect to get for their money (i.e., product quality is a composite index combining desired features and services).

# create 3 segment data set
require(MASS)
## Loading required package: MASS
sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
mean1 <- c(-1, -1)
mean2 <- c(0, 0)
mean3 <- c(1, 1)
set.seed(3202014)
mydata1 <- mvrnorm(n = 100, mean1, sigma)
mydata2 <- mvrnorm(n = 100, mean2, sigma)
mydata3 <- mvrnorm(n = 100, mean3, sigma)
mydata <- rbind(mydata1, mydata2, mydata3)
colnames(mydata) <- c("Desired Level of Quality", "Willingness to Pay")
plot(mydata, pch = "+")

[Figure: scatterplot of mydata, Desired Level of Quality vs. Willingness to Pay]

There is a strong positive relationship between demand for quality and willingness to pay, so a product manager might well decide that there is an opportunity for at least a high-end and a low-end option. However, there are no natural breaks in the scatterplot. Thus, if this data cloud is a mixture of distinct distributions, then these distributions must overlap.
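One rough way to put a number on "well-separated" versus "overlapping" is the Mahalanobis distance between component means. The sketch below uses the rounded means and variances printed in the faithful summary above and the parameters from the simulation code. Averaging the two faithful covariance matrices is a simplifying assumption of this sketch, not something Mclust does.

```r
# Old Faithful: fitted component means and covariances, rounded as printed above
f_mu1 <- c(4.29, 79.97)
f_mu2 <- c(2.037, 54.48)
f_S1 <- matrix(c(0.1698, 0.938, 0.938, 36.017), 2, 2)
f_S2 <- matrix(c(0.06931, 0.4367, 0.4367, 33.7078), 2, 2)
f_S <- (f_S1 + f_S2) / 2   # assumption: simple average of the two covariances

# Simulated consumers: two adjacent segment means and the shared covariance
s_mu1 <- c(-1, -1)
s_mu2 <- c(0, 0)
s_S <- matrix(c(1, 0.6, 0.6, 1), 2, 2)

# stats::mahalanobis() returns the squared distance, so take the square root
sep_faithful <- sqrt(mahalanobis(f_mu1, f_mu2, f_S))
sep_simulated <- sqrt(mahalanobis(s_mu1, s_mu2, s_S))
round(c(faithful = sep_faithful, simulated = sep_simulated), 2)
```

Under these assumptions the Old Faithful components sit several standard deviations apart, while adjacent simulated segments are separated by only about one, which is why no gap shows up in the scatterplot.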

As Wedel and Kamakura note, “Segments are not homogeneous groupings of customers naturally occurring in the marketplace, but are determined by the marketing manager's strategic view of the market.” That is, one could look at the above scatterplot, see the presence of a strong first principal component from the closeness of the data points to the principal axis of the ellipse, and argue that customer heterogeneity is a single continuous dimension running from the low end to the high end of the product spectrum. Or, one could look at the same scatterplot and see three overlapping segments seeking basic, value and premium products (good, better, best). Let's run mclust and see whether we can find our three segments.
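The single-dimension reading can be checked against the simulation parameters. A minimal sketch, assuming equal segment weights as in the data-generating code: the covariance of the whole mixture is the within-segment covariance plus the covariance of the segment means, and its eigenvalues give the share of variance along the principal axis. These are population values implied by the parameters, not estimates from the sampled mydata.

```r
sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)      # within-segment covariance
means <- rbind(c(-1, -1), c(0, 0), c(1, 1))   # one row per segment mean
w <- rep(1 / 3, 3)                            # equal segment weights

grand_mean <- colSums(w * means)              # weighted mean of the segment means
centered <- sweep(means, 2, grand_mean)       # segment means minus the grand mean
between <- crossprod(sqrt(w) * centered)      # weighted covariance of the means
total <- sigma + between                      # covariance of the full mixture

ev <- eigen(total, symmetric = TRUE)$values
pc1_share <- ev[1] / sum(ev)                  # variance share of the first component
pc1_share
```

With these parameters the first principal component accounts for 88% of the total variance, so a single continuous dimension is a defensible description of the same data that was generated from three segments.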

# run Mclust with 3 segments
mydataClust <- Mclust(mydata, G = 3)
summary(mydataClust, parameters = TRUE)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm 
## ----------------------------------------------------
## 
## Mclust EEE (elliposidal, equal volume, shape and orientation) model with 3 components:
## 
##  log.likelihood   n df   BIC   ICL
##          -858.5 300 11 -1780 -1924
## 
## Clustering table:
##   1   2   3 
##  75 137  88 
## 
## Mixing probabilities:
##      1      2      3 
## 0.2810 0.4212 0.2978 
## 
## Means:
##                            [,1]     [,2]  [,3]
## Desired Level of Quality -1.433 -0.09978 1.431
## Willingness to Pay       -1.362 -0.12692 1.286
## 
## Variances:
## [,,1]
##                          Desired Level of Quality Willingness to Pay
## Desired Level of Quality                   0.6337             0.2352
## Willingness to Pay                         0.2352             0.5603
## [,,2]
##                          Desired Level of Quality Willingness to Pay
## Desired Level of Quality                   0.6337             0.2352
## Willingness to Pay                         0.2352             0.5603
## [,,3]
##                          Desired Level of Quality Willingness to Pay
## Desired Level of Quality                   0.6337             0.2352
## Willingness to Pay                         0.2352             0.5603
plot(mydataClust)

[Figures: the four plots produced by plot(mydataClust): BIC, classification, uncertainty, and density]

When instructed to look for three clusters, Mclust returned the above result. The 300 observed points are represented as a mixture of three distributions falling along the principal axis of the larger ellipse formed by all the observations.
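Because the three fitted components overlap, assigning a consumer to a segment is a matter of probability rather than a hard boundary. The sketch below recomputes posterior membership probabilities in base R from the rounded estimates printed in the summary above. The helper names dmvn and posterior are made up for this illustration, and the example point is arbitrary. Mclust stores the corresponding values for the observed data in the z component of the fitted object.

```r
# Rounded estimates copied from the three-component summary above
pro <- c(0.2810, 0.4212, 0.2978)                       # mixing probabilities
mu <- cbind(c(-1.433, -1.362),                         # one column per segment mean
            c(-0.09978, -0.12692),
            c(1.431, 1.286))
S <- matrix(c(0.6337, 0.2352, 0.2352, 0.5603), 2, 2)   # common covariance (EEE)

# Bivariate normal density at x (hypothetical helper)
dmvn <- function(x, mu, S) {
  d <- x - mu
  q <- as.numeric(t(d) %*% solve(S) %*% d)
  exp(-0.5 * q) / sqrt((2 * pi)^length(x) * det(S))
}

# Posterior segment probabilities for one consumer (hypothetical helper)
posterior <- function(x) {
  dens <- sapply(seq_along(pro), function(k) pro[k] * dmvn(x, mu[, k], S))
  dens / sum(dens)
}

# An arbitrary consumer roughly midway between the first two segment means
p_mid <- posterior(c(-0.75, -0.75))
round(p_mid, 2)
```

A consumer at this point is not assigned to either of the first two segments with anything close to certainty, which is what overlapping segments look like at the level of the individual.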

However, if I had not specified three clusters and had instead let Mclust use its default BIC criterion to select the number of segments, I would have been told that there was no compelling evidence for more than one homogeneous group. Without any prior specification, Mclust returns a single homogeneous distribution, even though, as the data-generating R code above shows, my 300 observations were a mixture of three equal-size distributions falling along the principal axis, with means one standard deviation apart on each variable.

# let Mclust decide on number of segments
mydataClust <- Mclust(mydata)
## Warning: best model occurs at the min or max # of components considered
## Warning: optimal number of clusters occurs at min choice
summary(mydataClust, parameters = TRUE)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm 
## ----------------------------------------------------
## 
## Mclust XXX (elliposidal multivariate normal) model with 1 component:
## 
##  log.likelihood   n df   BIC   ICL
##          -864.8 300  5 -1758 -1758
## 
## Clustering table:
##   1 
## 300 
## 
## Mixing probabilities:
## 1 
## 1 
## 
## Means:
##                              [,1]
## Desired Level of Quality -0.01851
## Willingness to Pay       -0.05310
## 
## Variances:
## [,,1]
##                          Desired Level of Quality Willingness to Pay
## Desired Level of Quality                    1.824              1.336
## Willingness to Pay                          1.336              1.578
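
The choice of one component follows directly from the numbers in the two summaries. As a quick check, assuming the BIC convention used by mclust (2 * log-likelihood minus df * log(n), so that larger values are better), the reported values can be reproduced from the log-likelihood, sample size, and degrees of freedom:

```r
# BIC in the mclust convention: larger is better
bic <- function(loglik, df, n) 2 * loglik - df * log(n)

n <- 300
bic3 <- bic(-858.5, df = 11, n = n)   # three-component EEE model
bic1 <- bic(-864.8, df = 5, n = n)    # one-component model

round(c(three = bic3, one = bic1))
bic1 > bic3
```

The three-component model improves the log-likelihood by only about 6 points while adding 6 parameters, so the penalty outweighs the gain and the single-component model has the higher BIC.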

The shoe manufacturer can get along with three sizes of sandals but not three sizes of dress shoes. It is not the foot that is changing, but the demands of the customer. Thus, even if segments are no more than convenient fictions, they can be useful from the manager's perspective.