In this tutorial, we will use multidimensional scaling (MDS) to learn the underlying attributes of toothpaste brands. The data can be loaded from the MSR package.

# load the MSR package
library(MSR)

# load the data 
data("dist_toothpaste")
dist_toothpaste
##            AquaFresh Crest Colgate Aim Gleem PlusWhite UltraBrite CloseUp
## AquaFresh          0     3       2   6     6         2          6       6
## Crest              3     0       4   5     6         5          6       6
## Colgate            2     1       0   6     7         5          2       4
## Aim                4     2       2   0     1         6          4       3
## Gleem              6     5       4   3     0         6          4       4
## PlusWhite          5     5       4   4     3         0          6       5
## UltraBrite         6     6       6   5     3         3          0       6
## CloseUp            6     6       6   6     2         3          2       0
## Pepsodent          6     6       6   6     2         2          1       2
## Sensodyne          7     6       4   6     4         5          5       4
##            Pepsodent Sensodyne
## AquaFresh          6         3
## Crest              6         2
## Colgate            3         5
## Aim                3         2
## Gleem              2         1
## PlusWhite          2         5
## UltraBrite         4         2
## CloseUp            3         4
## Pepsodent          0         5
## Sensodyne          5         0

The dist_toothpaste data frame is a distance matrix between the 10 toothpaste brands. As discussed in class, the distance matrix has three features.

  • The matrix has the same no. of rows and columns (\(10 \times 10\), each column/row being a brand). Each cell is the distance/difference between a pair of brands.
  • The matrix is symmetric: the distance from, for example, AquaFresh to Crest is the same as that from Crest to AquaFresh.
  • The distance from a brand to itself is zero. For example, the difference between Colgate and itself is zero.
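
These three features can be checked directly in R. The sketch below is a minimal example, assuming a small made-up distance matrix toy_dist and a helper check_dist_matrix(); both are introduced here for illustration only and are not part of the MSR package.

```r
# Hypothetical 3 x 3 distance matrix, used only for illustration
brands <- c("A", "B", "C")
toy_dist <- matrix(c(0, 2, 5,
                     2, 0, 3,
                     5, 3, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(brands, brands))

# Hypothetical helper: report whether each of the three features holds
check_dist_matrix <- function(d) {
  m <- as.matrix(d)                      # also accepts a numeric data frame
  is_square <- nrow(m) == ncol(m)
  c(square    = is_square,
    symmetric = is_square && isSymmetric(unname(m)),
    zero_diag = is_square && all(diag(m) == 0))
}

check_dist_matrix(toy_dist)
```

The same helper can be called on a data frame such as dist_toothpaste once the MSR package is loaded.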

1 Step 1: Selecting an MDS Procedure

Throughout our course, we will use the classical MDS procedure. Within R, you can use the function cmdscale() to run it. The function cmdscale() takes two key inputs:

  1. A distance matrix as explained above.
  2. The no. of dimensions that you are planning to use.

With our data, we use dist_toothpaste as the distance matrix, and the no. of dimensions is set through the argument k of the function. For example, if we decide to have 2 dimensions, we can code it like this: cmdscale(dist_toothpaste, k = 2). For more details on the function, run ?cmdscale in your R command line.
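
To see what the output of cmdscale() looks like, here is a minimal sketch that runs it on eurodist, a set of distances between 21 European cities that ships with R. It stands in for dist_toothpaste only so that the example runs without the MSR package.

```r
# eurodist: distances between 21 European cities (built into R)
city_dims <- cmdscale(eurodist, k = 2)

# One row per city, one column per requested dimension
dim(city_dims)
head(city_dims, 3)
```

The result is a numeric matrix whose row names are the labels of the distance matrix, which is why the brand names carry over in the toothpaste example.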

2 Step 2: Determine the No. of Dimensions

To obtain the MDS outcome, you must specify the no. of dimensions. As in Cluster Analysis, we will use the Elbow criterion as the statistical basis to determine the no. of dimensions. To apply the Elbow criterion, we must compare the statistical performance of MDS for different no. of dimensions.

As a general principle of statistical analysis, we try to minimize information loss. In MDS, the observation is the distance matrix obtained by aggregating the stated differences reported by many consumers. The prediction from MDS is the distance matrix calculated from the dimensions/attributes. Given the observed and predicted distance matrices, we compare them with a criterion called STRESS (standardized residual sum of squares). The smaller the STRESS, the smaller the residual sum of squares, the less information is lost, and the better the MDS.
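
How the MSR package computes STRESS is not shown in this tutorial. As a rough sketch, one simple version (an assumption made here for illustration) divides the residual sum of squares by the sum of squared observed distances and takes the square root: \(\sqrt{\sum_{i<j}(d_{ij}-\hat{d}_{ij})^2 / \sum_{i<j} d_{ij}^2}\), where \(d_{ij}\) is the observed distance and \(\hat{d}_{ij}\) is the distance predicted from the dimensions. The helper name stress_value() is hypothetical, and the built-in mtcars data stand in for the toothpaste data.

```r
# Hypothetical helper: a STRESS-type measure for a classical MDS solution
stress_value <- function(d, k) {
  d_obs  <- as.dist(d)                 # observed distances
  coords <- cmdscale(d_obs, k = k)     # coordinates on k dimensions
  d_pred <- dist(coords)               # distances implied by the dimensions
  obs  <- as.vector(d_obs)
  pred <- as.vector(d_pred)
  sqrt(sum((obs - pred)^2) / sum(obs^2))
}

# Stand-in data: Euclidean distances between car models in mtcars
car_dist <- dist(scale(mtcars))
stress_value(car_dist, k = 1)
stress_value(car_dist, k = 2)
```

With this measure, a value of 0 means the dimensions reproduce the observed distances exactly, and adding a dimension should lower the value.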

With the STRESS calculated for different no. of dimensions, we can create a plot, like the Elbow plot, to determine the no. of dimensions. This plot is called a scree plot in MDS. The X-axis of the plot is the no. of dimensions. As a rule of thumb, we usually set the max no. to a small value less than 10, so the no. of dimensions takes the values \(1, 2, 3, \cdots, 9\). The Y-axis of the scree plot represents the STRESS value at each no. of dimensions.

For our practice, you can use the function scree_plot that is provided to you in the MSR package. The function scree_plot takes one input: the distance matrix. It produces a scree plot for you.

scree_plot(dist_toothpaste)

From the scree plot, we can determine the no. of dimensions by applying the Elbow criterion. A closer look at the scree plot shows that the elbow point is at \(2\) dimensions: there is a large decrease in STRESS from \(1 \rightarrow 2\) and only a small decrease from \(2 \rightarrow 3\).
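
For readers who want to see what goes into such a plot, the following is a minimal sketch built with base R only. It is not the code behind scree_plot(): the STRESS formula (square root of the residual sum of squares divided by the sum of squared observed distances) is an assumption, and the built-in mtcars data stand in for the toothpaste data.

```r
# Stand-in data: Euclidean distances between car models in mtcars
car_dist <- dist(scale(mtcars))
obs <- as.vector(car_dist)

# STRESS-type value for k = 1, ..., 9 dimensions (assumed formula)
ks <- 1:9
stress <- sapply(ks, function(k) {
  pred <- as.vector(dist(cmdscale(car_dist, k = k)))
  sqrt(sum((obs - pred)^2) / sum(obs^2))
})

# Scree plot: look for the point after which the curve flattens
plot(ks, stress, type = "b",
     xlab = "No. of dimensions", ylab = "STRESS")

# Decrease in STRESS gained by each extra dimension
round(-diff(stress), 3)
```

The last line puts numbers on the elbow: the elbow sits where the decreases switch from large to small.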

Note that, in practice, we also use prior knowledge and understanding of brands to help us determine the no. of dimensions. The Elbow criterion is the statistical basis, but the practical usefulness of your choice always matters the most.

3 Step 3: Interpreting the Dimensions

With the no. of dimensions determined, we first run the MDS again to obtain the dimensions. Within R, we run cmdscale() with the no. of dimensions set to \(2\), the elbow point.

dim_toothpaste <- cmdscale(dist_toothpaste, k = 2)

# coerce dim_toothpaste into a data frame
# we can use as.data.frame function
# for more info., run "?as.data.frame" in your R command line
dim_toothpaste <- as.data.frame(dim_toothpaste)
dim_toothpaste
##                    V1         V2
## AquaFresh  -2.9981784 -1.0494401
## Crest      -3.1134074 -0.8771502
## Colgate    -2.7490241 -0.2234601
## Aim        -0.7030305 -2.3432308
## Gleem       1.7206795 -1.3680642
## PlusWhite   1.1223830 -2.2401590
## UltraBrite  2.4878988 -1.4446344
## CloseUp     2.8602529  0.2058835
## Pepsodent   2.7463213 -0.6949752
## Sensodyne   0.9399701  1.7481112

In the data, the variables V1 and V2 represent the dimensions or underlying attributes of the toothpaste brands. Next, we try to interpret the dimensions and find out what these underlying attributes are. As discussed in class, the interpretation always requires some extra information, for example, knowledge about the features of the brands. In practice, you may also do extra research, qualitative and/or quantitative, to find out more.

For easy interpretation, we oftentimes produce a perceptual map as a “visual aid”. To do this, you are given a function called perceptual_map in the MSR package. The function produces a 2D (X-Y) plot of two dimensions. You must specify two inputs:

  • A data frame of the dimensions of the brands (e.g., dim_toothpaste).
  • A numeric vector indicating which two dimensions to plot, i.e., which columns of the dimensions data frame to use. For example, if you want to plot Dimension 1 and 2 in dim_toothpaste, you input a vector of two values c(1,2).

Next, we will produce a perceptual map with Dimension 1 and 2, and discuss how to interpret the two dimensions with the perceptual map.

# To plot the perceptual map between Dimension 1 and 2
# we use the data frame of dimensions "dim_toothpaste"
# we also specify Dimension 1 and 2 with a vector "c(1,2)"
perceptual_map(dim_toothpaste, c(1,2))
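
If the MSR package is not at hand, a similar picture can be drawn with base R graphics. The sketch below is only an approximation under stated assumptions: simple_map() is a hypothetical helper that mirrors the two inputs described above, it is not the implementation of perceptual_map(), and the built-in eurodist distances stand in for the toothpaste data.

```r
# Hypothetical stand-in for perceptual_map(), written with base R graphics
simple_map <- function(dims, which_dims = c(1, 2)) {
  x <- dims[[which_dims[1]]]
  y <- dims[[which_dims[2]]]
  plot(x, y, type = "n",
       xlab = paste("Dimension", which_dims[1]),
       ylab = paste("Dimension", which_dims[2]))
  abline(h = 0, v = 0, lty = 2)        # reference lines through the origin
  text(x, y, labels = rownames(dims))  # label each point with its row name
  invisible(dims[which_dims])          # return the plotted columns
}

# Stand-in data: a 2-dimensional solution for the built-in eurodist distances
city_dims <- as.data.frame(cmdscale(eurodist, k = 2))
simple_map(city_dims, c(1, 2))
```

Plotting the labels instead of points keeps the map readable when there are only a handful of brands.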

From the perceptual map, we want to detect some patterns. For example, we can observe that Sensodyne is the brand that is highest in Dimension 2. So, Dimension 2 may be something unique about Sensodyne. The understanding of the uniqueness of Sensodyne comes from you, as brand managers. For example, if you already know that what makes Sensodyne unique is its emphasis on “for sensitive teeth”, you may conclude that Dimension 2 is “for sensitive teeth”. By the same logic, CloseUp and Pepsodent are the two highest brands in Dimension 1, so you look for what makes these two brands distinct from the other brands. If, say, you know that these two brands emphasize “omni-protection”, you can interpret Dimension 1 as “omni-protection”.
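
The same patterns can be read off the numbers by sorting the brands on each dimension. The sketch below re-enters the coordinates from the cmdscale() output printed earlier so that it runs on its own; with the MSR package loaded, you would use your existing dim_toothpaste object instead.

```r
# Coordinates copied from the cmdscale() output shown earlier
dim_toothpaste <- data.frame(
  V1 = c(-2.9981784, -3.1134074, -2.7490241, -0.7030305, 1.7206795,
          1.1223830,  2.4878988,  2.8602529,  2.7463213, 0.9399701),
  V2 = c(-1.0494401, -0.8771502, -0.2234601, -2.3432308, -1.3680642,
         -2.2401590, -1.4446344,  0.2058835, -0.6949752,  1.7481112),
  row.names = c("AquaFresh", "Crest", "Colgate", "Aim", "Gleem",
                "PlusWhite", "UltraBrite", "CloseUp", "Pepsodent", "Sensodyne")
)

# Brands sorted from highest to lowest on each dimension
dim_toothpaste[order(dim_toothpaste$V1, decreasing = TRUE), ]
dim_toothpaste[order(dim_toothpaste$V2, decreasing = TRUE), ]
```

Sorting also shows the low end of each dimension (Crest, AquaFresh, and Colgate are lowest on Dimension 1), which can be as informative for interpretation as the high end.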

In practice, it may not be sufficient to rely only on your expertise as a brand manager. We either collect additional data during the MDS studies or carry out more research (e.g., focus groups or in-depth interviews with expert users).

