Multi-dimensional Scaling (PCoA)

Introduction

This example of creating MDS and PCoA plots in R is taken from the StatQuest YouTube video at https://www.youtube.com/watch?v=pGAUHhLYp5Q

Data

Generate some fake data

##        wt1  wt2 wt3  wt4 wt5 ko1 ko2 ko3 ko4 ko5
## gene1  440  448 389  426 459 386 408 349 406 390
## gene2 1015 1005 964 1013 935 865 873 849 877 829
## gene3  834  797 820  821 826 863 884 866 858 832
## gene4  778  751 789  725 767  25  21  23  21  21
## gene5  552  521 509  520 551 475 478 456 492 495
## gene6  378  386 368  326 356 769 752 734 757 749
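The code that produced this matrix is not shown. A minimal sketch of how such data could be generated, assuming Poisson-distributed counts with a random mean per gene and per group (the distribution, the 100-gene size, and the seed are assumptions, not taken from the output above):

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

n.genes <- 100  # assumed number of genes
data.matrix <- matrix(nrow = n.genes, ncol = 10)
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))
rownames(data.matrix) <- paste0("gene", 1:n.genes)

for (i in 1:n.genes) {
  # each gene gets its own mean for the wild-type and knock-out groups
  wt.values <- rpois(5, lambda = sample(x = 10:1000, size = 1))
  ko.values <- rpois(5, lambda = sample(x = 10:1000, size = 1))
  data.matrix[i, ] <- c(wt.values, ko.values)
}

head(data.matrix)
```

Because the means are drawn at random, the counts will differ from the ones printed above.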

PCA

Run PCA on the dataset for comparison with MDS.

PCA Matrix

##  [1] 89.7  2.5  2.4  2.1  1.2  0.8  0.7  0.4  0.3  0.0
##     Sample         X           Y
## wt1    wt1 -9.110030  3.11841444
## wt2    wt2 -9.227306 -1.84826109
## wt3    wt3 -8.522961  0.60003868
## wt4    wt4 -8.941175  0.07983663
## wt5    wt5 -9.100339 -1.89664048
## ko1    ko1  9.094450  0.18292561
## ko2    ko2  9.137243 -1.02702485
## ko3    ko3  9.189941  1.50153310
## ko4    ko4  8.767896  0.65976131
## ko5    ko5  8.712280 -1.37058335
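The output above has the shape of a percent-variation vector followed by a table of the first two principal components. A minimal sketch of how it could be computed with `prcomp()`, assuming fake Poisson data as in the Data section (the data generation, seed, and variable names are assumptions):

```r
# Fake data: 100 genes x 10 samples (assumed, see the Data section)
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# prcomp() expects samples as rows, so transpose; center and scale each gene
pca <- prcomp(t(data.matrix), center = TRUE, scale. = TRUE)

# Percentage of variation each principal component accounts for
pca.var <- pca$sdev^2
pca.var.per <- round(pca.var / sum(pca.var) * 100, 1)
pca.var.per

# Keep the first two components for plotting
pca.data <- data.frame(Sample = rownames(pca$x),
                       X = pca$x[, 1],
                       Y = pca$x[, 2])
pca.data
```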

Plot

Create a PCA plot using ggplot()

The wild-type samples are on the left side of the graph and the knock-out samples are on the right side. The x-axis, PC1, accounts for 89.7% of the variation in the data. The y-axis, PC2, accounts for only 2.5% of the variation. This means most of the variation in the data is between the WT and KO samples.
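A sketch of how such a plot could be drawn with `ggplot()`, assuming the ggplot2 package is installed and using fake data as in the Data section (the labels, theme, and title are illustrative choices, not necessarily those of the original plot):

```r
# Fake data and PCA (seed, sizes, and names are assumptions)
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))
pca <- prcomp(t(data.matrix), center = TRUE, scale. = TRUE)
pca.var.per <- round(pca$sdev^2 / sum(pca$sdev^2) * 100, 1)
pca.data <- data.frame(Sample = rownames(pca$x),
                       X = pca$x[, 1],
                       Y = pca$x[, 2])

pca.plot <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {  # skip if not installed
  library(ggplot2)
  # Draw each sample as a text label; axis titles report percent variation
  pca.plot <- ggplot(pca.data, aes(x = X, y = Y, label = Sample)) +
    geom_text() +
    xlab(paste0("PC1 - ", pca.var.per[1], "%")) +
    ylab(paste0("PC2 - ", pca.var.per[2], "%")) +
    theme_bw() +
    ggtitle("PCA Graph")
  print(pca.plot)
}
```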


MDS/PCoA Plot

Distance Matrix

Create a distance matrix using the dist() function

We transposed the matrix so that the samples are rows, and we centered and scaled the measurements for each gene. We told the dist() function to compute distances using the Euclidean distance metric.
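The step described above can be sketched as follows, assuming fake data as in the Data section. The name `distance.matrix` matches the later `cmdscale()` call; the data generation and seed are assumptions:

```r
# Fake data: 100 genes x 10 samples (assumed, see the Data section)
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# Transpose so samples are rows, then center and scale each gene (column)
scaled.samples <- scale(t(data.matrix), center = TRUE, scale = TRUE)

# Pairwise Euclidean distances between samples
distance.matrix <- dist(scaled.samples, method = "euclidean")
distance.matrix
```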

Multi-dimensional Scaling

Perform MDS on the distance matrix using cmdscale()

mds.stuff <- cmdscale(distance.matrix, eig=TRUE, x.ret=TRUE)

We tell the function that we want the eigenvalues returned (eig=TRUE). These are used to calculate how much of the variation in the distance matrix each axis in the final MDS plot accounts for.

Calculate variation amount

Calculate the amount of variation each axis in the MDS plot accounts for, using the eigenvalues

##  [1] 89.7  2.5  2.4  2.1  1.2  0.8  0.7  0.4  0.3  0.0
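A sketch of this calculation, assuming fake data as in the Data section: each eigenvalue returned by `cmdscale()` is divided by the sum of all eigenvalues and expressed as a percentage. The name `mds.var.per` is an assumption:

```r
# Fake data and Euclidean distance matrix (seed and sizes are assumptions)
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))
distance.matrix <- dist(scale(t(data.matrix), center = TRUE, scale = TRUE),
                        method = "euclidean")

# Classical MDS; eig = TRUE returns the eigenvalues alongside the coordinates
mds.stuff <- cmdscale(distance.matrix, eig = TRUE, x.ret = TRUE)

# Share of the total variation carried by each MDS axis, in percent
mds.var.per <- round(mds.stuff$eig / sum(mds.stuff$eig) * 100, 1)
mds.var.per
```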

Format and Plot Data

Just like the PCA graph, the wild-type samples are on the left side of the graph and the knock-out samples are on the right side. The x-axis accounts for 89.7% of the variation in the data and the y-axis accounts for only 2.5%.

The PCA graph and the MDS graph are exactly the same. This is because we used the Euclidean metric to calculate the distance matrix: classical MDS on Euclidean distances is equivalent to PCA.
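This equivalence can be checked numerically. The sketch below, using fake data as in the Data section, compares the first two PCA scores with the two MDS coordinates. The comparison is done on absolute values because the sign of each axis is arbitrary in both methods; the seed and tolerance are assumptions:

```r
# Fake data: 100 genes x 10 samples (assumed, see the Data section)
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# PCA on centered, scaled samples
pca <- prcomp(t(data.matrix), center = TRUE, scale. = TRUE)

# Classical MDS on Euclidean distances between the same scaled samples
distance.matrix <- dist(scale(t(data.matrix), center = TRUE, scale = TRUE),
                        method = "euclidean")
mds.stuff <- cmdscale(distance.matrix, eig = TRUE, x.ret = TRUE)

# Compare coordinates up to a sign flip per axis
same.coords <- isTRUE(all.equal(abs(unname(pca$x[, 1:2])),
                                abs(unname(mds.stuff$points)),
                                tolerance = 1e-6))
same.coords
```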

Average of the Absolute Value of the Log Fold Change

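Calculate the distance using a different metric: the average of the absolute value of the log fold change between each pair of samples.

This is similar to what edgeR does when you call the plotMDS() function.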

##           wt1       wt2       wt3       wt4      wt5        ko1        ko2
## wt1 0.0000000 0.0000000 0.0000000 0.0000000 0.000000 0.00000000 0.00000000
## wt2 0.1100538 0.0000000 0.0000000 0.0000000 0.000000 0.00000000 0.00000000
## wt3 0.1224255 0.1077286 0.0000000 0.0000000 0.000000 0.00000000 0.00000000
## wt4 0.1181092 0.1265903 0.1156071 0.0000000 0.000000 0.00000000 0.00000000
## wt5 0.1175227 0.1280052 0.1187170 0.1203655 0.000000 0.00000000 0.00000000
## ko1 1.6142631 1.6156341 1.6051714 1.6009046 1.631418 0.00000000 0.00000000
## ko2 1.6350644 1.6287639 1.6248675 1.6172340 1.648746 0.09143202 0.00000000
## ko3 1.6169218 1.6209147 1.6113178 1.6072603 1.639320 0.08925412 0.09340202
## ko4 1.6257177 1.6288917 1.6221043 1.6172762 1.649079 0.09326805 0.09137187
## ko5 1.6267860 1.6208664 1.6149647 1.6085468 1.639741 0.08296916 0.09199447
##            ko3        ko4 ko5
## wt1 0.00000000 0.00000000   0
## wt2 0.00000000 0.00000000   0
## wt3 0.00000000 0.00000000   0
## wt4 0.00000000 0.00000000   0
## wt5 0.00000000 0.00000000   0
## ko1 0.00000000 0.00000000   0
## ko2 0.00000000 0.00000000   0
## ko3 0.00000000 0.00000000   0
## ko4 0.08261177 0.00000000   0
## ko5 0.08526488 0.08372947   0
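A sketch of how a lower-triangular matrix like the one above could be filled in, using fake data as in the Data section. The use of base-2 logarithms and the variable names are assumptions; the output above does not state the log base:

```r
# Fake data: 100 genes x 10 samples (assumed, see the Data section)
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# Log-transform so that differences between samples are log fold changes
log2.data.matrix <- log2(data.matrix)  # base 2 is an assumption

n.samples <- ncol(log2.data.matrix)
log2.distance.matrix <- matrix(0, nrow = n.samples, ncol = n.samples,
                               dimnames = list(colnames(log2.data.matrix),
                                               colnames(log2.data.matrix)))

# Fill the lower triangle with the mean absolute log fold change
for (i in 1:n.samples) {
  for (j in 1:i) {
    log2.distance.matrix[i, j] <-
      mean(abs(log2.data.matrix[, i] - log2.data.matrix[, j]))
  }
}

log2.distance.matrix
```

Only the lower triangle is filled because `as.dist()` reads the lower triangle by default when the matrix is later passed to `cmdscale()`.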

Perform MDS

##  [1] 99.3  0.3  0.2  0.1  0.1  0.1  0.0  0.0  0.0 -0.1
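A sketch of this step, assuming the same fake data and log fold change distance as above (names, seed, and log base are assumptions). Because this distance is not Euclidean, some eigenvalues can be slightly negative, which is why a small negative percentage can appear:

```r
# Fake data: 100 genes x 10 samples (assumed, see the Data section)
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))
log2.data.matrix <- log2(data.matrix)  # base 2 is an assumption

# Mean absolute log fold change between every pair of samples
n.samples <- ncol(log2.data.matrix)
log2.distance.matrix <- matrix(0, n.samples, n.samples,
                               dimnames = list(colnames(data.matrix),
                                               colnames(data.matrix)))
for (i in 1:n.samples) {
  for (j in 1:i) {
    log2.distance.matrix[i, j] <-
      mean(abs(log2.data.matrix[, i] - log2.data.matrix[, j]))
  }
}

# as.dist() takes the lower triangle; cmdscale() then performs classical MDS
mds.stuff <- cmdscale(as.dist(log2.distance.matrix), eig = TRUE, x.ret = TRUE)
mds.var.per <- round(mds.stuff$eig / sum(mds.stuff$eig) * 100, 1)
mds.var.per

# Coordinates formatted for plotting
mds.data <- data.frame(Sample = rownames(mds.stuff$points),
                       X = mds.stuff$points[, 1],
                       Y = mds.stuff$points[, 2])
mds.data
```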

Format and Plot Data

##     Sample          X             Y
## wt1    wt1  0.8110533  0.0867784534
## wt2    wt2  0.8102102 -0.0253206718
## wt3    wt3  0.8030323 -0.0025730583
## wt4    wt4  0.7973662 -0.0261377731
## wt5    wt5  0.8288713 -0.0327738838
## ko1    ko1 -0.8006737  0.0003197392
## ko2    ko2 -0.8180405 -0.0465956111
## ko3    ko3 -0.8064127  0.0435146116
## ko4    ko4 -0.8158962  0.0481090188
## ko5    ko5 -0.8095105 -0.0453208249

While similar to the plot using Euclidean distance, the new graph is not the same. The x-axis accounts for more of the variation (99.3% vs. 89.7%).