Multi-dimensional Scaling (PCoA)
Introduction
Example of creating MDS and PCoA plots in R, taken from the StatQuest YouTube video at https://www.youtube.com/watch?v=pGAUHhLYp5Q
Data
Generate some fake data
## wt1 wt2 wt3 wt4 wt5 ko1 ko2 ko3 ko4 ko5
## gene1 440 448 389 426 459 386 408 349 406 390
## gene2 1015 1005 964 1013 935 865 873 849 877 829
## gene3 834 797 820 821 826 863 884 866 858 832
## gene4 778 751 789 725 767 25 21 23 21 21
## gene5 552 521 509 520 551 475 478 456 492 495
## gene6 378 386 368 326 356 769 752 734 757 749
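The chunk that produced this table is not shown here, so the following is only a minimal sketch of how such fake data could be generated. The gene count (100), the seed, and the Poisson rate range are assumptions, so the values will not match the table above.

```r
set.seed(42)    # arbitrary seed so the sketch is reproducible
n.genes <- 100  # assumed; the gene count is not stated in these notes

# Rows are genes, columns are 5 wild-type (wt) and 5 knock-out (ko) samples
data.matrix <- matrix(nrow=n.genes, ncol=10,
  dimnames=list(paste0("gene", 1:n.genes),
                c(paste0("wt", 1:5), paste0("ko", 1:5))))

for (i in 1:n.genes) {
  # Each gene gets one random Poisson rate for wt and another for ko
  wt.values <- rpois(5, lambda=sample(10:1000, 1))
  ko.values <- rpois(5, lambda=sample(10:1000, 1))
  data.matrix[i, ] <- c(wt.values, ko.values)
}

head(data.matrix)
```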
PCA
PCA on dataset for comparison purposes
PCA Matrix
## [1] 89.7 2.5 2.4 2.1 1.2 0.8 0.7 0.4 0.3 0.0
## Sample X Y
## wt1 wt1 -9.110030 3.11841444
## wt2 wt2 -9.227306 -1.84826109
## wt3 wt3 -8.522961 0.60003868
## wt4 wt4 -8.941175 0.07983663
## wt5 wt5 -9.100339 -1.89664048
## ko1 ko1 9.094450 0.18292561
## ko2 ko2 9.137243 -1.02702485
## ko3 ko3 9.189941 1.50153310
## ko4 ko4 8.767896 0.65976131
## ko5 ko5 8.712280 -1.37058335
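Output like the percentages and the Sample/X/Y table above could come from something along these lines. This is a sketch, not the original chunk: the stand-in data, the seed, and the names `pca`, `pca.var.per`, and `pca.data` are assumptions.

```r
# Stand-in data: 100 genes (assumed) x 10 samples of Poisson counts
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# prcomp() expects samples in rows, so transpose; center and scale each gene
pca <- prcomp(t(data.matrix), center=TRUE, scale.=TRUE)

# Percentage of variation accounted for by each principal component
pca.var <- pca$sdev^2
pca.var.per <- round(pca.var / sum(pca.var) * 100, 1)
pca.var.per

# Coordinates of each sample on the first two principal components
pca.data <- data.frame(Sample=rownames(pca$x), X=pca$x[, 1], Y=pca$x[, 2])
pca.data
```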
Plot
Create a PCA plot using ggplot()
The wild-type samples are on the left side of the graph and the knock-out samples are on the right side. The x-axis, PC1, accounts for 89.7% of the variation in the data. The y-axis, PC2, accounts for only 2.5% of the variation. This means most of the variation in the data is between the WT and KO samples.
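A PCA plot of this kind could be drawn roughly as follows. This sketch assumes the ggplot2 package is installed, uses stand-in data, and the axis labels and title are illustrative choices.

```r
library(ggplot2)

# Stand-in data: 100 genes (assumed) x 10 samples of Poisson counts
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

pca <- prcomp(t(data.matrix), center=TRUE, scale.=TRUE)
pca.var.per <- round(pca$sdev^2 / sum(pca$sdev^2) * 100, 1)
pca.data <- data.frame(Sample=rownames(pca$x), X=pca$x[, 1], Y=pca$x[, 2])

# Label each point with its sample name and show the percentage of
# variation for each axis in the axis titles
p <- ggplot(data=pca.data, aes(x=X, y=Y, label=Sample)) +
  geom_text() +
  xlab(paste0("PC1 - ", pca.var.per[1], "%")) +
  ylab(paste0("PC2 - ", pca.var.per[2], "%")) +
  theme_bw() +
  ggtitle("PCA Graph")
print(p)
```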
MDS/PCoA Plot
Distance Matrix
Create a distance matrix using dist() function
We transposed the matrix so that the samples are rows, and we centered and scaled the measurements for each gene. We told the dist() function to compute the distances using the Euclidean metric.
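The transpose, center, scale, and distance steps just described can be sketched as below. The stand-in data and the intermediate name `scaled.data` are assumptions; `distance.matrix` is the name used later in these notes.

```r
# Stand-in data: 100 genes (assumed) x 10 samples of Poisson counts
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# Samples become rows; each gene (column) is centered and scaled
scaled.data <- scale(t(data.matrix), center=TRUE, scale=TRUE)

# Pairwise Euclidean distances between samples
distance.matrix <- dist(scaled.data, method="euclidean")

# View the first few entries as an ordinary square matrix
round(as.matrix(distance.matrix)[1:3, 1:3], 2)
```

With 10 samples, the resulting "dist" object stores the 45 pairwise distances of the lower triangle.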
Multi-dimensional Scaling
Perform MDS on the distance matrix using cmdscale()
mds.stuff <- cmdscale(distance.matrix, eig=TRUE, x.ret=TRUE)
We tell the function to return the eigenvalues (eig=TRUE). These are used to calculate how much variation in the distance matrix each axis in the final MDS plot accounts for.
Calculate variation amount
Calculate the amount of variation each axis in the MDS plot accounts for using the eigenvalues
## [1] 89.7 2.5 2.4 2.1 1.2 0.8 0.7 0.4 0.3 0.0
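Percentages like these can be obtained by dividing each eigenvalue by the sum of all eigenvalues. The sketch below uses stand-in data, and the names `mds.var.per`, `mds.values`, and `mds.data` are assumptions.

```r
# Stand-in data: 100 genes (assumed) x 10 samples of Poisson counts
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

distance.matrix <- dist(scale(t(data.matrix), center=TRUE, scale=TRUE),
                        method="euclidean")

# Classical MDS; eig=TRUE returns the eigenvalues alongside the coordinates
mds.stuff <- cmdscale(distance.matrix, eig=TRUE, x.ret=TRUE)

# Share of the total variation accounted for by each MDS axis, in percent
mds.var.per <- round(mds.stuff$eig / sum(mds.stuff$eig) * 100, 1)
mds.var.per

# Coordinates of each sample on the first two MDS axes, ready for plotting
mds.values <- mds.stuff$points
mds.data <- data.frame(Sample=rownames(mds.values),
                       X=mds.values[, 1], Y=mds.values[, 2])
mds.data
```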
Format and Plot Data
Just like the PCA graph, the wild-type samples are on the left side of the graph and the knock-out samples are on the right side. The x-axis accounts for 89.7% of the variation in the data and the y-axis only accounts for 2.5% of the variation in the data.
The PCA graph and the MDS graph are exactly the same. This is because we used the Euclidean metric to calculate the distance matrix.
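The equivalence can be checked numerically. The sketch below, on stand-in data with assumed variable names, compares the first two PCA scores with the MDS coordinates. The sign of an axis is arbitrary in both methods, so absolute values are compared.

```r
# Stand-in data: 100 genes (assumed) x 10 samples of Poisson counts
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# PCA on centered, scaled data
pca <- prcomp(t(data.matrix), center=TRUE, scale.=TRUE)

# Classical MDS on Euclidean distances of the same centered, scaled data
distance.matrix <- dist(scale(t(data.matrix), center=TRUE, scale=TRUE),
                        method="euclidean")
mds.stuff <- cmdscale(distance.matrix, eig=TRUE, x.ret=TRUE)

# Same coordinates, up to a possible sign flip per axis
same.coords <- all.equal(abs(unname(pca$x[, 1:2])),
                         abs(unname(mds.stuff$points)),
                         tolerance=1e-6)

# Same proportion of variation per axis
same.var <- all.equal(pca$sdev^2 / sum(pca$sdev^2),
                      mds.stuff$eig / sum(mds.stuff$eig),
                      tolerance=1e-6)

same.coords
same.var
```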
Average of the Absolute Value of the Log Fold Change
Calculate distance using a different metric
This is similar to what edgeR does when you call the plotMDS() function
## wt1 wt2 wt3 wt4 wt5 ko1 ko2
## wt1 0.0000000 0.0000000 0.0000000 0.0000000 0.000000 0.00000000 0.00000000
## wt2 0.1100538 0.0000000 0.0000000 0.0000000 0.000000 0.00000000 0.00000000
## wt3 0.1224255 0.1077286 0.0000000 0.0000000 0.000000 0.00000000 0.00000000
## wt4 0.1181092 0.1265903 0.1156071 0.0000000 0.000000 0.00000000 0.00000000
## wt5 0.1175227 0.1280052 0.1187170 0.1203655 0.000000 0.00000000 0.00000000
## ko1 1.6142631 1.6156341 1.6051714 1.6009046 1.631418 0.00000000 0.00000000
## ko2 1.6350644 1.6287639 1.6248675 1.6172340 1.648746 0.09143202 0.00000000
## ko3 1.6169218 1.6209147 1.6113178 1.6072603 1.639320 0.08925412 0.09340202
## ko4 1.6257177 1.6288917 1.6221043 1.6172762 1.649079 0.09326805 0.09137187
## ko5 1.6267860 1.6208664 1.6149647 1.6085468 1.639741 0.08296916 0.09199447
## ko3 ko4 ko5
## wt1 0.00000000 0.00000000 0
## wt2 0.00000000 0.00000000 0
## wt3 0.00000000 0.00000000 0
## wt4 0.00000000 0.00000000 0
## wt5 0.00000000 0.00000000 0
## ko1 0.00000000 0.00000000 0
## ko2 0.00000000 0.00000000 0
## ko3 0.00000000 0.00000000 0
## ko4 0.08261177 0.00000000 0
## ko5 0.08526488 0.08372947 0
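A lower-triangular matrix like the one above could be built as follows: take log2 of the counts, then average the absolute differences between every pair of samples. This is a sketch on stand-in data; the names `log2.data.matrix` and `log2.distance.matrix` are assumptions.

```r
# Stand-in data: 100 genes (assumed) x 10 samples of Poisson counts
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

# A difference of log2 values is a log2 fold change.
# Note: a zero count would give -Inf here and would need a pseudocount.
log2.data.matrix <- log2(data.matrix)

n.samples <- ncol(log2.data.matrix)
log2.distance.matrix <- matrix(0, nrow=n.samples, ncol=n.samples,
  dimnames=list(colnames(log2.data.matrix), colnames(log2.data.matrix)))

# Fill the lower triangle with the mean absolute log2 fold change
for (i in 1:n.samples) {
  for (j in 1:i) {
    log2.distance.matrix[i, j] <-
      mean(abs(log2.data.matrix[, i] - log2.data.matrix[, j]))
  }
}

round(log2.distance.matrix[1:6, 1:6], 3)
```

Only the lower triangle is filled because as.dist() reads just that part when the matrix is converted for cmdscale().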
Perform MDS
## [1] 99.3 0.3 0.2 0.1 0.1 0.1 0.0 0.0 0.0 -0.1
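The small negative value at the end of this output is expected: the average absolute log fold change is not a Euclidean distance, so classical MDS can return slightly negative eigenvalues. The sketch below runs the MDS step on stand-in data. As an assumed shortcut, it computes the same distance as the Manhattan distance divided by the number of genes; all variable names are illustrative.

```r
# Stand-in data: 100 genes (assumed) x 10 samples of Poisson counts
set.seed(42)
data.matrix <- t(replicate(100, c(rpois(5, sample(10:1000, 1)),
                                  rpois(5, sample(10:1000, 1)))))
colnames(data.matrix) <- c(paste0("wt", 1:5), paste0("ko", 1:5))

log2.data <- log2(data.matrix)

# Manhattan distance sums |log2 fold change| over genes, so dividing by
# the number of genes gives the mean absolute log2 fold change
log2.dist <- dist(t(log2.data), method="manhattan") / nrow(log2.data)

# Classical MDS on the new distance matrix
mds.log2 <- cmdscale(log2.dist, eig=TRUE, x.ret=TRUE)

# Percent of variation per axis; trailing values may be slightly negative
mds.log2.var.per <- round(mds.log2$eig / sum(mds.log2$eig) * 100, 1)
mds.log2.var.per
```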
Format and Plot Data
## Sample X Y
## wt1 wt1 0.8110533 0.0867784534
## wt2 wt2 0.8102102 -0.0253206718
## wt3 wt3 0.8030323 -0.0025730583
## wt4 wt4 0.7973662 -0.0261377731
## wt5 wt5 0.8288713 -0.0327738838
## ko1 ko1 -0.8006737 0.0003197392
## ko2 ko2 -0.8180405 -0.0465956111
## ko3 ko3 -0.8064127 0.0435146116
## ko4 ko4 -0.8158962 0.0481090188
## ko5 ko5 -0.8095105 -0.0453208249
While similar to the plot using Euclidean distance, the new graph is not the same. The x-axis accounts for more of the variation (99.3% vs. 89.7%).