- The importance of distances
- Euclidean distance
- Some matrix algebra notation
- Distance exercises: p. 322-323
Oct 14, 2016
Source: http://master.bioconductor.org/help/course-materials/2002/Summer02Course/Distance/distance.pdf
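A metric \(d\) satisfies the following five properties:
- Non-negativity: \(d(a,b) \geq 0\)
- Symmetry: \(d(a,b) = d(b,a)\)
- Identification: \(d(a,a) = 0\)
- Definiteness: \(d(a,b) = 0\) only if \(a = b\)
- Triangle inequality: \(d(a,c) \leq d(a,b) + d(b,c)\)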
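For Euclidean distance these properties can be spot-checked numerically. The following is a minimal sketch on simulated data. The matrix Y and all variable names are illustrative assumptions, not part of the course data. It checks one example and is not a proof.

```r
# Spot-check the five metric properties for Euclidean distance on
# simulated data (a sanity check on one example, not a proof).
set.seed(1)
Y <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)  # 50 "genes" x 4 "samples"
D <- as.matrix(dist(t(Y)))                        # 4 x 4 sample distances
n <- ncol(D)

nonneg    <- all(D >= 0)                  # d(a,b) >= 0
symmetric <- isTRUE(all.equal(D, t(D)))   # d(a,b) = d(b,a)
zero_diag <- all(diag(D) == 0)            # d(a,a) = 0
definite  <- all(D[upper.tri(D)] > 0)     # distinct samples: d(a,b) > 0

# Triangle inequality d(a,c) <= d(a,b) + d(b,c) over all triples,
# with a small tolerance for floating-point rounding
triangle <- TRUE
for (a in 1:n) for (b in 1:n) for (k in 1:n) {
  if (D[a, k] > D[a, b] + D[b, k] + 1e-12) triangle <- FALSE
}

c(nonneg = nonneg, symmetric = symmetric, zero_diag = zero_diag,
  definite = definite, triangle = triangle)
```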
## biocLite("genomicsclass/tissuesGeneExpression")  ## biocLite() is deprecated; use BiocManager::install() on current Bioconductor
library(tissuesGeneExpression)
data(tissuesGeneExpression)
dim(e) ## gene expression data
## [1] 22215 189
table(tissue) ##tissue[i] corresponds to e[,i]
## tissue
##  cerebellum       colon endometrium hippocampus      kidney       liver
##          38          34          15          31          39          26
##    placenta
##           6
We are interested in identifying both similar samples and similar genes.
Euclidean distance extends the familiar two-dimensional distance to any number of dimensions. For example, the distance between two samples \(i\) and \(j\) is:
\[ \mbox{dist}(i,j) = \sqrt{ \sum_{g=1}^{22215} (Y_{g,i}-Y_{g,j })^2 } \]
and the distance between two features \(h\) and \(g\) is:
\[ \mbox{dist}(h,g) = \sqrt{ \sum_{i=1}^{189} (Y_{h,i}-Y_{g,i})^2 } \]
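To make the two formulas concrete, here is a minimal sketch on a small simulated matrix Y (rows are genes, columns are samples). The helper names dist_samples() and dist_genes() are hypothetical. The point is that sample distances sum over rows while gene distances sum over columns, and both agree with R's dist().

```r
# Simulated expression matrix: 6 genes (rows) x 3 samples (columns)
set.seed(2)
Y <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)

# Distance between samples i and j: sum over genes (rows)
dist_samples <- function(Y, i, j) sqrt(sum((Y[, i] - Y[, j])^2))
# Distance between genes h and g: sum over samples (columns)
dist_genes <- function(Y, h, g) sqrt(sum((Y[h, ] - Y[g, ])^2))

d_s <- dist_samples(Y, 1, 2)
d_g <- dist_genes(Y, 1, 2)

# dist() works on rows, so transpose Y to compare samples
ok_s <- isTRUE(all.equal(d_s, unname(as.matrix(dist(t(Y)))[1, 2])))  # TRUE
ok_g <- isTRUE(all.equal(d_g, unname(as.matrix(dist(Y))[1, 2])))     # TRUE
```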
The distance between samples \(i\) and \(j\) can be written as:
\[ \mbox{dist}(i,j) = \sqrt{ (\mathbf{Y}_i - \mathbf{Y}_j)^\top(\mathbf{Y}_i - \mathbf{Y}_j) }\]
where \(\mathbf{Y}_i\) and \(\mathbf{Y}_j\) are columns \(i\) and \(j\) of \(\mathbf{Y}\).
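A quick sketch on simulated data showing that the matrix form gives the same number as the elementwise sum. The name delta is an illustrative stand-in for \(\mathbf{Y}_i - \mathbf{Y}_j\).

```r
# Two simulated columns of length 100
set.seed(3)
Y <- matrix(rnorm(100 * 2), ncol = 2)
delta <- Y[, 1] - Y[, 2]               # Y_i - Y_j as a plain vector

d_quad <- sqrt(t(delta) %*% delta)     # quadratic form; a 1 x 1 matrix
d_sum  <- sqrt(sum(delta^2))           # elementwise form
d_cp   <- sqrt(crossprod(delta))       # crossprod(x) computes t(x) %*% x

all.equal(as.numeric(d_quad), d_sum)   # TRUE
all.equal(as.numeric(d_cp), d_sum)     # TRUE
```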
t(matrix(1:3, ncol=1))
##      [,1] [,2] [,3]
## [1,]    1    2    3
matrix(1:3, ncol=1)
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
t(matrix(1:3, ncol=1)) %*% matrix(1:3, ncol=1)
##      [,1]
## [1,]   14
Note: R performs matrix algebra efficiently, so writing distances as matrix products is both concise and fast.
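As an illustration of that efficiency, all pairwise column distances can be obtained from a single cross-product using \(\|\mathbf{Y}_i - \mathbf{Y}_j\|^2 = \mathbf{Y}_i^\top\mathbf{Y}_i + \mathbf{Y}_j^\top\mathbf{Y}_j - 2\,\mathbf{Y}_i^\top\mathbf{Y}_j\). This is a minimal sketch on simulated data; the helper pairwise_dist() is hypothetical and not part of any package.

```r
# All pairwise Euclidean distances between the columns of Y via
# ||Y_i - Y_j||^2 = Y_i'Y_i + Y_j'Y_j - 2 Y_i'Y_j
pairwise_dist <- function(Y) {
  G  <- crossprod(Y)                # Gram matrix: G[i, j] = t(Y_i) %*% Y_j
  sq <- diag(G)                     # squared norm of each column
  D2 <- outer(sq, sq, "+") - 2 * G  # squared distances
  D2[D2 < 0] <- 0                   # clip tiny negatives caused by rounding
  diag(D2) <- 0                     # a column is at distance 0 from itself
  sqrt(D2)
}

set.seed(4)
Y <- matrix(rnorm(1000 * 5), ncol = 5)
D_alg  <- pairwise_dist(Y)
D_dist <- as.matrix(dist(t(Y)))
all.equal(D_alg, D_dist, check.attributes = FALSE)  # TRUE (up to rounding)
```

The design trades a little numerical precision for speed: when two columns are nearly identical, subtracting large, almost equal quantities loses accuracy, which is why negative squared distances are clipped to zero.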
kidney1 <- e[, 1]
kidney2 <- e[, 2]
colon1 <- e[, 87]
sqrt(sum((kidney1 - kidney2)^2))
## [1] 85.8546
sqrt(sum((kidney1 - colon1)^2))
## [1] 122.8919
dim(e)
## [1] 22215 189
(d <- dist(t(e[, c(1, 2, 87)])))
##                 GSM11805.CEL.gz GSM11814.CEL.gz
## GSM11814.CEL.gz         85.8546
## GSM92240.CEL.gz        122.8919        115.4773
class(d)
## [1] "dist"
Excerpt from ?dist:
dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
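A small sketch of a few of the method options in that signature, on simulated data. Keep in mind that dist() compares the rows of x; the matrix X is illustrative.

```r
# Three simulated observations (rows) measured on 10 features
set.seed(5)
X <- matrix(rnorm(3 * 10), nrow = 3)

d_euc   <- dist(X)                                # default: euclidean
d_man   <- dist(X, method = "manhattan")          # sum of |differences|
d_mink2 <- dist(X, method = "minkowski", p = 2)   # Minkowski with p = 2

# Minkowski with p = 2 is Euclidean distance; compare values only,
# since the "method" attribute of the two dist objects differs
all.equal(as.vector(d_euc), as.vector(d_mink2))   # TRUE

# Manhattan distance between rows 1 and 2, computed by hand
all.equal(unname(as.matrix(d_man)[1, 2]), sum(abs(X[1, ] - X[2, ])))  # TRUE
```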
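dist() computes distances between the rows of x, which is why e is transposed above to compare samples. Its dist-class output is used by many clustering algorithms and heatmap functions.

Caution: dist(e) computes all pairwise distances between the 22215 genes (the equivalent of a 22215 x 22215 matrix) and will probably exhaust memory and crash your R session.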
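The arithmetic behind this caution, plus a sketch of passing a small dist object to hclust(). A dist object stores only the n(n - 1)/2 distinct pairs, each as an 8-byte double. The helper n_pairs() and the simulated Y are illustrative.

```r
# Number of distinct pairs stored in a dist object for n observations
n_pairs <- function(n) n * (n - 1) / 2

n_pairs(189)                  # sample-by-sample: 17766 distances
n_pairs(22215)                # gene-by-gene: 246742005 distances
n_pairs(22215) * 8 / 1024^3   # about 1.84 GiB for the dist object alone
22215^2 * 8 / 1024^3          # about 3.7 GiB if converted with as.matrix()

# A dist object can be passed directly to clustering functions
set.seed(6)
Y <- matrix(rnorm(100 * 6), ncol = 6)  # 100 "genes" x 6 "samples"
d <- dist(t(Y))
length(d) == n_pairs(6)                # TRUE
hc <- hclust(d)                        # hierarchical clustering of the samples
cutree(hc, k = 2)                      # assign each sample to one of 2 clusters
```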
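Because Euclidean distance sums squared differences, genes with large variance dominate it. Genes are therefore often standardized first, so that for each gene \(g\) and sample \(i\):
\[Y_{g,i} \leftarrow \frac{Y_{g,i} - \bar{Y}_g}{s_g}\]
where \(\bar{Y}_g\) and \(s_g\) are the mean and standard deviation of gene \(g\) across samples.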
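A minimal sketch of this row-wise standardization on simulated data. It assumes \(s_g\) is the usual sample standard deviation (denominator n - 1), which is what sd() and scale() use.

```r
# Simulated data: 5 genes (rows) x 8 samples (columns)
set.seed(7)
Y <- matrix(rnorm(5 * 8, mean = 10, sd = 3), nrow = 5)

# scale() standardizes columns, so transpose, scale, and transpose back
Z <- t(scale(t(Y)))

# The same formula written out: subtract each row mean, divide by each row SD.
# A length-5 vector recycles down the columns, so it lines up with the rows.
Z2 <- (Y - rowMeans(Y)) / apply(Y, 1, sd)

round(rowMeans(Z), 10)    # every gene now has mean 0
apply(Z, 1, sd)           # and standard deviation 1
all.equal(Z, Z2, check.attributes = FALSE)  # TRUE
```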