We start from the expression matrix. Let’s make a sample one.
set.seed(24032015)
expr <- matrix(runif(40, 0, 1), nrow=5)
rownames(expr) = LETTERS[1:5]
colnames(expr) = sprintf(letters[1:8], fmt="sample_%s")
expr
## sample_a sample_b sample_c sample_d sample_e sample_f
## A 0.1408473 0.11606910 0.5286093 0.20578027 0.01788021 0.94675890
## B 0.3322770 0.93556049 0.5876777 0.15587345 0.10799615 0.12305035
## C 0.2396301 0.06082499 0.9858472 0.53838789 0.24345075 0.01611634
## D 0.2061076 0.76382240 0.4278838 0.05132344 0.88254347 0.65528691
## E 0.6434158 0.50660391 0.9835111 0.76918065 0.77227784 0.18871680
## sample_g sample_h
## A 0.3665014 0.11105929
## B 0.7957475 0.80019163
## C 0.4246721 0.09832426
## D 0.4833377 0.05946408
## E 0.8265693 0.03261582
Now let’s make gene D promiscuous, i. e.
expr['D',] <- apply(expr, 2, function(cl) mean(cl[-which(names(cl) == 'D')]))
expr
## sample_a sample_b sample_c sample_d sample_e sample_f sample_g
## A 0.1408473 0.11606910 0.5286093 0.2057803 0.01788021 0.94675890 0.3665014
## B 0.3322770 0.93556049 0.5876777 0.1558734 0.10799615 0.12305035 0.7957475
## C 0.2396301 0.06082499 0.9858472 0.5383879 0.24345075 0.01611634 0.4246721
## D 0.3390425 0.40476462 0.7714113 0.4173056 0.28540124 0.31866060 0.6033726
## E 0.6434158 0.50660391 0.9835111 0.7691806 0.77227784 0.18871680 0.8265693
## sample_h
## A 0.11105929
## B 0.80019163
## C 0.09832426
## D 0.26054775
## E 0.03261582
Try to implement the «best friends» search in R.
# Compute correlation matrix
expr.t <- t(expr)
corMx <- cor(expr.t, method="pearson")
corMx
## A B C D E
## A 1.00000000 -0.232975706 0.095416573 0.2929939 -0.1557842
## B -0.23297571 1.000000000 -0.008157956 0.3251695 -0.1145120
## C 0.09541657 -0.008157956 1.000000000 0.8556582 0.7767369
## D 0.29299390 0.325169457 0.855658166 1.0000000 0.6974129
## E -0.15578420 -0.114512011 0.776736856 0.6974129 1.0000000
# Compute backwards rank
corMx.rank <- t(apply(-corMx, 1, rank)) # compute rank for every row
corMx.rank
## A B C D E
## A 1 5 3 2 4
## B 5 1 3 2 4
## C 4 5 1 2 3
## D 5 4 2 1 3
## E 5 4 2 3 1
corMx.bwrank <- t(apply(corMx.rank, 2, function(set) rownames(corMx)[ order(set) ]))
rownames(corMx.bwrank) <- rownames(corMx)
corMx.bwrank
## [,1] [,2] [,3] [,4] [,5]
## A "A" "C" "B" "D" "E"
## B "B" "D" "E" "A" "C"
## C "C" "D" "E" "A" "B"
## D "D" "A" "B" "C" "E"
## E "E" "C" "D" "A" "B"
Here we see, gene D is promiscuous indeed, though the result is not biased towards gene D due to using backwards ranking. So backwards ranking enables us to work with promiscuous genes efficiently.
Now let’s map gene names to the obtained friends lists to integers (for efficient storing and comparing to results obtained using other programming languages).
gene.map <- 1:nrow(corMx)
names(gene.map) <- rownames(corMx)
gene.map
## A B C D E
## 1 2 3 4 5
corMx.bwrank.idxed <- t(apply(corMx.bwrank, 1, function(row) gene.map[row]))
corMx.bwrank.idxed
## [,1] [,2] [,3] [,4] [,5]
## A 1 3 2 4 5
## B 2 4 5 1 3
## C 3 4 5 1 2
## D 4 1 2 3 5
## E 5 3 4 1 2
There are also other implementation of the RBR idea: using Python, using Haskell and using NoSQL database (RethinkDB with Ruby drivers).