Designing template

We start from the expression matrix. Let’s make a sample one.

set.seed(24032015)
expr <- matrix(runif(40, 0, 1), nrow=5)
rownames(expr) = LETTERS[1:5]
colnames(expr) = sprintf(letters[1:8], fmt="sample_%s")
expr
##    sample_a   sample_b  sample_c   sample_d   sample_e   sample_f
## A 0.1408473 0.11606910 0.5286093 0.20578027 0.01788021 0.94675890
## B 0.3322770 0.93556049 0.5876777 0.15587345 0.10799615 0.12305035
## C 0.2396301 0.06082499 0.9858472 0.53838789 0.24345075 0.01611634
## D 0.2061076 0.76382240 0.4278838 0.05132344 0.88254347 0.65528691
## E 0.6434158 0.50660391 0.9835111 0.76918065 0.77227784 0.18871680
##    sample_g   sample_h
## A 0.3665014 0.11105929
## B 0.7957475 0.80019163
## C 0.4246721 0.09832426
## D 0.4833377 0.05946408
## E 0.8265693 0.03261582

Now let’s make gene D promiscuous, i. e. every other gene is its «friend». We can try to sumulate it by assigning its «expression» value to the mean of the one of other genes for every sample.

expr['D',] <- apply(expr, 2, function(cl) mean(cl[-which(names(cl) == 'D')]))
expr
##    sample_a   sample_b  sample_c  sample_d   sample_e   sample_f  sample_g
## A 0.1408473 0.11606910 0.5286093 0.2057803 0.01788021 0.94675890 0.3665014
## B 0.3322770 0.93556049 0.5876777 0.1558734 0.10799615 0.12305035 0.7957475
## C 0.2396301 0.06082499 0.9858472 0.5383879 0.24345075 0.01611634 0.4246721
## D 0.3390425 0.40476462 0.7714113 0.4173056 0.28540124 0.31866060 0.6033726
## E 0.6434158 0.50660391 0.9835111 0.7691806 0.77227784 0.18871680 0.8265693
##     sample_h
## A 0.11105929
## B 0.80019163
## C 0.09832426
## D 0.26054775
## E 0.03261582

Implementing in R

Try to implement the «best friends» search in R.

# Compute correlation matrix
expr.t <- t(expr)
corMx  <- cor(expr.t, method="pearson")
corMx
##             A            B            C         D          E
## A  1.00000000 -0.232975706  0.095416573 0.2929939 -0.1557842
## B -0.23297571  1.000000000 -0.008157956 0.3251695 -0.1145120
## C  0.09541657 -0.008157956  1.000000000 0.8556582  0.7767369
## D  0.29299390  0.325169457  0.855658166 1.0000000  0.6974129
## E -0.15578420 -0.114512011  0.776736856 0.6974129  1.0000000
# Compute backwards rank
corMx.rank <- t(apply(-corMx, 1, rank))  # compute rank for every row
corMx.rank
##   A B C D E
## A 1 5 3 2 4
## B 5 1 3 2 4
## C 4 5 1 2 3
## D 5 4 2 1 3
## E 5 4 2 3 1
corMx.bwrank <- t(apply(corMx.rank, 2, function(set) rownames(corMx)[ order(set) ]))
rownames(corMx.bwrank) <- rownames(corMx)
corMx.bwrank
##   [,1] [,2] [,3] [,4] [,5]
## A "A"  "C"  "B"  "D"  "E" 
## B "B"  "D"  "E"  "A"  "C" 
## C "C"  "D"  "E"  "A"  "B" 
## D "D"  "A"  "B"  "C"  "E" 
## E "E"  "C"  "D"  "A"  "B"

Here we see, gene D is promiscuous indeed, though the result is not biased towards gene D due to using backwards ranking. So backwards ranking enables us to work with promiscuous genes efficiently.

Now let’s map gene names to the obtained friends lists to integers (for efficient storing and comparing to results obtained using other programming languages).

gene.map <- 1:nrow(corMx)
names(gene.map) <- rownames(corMx)
gene.map
## A B C D E 
## 1 2 3 4 5
corMx.bwrank.idxed <- t(apply(corMx.bwrank, 1, function(row) gene.map[row]))
corMx.bwrank.idxed
##   [,1] [,2] [,3] [,4] [,5]
## A    1    3    2    4    5
## B    2    4    5    1    3
## C    3    4    5    1    2
## D    4    1    2    3    5
## E    5    3    4    1    2

Implementations

Other implementation

There are also other implementation of the RBR idea: using Python, using Haskell and using NoSQL database (RethinkDB with Ruby drivers).