3 Code

First we take x and convert it to a data frame (as provided above, it is a matrix). We only do that because dplyr is built around data frames and I used it heavily in this code. So nothing changes here, just the type.
Next, we add the row names as a column. We do this because dplyr verbs such as mutate do not reliably keep row names. If we want to keep them (and we obviously do), it helps to turn them into a real column. In essence, mutate(smth=smth_else) either converts an existing column (when smth is already a column name) or creates a new one (our case), in which case smth is added to the end of the table. smth_else can be either a function call or simply a vector with length equal to nrow of the table.

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>% head

##   S18 S20 S45 S95 S100       rownames
## 1   0   1   1   0    0   AACSL|729522
## 2   0   0   1   1    0     AACS|65985
## 3   0   0   0   1    1 AADACL2|344752
## 4   0   0   0   0    1 AADACL3|126767
## 5   1   0   0   0    0 AADACL4|343066
## 6   1   1   0   0    0       AADAC|13
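
The two uses of mutate described above can be sketched on a tiny data frame. This is only an illustration: df and its columns a, b, total and label are made-up names, not part of the data you provided.

```r
library(dplyr)

# Toy data frame, made up for this illustration
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# New column name: "total" is appended at the end of the table
df_new <- df %>% mutate(total = a + b)

# Existing column name: "a" is overwritten in place (converted to character)
df_conv <- df %>% mutate(a = as.character(a))

# A plain vector works too, as long as its length equals nrow(df)
df_vec <- df %>% mutate(label = c("x", "y", "z"))

print(df_new)
```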

Cool. Now, the initial matrices you provided have row names that differ slightly: only the first part of each x row name (the part before the |) matches a row name in y. So we do the same thing, but instead of providing a ready-made vector of names for the new column, we build it on the fly with an anonymous function that selects the first part of the name:

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
        mutate(nms=sapply(rownames(x),
                          function(i){
                                  return(
                                          strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                          })) %>% head

##   S18 S20 S45 S95 S100       rownames     nms
## 1   0   1   1   0    0   AACSL|729522   AACSL
## 2   0   0   1   1    0     AACS|65985    AACS
## 3   0   0   0   1    1 AADACL2|344752 AADACL2
## 4   0   0   0   0    1 AADACL3|126767 AADACL3
## 5   1   0   0   0    0 AADACL4|343066 AADACL4
## 6   1   1   0   0    0       AADAC|13   AADAC

There is now an nms column that contains only the “match-able” part of the row names.
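
As a side note, the same prefix can probably be extracted without sapply. The sketch below compares the strsplit approach with a vectorized sub call on three row names copied from the output above; rn, nms_split and nms_sub are illustrative names, and the regular expression assumes every row name has the form SYMBOL|ID.

```r
# Row names in the same "SYMBOL|ID" shape as in x (copied from the output above)
rn <- c("AACSL|729522", "AACS|65985", "AADAC|13")

# The sapply/strsplit approach used in the pipeline (returns a named vector)
nms_split <- sapply(rn, function(i) {
  strsplit(x = i, split = "|", fixed = TRUE)[[1]][[1]]
})

# Vectorized alternative: delete everything from the first "|" onward
nms_sub <- sub("\\|.*$", "", rn)

print(nms_sub)
```

Both give the same strings; the only difference is that the sapply result carries the full row names as element names.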

Now it comes down to melt. melt is a function from the reshape2 package that converts a table from the so-called “wide format” to a “long format” (a linker table in SQL is an example of such a layout). Note that melt takes id.vars, which are essentially the key columns. In our case those are the row names, which we are going to match at the end of this whole procedure. See what happens to the data below:

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
        mutate(nms=sapply(rownames(x),
                          function(i){
                                  return(
                                          strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                          })) %>%
        melt(id.vars=c("nms","rownames")) %>% head(n = 25)

##        nms       rownames variable value
## 1    AACSL   AACSL|729522      S18     0
## 2     AACS     AACS|65985      S18     0
## 3  AADACL2 AADACL2|344752      S18     0
## 4  AADACL3 AADACL3|126767      S18     0
## 5  AADACL4 AADACL4|343066      S18     1
## 6    AADAC       AADAC|13      S18     1
## 7    AADAT    AADAT|51166      S18     0
## 8    AAGAB    AAGAB|79719      S18     0
## 9     AAK1     AAK1|22848      S18     0
## 10   AAK12       AAK12|14      S18     0
## 11   AANAT       AANAT|15      S18     1
## 12   AACSL   AACSL|729522      S20     1
## 13    AACS     AACS|65985      S20     0
## 14 AADACL2 AADACL2|344752      S20     0
## 15 AADACL3 AADACL3|126767      S20     0
## 16 AADACL4 AADACL4|343066      S20     0
## 17   AADAC       AADAC|13      S20     1
## 18   AADAT    AADAT|51166      S20     1
## 19   AAGAB    AAGAB|79719      S20     0
## 20    AAK1     AAK1|22848      S20     0
## 21   AAK12       AAK12|14      S20     0
## 22   AANAT       AANAT|15      S20     0
## 23   AACSL   AACSL|729522      S45     1
## 24    AACS     AACS|65985      S45     1
## 25 AADACL2 AADACL2|344752      S45     0

The output above simply lists “this row name, for this variable, has this value”. We did this because in this form it is convenient to merge the two tables.
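
On a table small enough to read at a glance, the effect of melt looks like this. It is a minimal sketch with made-up values; wide, long, G1, G2, S1 and S2 are illustrative names only.

```r
library(reshape2)

# Toy wide table (values made up): one row per gene, one column per sample
wide <- data.frame(nms = c("G1", "G2"),
                   S1 = c(0, 1),
                   S2 = c(1, 1),
                   stringsAsFactors = FALSE)

# id.vars are kept as key columns; every other column is stacked
# into a "variable" column (old column name) and a "value" column
long <- melt(wide, id.vars = "nms")

print(long)
```

Two rows by two sample columns become four rows, one per (nms, variable) pair.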

Now, here I see the confusing part in my code: I did the same thing as above to the y table, but on the fly inside the merge function. I will change that slightly below to make it more explicit. And of course, just like with SQL’s join (merge is in fact a join; it is the base R function for it), you need to answer “join by what?”. We want to join by the row names AND the variables, right? So that they match between the tables.

y.l <- y %>% as.data.frame %>% mutate(nms=rownames(y)) %>% melt(id.vars="nms")
head(y.l)

##      nms variable value
## 1   A1BG      S18     0
## 2   A1CF      S18     0
## 3  A2ML1      S18     0
## 4 A4GALT      S18     0
## 5   AACS      S18     1
## 6   AAK1      S18     0

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
        mutate(nms=sapply(rownames(x),
                          function(i){
                                  return(
                                          strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                          })) %>%
        melt(id.vars=c("nms","rownames")) %>%
        merge(., y.l, by=c("variable","nms"), all.x=TRUE) %>% head

##   variable     nms       rownames value.x value.y
## 1      S18    AACS     AACS|65985       0       1
## 2      S18   AACSL   AACSL|729522       0      NA
## 3      S18   AADAC       AADAC|13       1      NA
## 4      S18 AADACL2 AADACL2|344752       0      NA
## 5      S18 AADACL3 AADACL3|126767       0      NA
## 6      S18 AADACL4 AADACL4|343066       1      NA

Looks good, right? Now both the gene names and the variable names are matched between x and y, and every row of x is kept whether or not it is present in y (that is what all.x=TRUE does). Now we throw away nms and value.x (we do not need them any longer) with select. select is the dplyr way of choosing which columns to keep, or which to throw away with -.
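
The join and the column drop can be tried on two tiny long tables. This is a sketch with made-up values: x_long, y_long, joined and kept are illustrative names, and the id column stands in for the rownames column of the real pipeline.

```r
library(dplyr)

# Toy long tables (values made up); G2 is missing from the y side
x_long <- data.frame(variable = c("S1", "S1"),
                     nms = c("G1", "G2"),
                     id = c("G1|1", "G2|2"),   # stand-in for the rownames column
                     value = c(0, 1),
                     stringsAsFactors = FALSE)
y_long <- data.frame(variable = "S1", nms = "G1", value = 5,
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of x_long; rows without a match get NA in value.y
joined <- merge(x_long, y_long, by = c("variable", "nms"), all.x = TRUE)

# A leading minus in select() drops the named columns
kept <- joined %>% select(-nms, -value.x)

print(kept)
```

Because both inputs have a value column that is not part of by, merge renames them to value.x and value.y, which is where those names in the output above come from.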

And now we dcast it back to a “wide format”. dcast does a “de-melt”, so to say: the opposite action, bringing the table back to its original layout.

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
        mutate(nms=sapply(rownames(x),
                          function(i){
                                  return(
                                          strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                          })) %>%
        melt(id.vars=c("nms","rownames")) %>%
        merge(., y %>% as.data.frame %>% mutate(nms=rownames(y))%>% melt(id.vars="nms"), by=c("variable","nms"), all.x=TRUE) %>%
        select(-nms, -value.x) %>% dcast(formula = rownames~variable, value.var="value.y") %>% head

##         rownames S18 S20 S45 S95 S100
## 1     AACS|65985   1  NA   0   0   NA
## 2   AACSL|729522  NA  NA  NA  NA   NA
## 3       AADAC|13  NA  NA  NA  NA   NA
## 4 AADACL2|344752  NA  NA  NA  NA   NA
## 5 AADACL3|126767  NA  NA  NA  NA   NA
## 6 AADACL4|343066  NA  NA  NA  NA   NA
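
A small round trip shows what dcast does and why the rows come back in a different order. Again a sketch on made-up values; wide, long, back and the id column are illustrative names.

```r
library(reshape2)

# Toy wide table (values made up); note that G2 comes before G1
wide <- data.frame(id = c("G2|2", "G1|1"),
                   S1 = c(0, 1),
                   S2 = c(1, NA),
                   stringsAsFactors = FALSE)

# Wide -> long
long <- melt(wide, id.vars = "id")

# Long -> wide: one row per id, one column per level of "variable"
back <- dcast(long, formula = id ~ variable, value.var = "value")

print(back)
```

The values survive the round trip, including the NA, but dcast returns the rows sorted by id rather than in their original order.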

The only thing left to do is to put the rows back in the order they had before we did all this.

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
        mutate(nms=sapply(rownames(x),
                          function(i){
                                  return(
                                          strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                          })) %>%
        melt(id.vars=c("nms","rownames")) %>%
        merge(., y %>% as.data.frame %>% mutate(nms=rownames(y))%>% melt(id.vars="nms"), by=c("variable","nms"), all.x=TRUE) %>%
        select(-nms, -value.x) %>% dcast(formula = rownames~variable, value.var="value.y") -> xy
# now put the row names back where they belong
rownames(xy)<-xy$rownames
# now the only thing left is to restore the original row and column order
xy[rownames(x),colnames(x)] %>% head

##                S18 S20 S45 S95 S100
## AACSL|729522    NA  NA  NA  NA   NA
## AACS|65985       1  NA   0   0   NA
## AADACL2|344752  NA  NA  NA  NA   NA
## AADACL3|126767  NA  NA  NA  NA   NA
## AADACL4|343066  NA  NA  NA  NA   NA
## AADAC|13        NA  NA  NA  NA   NA

Everything should be there now. The only caveat, as we have discussed, is that the matrices are not exactly the same in terms of row names; that is where all the NAs are coming from.
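
If you want to see which row names are responsible for the NAs, something along these lines might help. It is a sketch in base R on made-up toy matrices shaped like x and y; it assumes that y has the same column names as x, and nms, unmatched and xy_check are illustrative names. The match-based lookup is a different technique from the melt/merge/dcast pipeline above and is meant only as a cross-check.

```r
# Toy matrices (values made up) shaped like x and y in this document
x <- matrix(0, nrow = 3, ncol = 2,
            dimnames = list(c("G2|20", "G1|10", "G3|30"), c("S1", "S2")))
y <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2,
            dimnames = list(c("G1", "G2"), c("S1", "S2")))

# Match-able part of the x row names
nms <- sub("\\|.*$", "", rownames(x))

# Row names of x with no counterpart in y: these rows end up all NA
unmatched <- rownames(x)[!(nms %in% rownames(y))]

# Cross-check: pull rows of y in the order of x; a missing match gives an NA row
xy_check <- y[match(nms, rownames(y)), colnames(x), drop = FALSE]
rownames(xy_check) <- rownames(x)

print(unmatched)
print(xy_check)
```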

Good luck!

Matrix Merge

Alexander Tuzhikov

January 21, 2016
