Hey Kaleb, so below I will comment on every line.
x<- structure(c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0), .Dim = c(11L,
5L), .Dimnames = list(c("AACSL|729522", "AACS|65985", "AADACL2|344752",
"AADACL3|126767", "AADACL4|343066", "AADAC|13", "AADAT|51166",
"AAGAB|79719", "AAK1|22848", "AAK12|14", "AANAT|15"), c("S18",
"S20", "S45", "S95", "S100")))
y<- structure(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 0), .Dim = c(11L, 4L), .Dimnames = list(c("A1BG",
"A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD", "AARS2", "AASDHPPT",
"AASS", "BAACS"), c("S18", "S10", "S45", "S95")))
z<- structure(c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), .Dim = c(11L, 5L), .Dimnames = list(c("AACSL|729522",
"AACS|65985", "AADACL2|344752", "AADACL3|126767", "AADACL4|343066",
"AADAC|13", "AADAT|51166", "AAGAB|79719", "AAK1|22848", "AAK12|14",
"AANAT|15"), c("S18", "S20", "S45", "S95", "S100")))

library(data.table)
library(reshape2)
library(dplyr)

First, we take the x value and convert it to a data frame (it’s a matrix as provided above). We only do that because dplyr is built to work with data frames and I used it heavily in this code. So nothing changes here, just the type.

dplyr does not reliably keep the rownames. If we want to keep them (and we obviously do), it is always helpful to turn them into a real column. In essence, mutate(smth=smth_else) either converts an existing column (when smth is the name of an existing column) or creates a new one (our case), in which case smth is added to the end of the table. smth_else can be either a function call or simply a vector whose length equals nrow of the table.

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>% head
## S18 S20 S45 S95 S100 rownames
## 1 0 1 1 0 0 AACSL|729522
## 2 0 0 1 1 0 AACS|65985
## 3 0 0 0 1 1 AADACL2|344752
## 4 0 0 0 0 1 AADACL3|126767
## 5 1 0 0 0 0 AADACL4|343066
## 6 1 1 0 0 0 AADAC|13
The y rownames match only the first part of the x rownames (the part before the "|"). So, we do the same thing, but instead of providing a ready-made vector of names to create a new column, we build it on the fly with an anonymous function that selects the first part of the name:

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
  mutate(nms=sapply(rownames(x),
                    function(i){
                      # split on a literal "|" and keep the first piece
                      return(
                        strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                    })) %>% head
## S18 S20 S45 S95 S100 rownames nms
## 1 0 1 1 0 0 AACSL|729522 AACSL
## 2 0 0 1 1 0 AACS|65985 AACS
## 3 0 0 0 1 1 AADACL2|344752 AADACL2
## 4 0 0 0 0 1 AADACL3|126767 AADACL3
## 5 1 0 0 0 0 AADACL4|343066 AADACL4
## 6 1 1 0 0 0 AADAC|13 AADAC
There is an nms column now that contains only the “match-able” part of the rownames.
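As a side note, the same “first part of the name” can probably be obtained without sapply. This is only a sketch of an alternative, not the code used in this answer; the ids vector is a hypothetical stand-in for rownames(x):

```r
# Hypothetical stand-in for rownames(x)
ids <- c("AACSL|729522", "AACS|65985", "AAK12|14")

# sub() is vectorized: remove everything from the first "|" to the end of each name
nms <- sub("\\|.*$", "", ids)
nms
```

Inside the pipeline this would read mutate(nms=sub("\\|.*$", "", rownames)), assuming the rownames column has already been created.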
Next comes melt. melt is a function from the reshape2 package and is used to convert a table from the so-called “wide format” to a “long format” (e.g., a linker table in SQL is an example of such formatting). Note that melt uses id.vars; those are essentially the key columns. In our case those are the row names, which we are going to match on by the end of this whole procedure. See what happens to the data below:

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
  mutate(nms=sapply(rownames(x),
                    function(i){
                      return(
                        strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                    })) %>%
  melt(id.vars=c("nms","rownames")) %>% head(n = 25)
## nms rownames variable value
## 1 AACSL AACSL|729522 S18 0
## 2 AACS AACS|65985 S18 0
## 3 AADACL2 AADACL2|344752 S18 0
## 4 AADACL3 AADACL3|126767 S18 0
## 5 AADACL4 AADACL4|343066 S18 1
## 6 AADAC AADAC|13 S18 1
## 7 AADAT AADAT|51166 S18 0
## 8 AAGAB AAGAB|79719 S18 0
## 9 AAK1 AAK1|22848 S18 0
## 10 AAK12 AAK12|14 S18 0
## 11 AANAT AANAT|15 S18 1
## 12 AACSL AACSL|729522 S20 1
## 13 AACS AACS|65985 S20 0
## 14 AADACL2 AADACL2|344752 S20 0
## 15 AADACL3 AADACL3|126767 S20 0
## 16 AADACL4 AADACL4|343066 S20 0
## 17 AADAC AADAC|13 S20 1
## 18 AADAT AADAT|51166 S20 1
## 19 AAGAB AAGAB|79719 S20 0
## 20 AAK1 AAK1|22848 S20 0
## 21 AAK12 AAK12|14 S20 0
## 22 AANAT AANAT|15 S20 0
## 23 AACSL AACSL|729522 S45 1
## 24 AACS AACS|65985 S45 1
## 25 AADACL2 AADACL2|344752 S45 0
The code above just lists “each rowname for a given variable equals some value”. We did this because it is now convenient to merge the two tables.
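If the wide-to-long idea is still fuzzy, a toy table may help. This is a minimal sketch with made-up names (g1, g2, S1, S2 are not from the data above):

```r
library(reshape2)

# Toy "wide" table: one row per gene, one column per sample
wide <- data.frame(nms = c("g1", "g2"), S1 = c(0, 1), S2 = c(1, 0))

# Long format: one row per (gene, sample) pair; "nms" is the key column
long <- melt(wide, id.vars = "nms")
long
```

Two genes times two samples gives four rows, with the former column names stored in the variable column and the cell contents in value.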
We do the same with the y table; in the final code that happens on the fly inside the merge function, but I will slightly change that below to make it more explicit. And of course, just like with SQL’s JOIN (merge is in fact a join, just the base R name for it), you need the “join by what?”. We want to join by the rownames AND the variables, so that they match between the tables. all.x=TRUE keeps every row of x, even when y has no match for it.

y.l<- y %>% as.data.frame %>% mutate(nms=rownames(y)) %>% melt(id.vars="nms")
head(y.l)
## nms variable value
## 1 A1BG S18 0
## 2 A1CF S18 0
## 3 A2ML1 S18 0
## 4 A4GALT S18 0
## 5 AACS S18 1
## 6 AAK1 S18 0
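Before running the merge on the real tables, here is a toy sketch of what by and all.x=TRUE do. The two small data frames are made up for illustration; only the column names mirror the melted tables:

```r
# Toy long tables: "left" plays the role of melted x, "right" of melted y
left  <- data.frame(nms = c("AACS", "AACSL"),
                    variable = c("S18", "S18"),
                    value = c(0, 0))
right <- data.frame(nms = "AACS", variable = "S18", value = 1)

# Left join on both keys: every row of "left" is kept,
# and rows without a partner in "right" get NA in value.y
joined <- merge(left, right, by = c("variable", "nms"), all.x = TRUE)
joined
```

Because both tables have a value column, merge renames them to value.x and value.y, which is where those column names in the real output come from.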
x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
  mutate(nms=sapply(rownames(x),
                    function(i){
                      return(
                        strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                    })) %>%
  melt(id.vars=c("nms","rownames")) %>%
  merge(., y.l, by=c("variable","nms"), all.x=TRUE) %>% head
## variable nms rownames value.x value.y
## 1 S18 AACS AACS|65985 0 1
## 2 S18 AACSL AACSL|729522 0 NA
## 3 S18 AADAC AADAC|13 1 NA
## 4 S18 AADACL2 AADACL2|344752 0 NA
## 5 S18 AADACL3 AADACL3|126767 0 NA
## 6 S18 AADACL4 AADACL4|343066 1 NA
Looks good, right? Now the rownames and the variable names are matched between x and y, and rows of x without a match in y simply get NA. Next we throw away nms and value.x (we don’t need them any longer) with select; select is the dplyr way of choosing the columns to keep, or to throw away with -. Then we dcast the result back to a “wide format”. Here dcast does a “de-melt”, so to say: just the opposite action, to bring the table back to its original format.

x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
  mutate(nms=sapply(rownames(x),
                    function(i){
                      return(
                        strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                    })) %>%
  melt(id.vars=c("nms","rownames")) %>%
  merge(., y %>% as.data.frame %>% mutate(nms=rownames(y)) %>% melt(id.vars="nms"), by=c("variable","nms"), all.x=TRUE) %>%
  select(-nms, -value.x) %>% dcast(formula = rownames~variable, value.var="value.y") %>% head
## rownames S18 S20 S45 S95 S100
## 1 AACS|65985 1 NA 0 0 NA
## 2 AACSL|729522 NA NA NA NA NA
## 3 AADAC|13 NA NA NA NA NA
## 4 AADACL2|344752 NA NA NA NA NA
## 5 AADACL3|126767 NA NA NA NA NA
## 6 AADACL4|343066 NA NA NA NA NA
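To convince yourself that dcast really undoes melt, a toy round trip may help. This is a sketch with made-up data (the id, S1 and S2 names are hypothetical):

```r
library(reshape2)

# Toy wide table, deliberately not in alphabetical order of id
wide <- data.frame(id = c("g2|2", "g1|1"), S1 = c(1, 0), S2 = c(NA, 1))

# wide -> long -> wide again
long <- melt(wide, id.vars = "id")
back <- dcast(long, id ~ variable, value.var = "value")
back
```

The values survive the round trip, but the rows come back sorted by id rather than in their original order, which is why a final reordering step is still needed on the real data.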
x %>% as.data.frame %>% mutate(rownames=rownames(x)) %>%
  mutate(nms=sapply(rownames(x),
                    function(i){
                      return(
                        strsplit(x=i,split="|",fixed=TRUE)[[1]][[1]])
                    })) %>%
  melt(id.vars=c("nms","rownames")) %>%
  merge(., y %>% as.data.frame %>% mutate(nms=rownames(y)) %>% melt(id.vars="nms"), by=c("variable","nms"), all.x=TRUE) %>%
  select(-nms, -value.x) %>% dcast(formula = rownames~variable, value.var="value.y") -> xy
# now put the row names back where they belong
rownames(xy)<-xy$rownames
# now the only thing left is to arrange the rows and columns as in x
xy[rownames(x),colnames(x)] %>% head
## S18 S20 S45 S95 S100
## AACSL|729522 NA NA NA NA NA
## AACS|65985 1 NA 0 0 NA
## AADACL2|344752 NA NA NA NA NA
## AADACL3|126767 NA NA NA NA NA
## AADACL4|343066 NA NA NA NA NA
## AADAC|13 NA NA NA NA NA
It should all be there now. The only remaining issue, as we have discussed, is that the matrices are not exactly the same in terms of rownames; that is where all the NAs are coming from.
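If you want to check where the NAs come from, a quick sketch like the one below lists what the two tables actually share. The name vectors are copied from the dput output at the top so the snippet runs on its own; in your session rownames(x), rownames(y), colnames(x) and colnames(y) would do the same job:

```r
# Row and column names copied from the x and y definitions above
x_rows <- c("AACSL|729522", "AACS|65985", "AADACL2|344752", "AADACL3|126767",
            "AADACL4|343066", "AADAC|13", "AADAT|51166", "AAGAB|79719",
            "AAK1|22848", "AAK12|14", "AANAT|15")
y_rows <- c("A1BG", "A1CF", "A2ML1", "A4GALT", "AACS", "AAK1", "AARD",
            "AARS2", "AASDHPPT", "AASS", "BAACS")
x_cols <- c("S18", "S20", "S45", "S95", "S100")
y_cols <- c("S18", "S10", "S45", "S95")

# Names present in both tables (x names compared without the "|..." suffix)
shared_rows <- intersect(sub("\\|.*$", "", x_rows), y_rows)
shared_cols <- intersect(x_cols, y_cols)
shared_rows  # "AACS" "AAK1"
shared_cols  # "S18" "S45" "S95"
```

Only cells at the intersection of these rows and columns can carry a value from y; everything else stays NA after the merge.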
Good luck!