LinkedIn question by Mohammad Ghoreishi: I am wondering how could I merge elements of data frame!
df <- read.table(sep = " ", stringsAsFactors = FALSE, text = "
ASG001 0 1 0 2 0 1 1 2 2 1 0 1
ASG002 0 2 0 2 1 0 1 1 2 1 0 1")
do.call(rbind, lapply(1:nrow(df), function(x) {
data.frame(id = df[x,1], val = paste(unlist(df[x,2:ncol(df)]), collapse = ""))
}))
## id val
## 1 ASG001 010201122101
## 2 ASG002 020210112101
library(tidyr)
unite(df, value, -V1, sep="")
## V1 value
## 1 ASG001 010201122101
## 2 ASG002 020210112101
tidyr::unite() is easier to read than the do.call(), lapply() method. help(unite) explains that unite is a “Convenience function to paste together multiple columns into one.” It is used as unite(data, col, ..., sep = "_", remove = TRUE), with the following arguments:
data: A data frame.
col: (Bare) name of column to add.
...: Specification of columns to unite. Use bare variable names. Select all variables between x and z with x:z, exclude y with -y. For more options, see the select documentation.
sep: Separator to use between values.
remove: If TRUE, remove input columns from output data frame.
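To make the ..., sep and remove arguments concrete, here is a minimal sketch on the same two-row data. The column range V2:V4 and the output name first3 are arbitrary choices made for this illustration; they are not part of the original answer.

```r
library(tidyr)

df <- read.table(sep = " ", stringsAsFactors = FALSE, text = "
ASG001 0 1 0 2 0 1 1 2 2 1 0 1
ASG002 0 2 0 2 1 0 1 1 2 1 0 1")

# Unite only columns V2 to V4 (the x:z form) and keep the
# input columns in the result (remove = FALSE).
partial <- unite(df, first3, V2:V4, sep = "", remove = FALSE)
partial$first3
## [1] "010" "020"

# Without sep = "", the default separator "_" is placed between values.
with_default_sep <- unite(df, value, -V1)
with_default_sep$value
```

With remove = FALSE the original V2 to V4 columns stay in the data frame next to the new column, which can be handy for checking the result.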
data.frame(id = df$V1, value = Reduce(function(a,b)paste0(a,b),df[2:ncol(df)]))
## id value
## 1 ASG001 010201122101
## 2 ASG002 020210112101
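The Reduce() call above folds paste0() over the columns from left to right: it pastes column 2 to column 3, pastes that result to column 4, and so on. A small sketch of those intermediate steps, assuming the same data and using accumulate = TRUE on the first three value columns only (a subset chosen here just to keep the output short):

```r
df <- read.table(sep = " ", stringsAsFactors = FALSE, text = "
ASG001 0 1 0 2 0 1 1 2 2 1 0 1
ASG002 0 2 0 2 1 0 1 1 2 1 0 1")

# accumulate = TRUE keeps every intermediate result of the fold,
# so each list element shows the strings built so far for both rows.
steps <- Reduce(function(a, b) paste0(a, b), df[2:4], accumulate = TRUE)

steps[[2]]  # V2 pasted to V3
## [1] "01" "02"
steps[[3]]  # previous result pasted to V4
## [1] "010" "020"
```

Because paste0() is vectorised, every step works on whole columns at once, so the number of paste0() calls grows with the number of columns and not with the number of rows.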
library(microbenchmark)
microbenchmark(do.call(rbind, lapply(1:nrow(df), function(x) {
data.frame(id = df[x,1], val = paste(unlist(df[x,2:ncol(df)]), collapse = ""))
})), unit = "ms")
## Unit: milliseconds
## expr
## do.call(rbind, lapply(1:nrow(df), function(x) { data.frame(id = df[x, 1], val = paste(unlist(df[x, 2:ncol(df)]), collapse = "")) }))
## min lq mean median uq max neval
## 1.0549 1.117047 1.294416 1.168004 1.289251 2.927405 100
microbenchmark(unite(df, value, -V1,sep=""), unit = "ms")
## Unit: milliseconds
## expr min lq mean median
## unite(df, value, -V1, sep = "") 0.232907 0.248679 0.265304 0.258904
## uq max neval
## 0.275101 0.50941 100
microbenchmark(data.frame(id = df$V1,
value = Reduce(function(a,b)paste0(a,b),df[2:ncol(df)])),
unit = "ms")
## Unit: milliseconds
## expr
## data.frame(id = df$V1, value = Reduce(function(a, b) paste0(a, b), df[2:ncol(df)]))
## min lq mean median uq max neval
## 0.323672 0.347514 0.3676698 0.360611 0.3825265 0.655721 100
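In these benchmarks tidyr::unite() is the fastest: its median of about 0.26 ms beats both the Reduce() technique (about 0.36 ms) and the do.call(), lapply(), paste() technique (about 1.17 ms). The data frame here has only two rows, so this result might be different on large datasets.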
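One way to check how the ranking holds up on larger data is to rerun the comparison on a simulated data frame. The sketch below is only a starting point: the size (1000 rows, 12 value columns), the seed, the number of repetitions and the helper names by_rows, by_unite and by_reduce are all assumptions made for this illustration, and no timings are shown because they depend on the machine.

```r
library(tidyr)
library(microbenchmark)

# Simulated data: an id column V1 plus 12 columns of values in 0:2.
set.seed(1)
n <- 1000
big <- data.frame(
  V1 = sprintf("ASG%04d", seq_len(n)),
  matrix(sample(0:2, n * 12, replace = TRUE), nrow = n),
  stringsAsFactors = FALSE
)

# The three techniques from above, wrapped in (hypothetical) helpers.
by_rows <- function(d) {
  do.call(rbind, lapply(seq_len(nrow(d)), function(x) {
    data.frame(id = d[x, 1],
               val = paste(unlist(d[x, 2:ncol(d)]), collapse = ""))
  }))
}
by_unite <- function(d) unite(d, value, -V1, sep = "")
by_reduce <- function(d) {
  data.frame(id = d$V1,
             value = Reduce(function(a, b) paste0(a, b), d[2:ncol(d)]))
}

# Fewer repetitions than the default 100, since the row-wise version is slow.
microbenchmark(
  rows = by_rows(big),
  unite = by_unite(big),
  reduce = by_reduce(big),
  times = 10, unit = "ms"
)
```

Putting the three expressions in a single microbenchmark() call prints them in one table, which makes the comparison easier to read than three separate runs.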