This will be a fast one. I made this tutorial because some people
comes to me with excel files with sheets divided by
samples/subjects/etc. The problem is that now they have 30 sheets with a
lot of lines and columns and they want to merge it.
Although doing it manually is possible, we all know how hands-on excel
is error-prone (and in this case, tedious and time consuming). That
being said, here is a simple tutorial in how to merge excel sheets in a
single data frame.
Set your working directory (where your excel file is) and load the needed packages
library(dplyr)
library(xlsx)
library(readxl)
library(rio)Example Data
The code below produces an excel file with 5 sheets, 2000 lines and 5
columns, each.
Obs: I am pretty sure that there is an easier way to do it, but I could
not figure out how to make for loops and the paste()
function work without the xlsx package complaining. If you
have the solution, please, reach out to me.
#Make a dataframe.
df <- as.data.frame(matrix(runif(n=50000, min=1, max=20), nrow=10000))
#Make a group column and bind it to the previous df.
df2<-data.frame(group=LETTERS[1:5])
df2<-bind_rows(replicate(2000,df2,simplify=FALSE))
#Bind both
full_df<-cbind(df2,df)
full_df$group<-as.factor(full_df$group)
#Create the example excel file with 5 sheets, 2000 lines and 5 columns in each sheet.
write.xlsx(subset(full_df, subset=group=="A"),
file="example.xlsx",sheetName = "sheet1",append = TRUE,row.names = FALSE)
write.xlsx(subset(full_df, subset=group=="B"),
file="example.xlsx",sheetName = "sheet2",append = TRUE,row.names = FALSE)
write.xlsx(subset(full_df, subset=group=="C"),
file="example.xlsx",sheetName = "sheet3",append = TRUE,row.names = FALSE)
write.xlsx(subset(full_df, subset=group=="D"),
file="example.xlsx",sheetName = "sheet4",append = TRUE,row.names = FALSE)
write.xlsx(subset(full_df, subset=group=="E"),
file="example.xlsx",sheetName = "sheet5",append = TRUE,row.names = FALSE)Merge
Use the import_list() command from the rio
package to merge the sheets and its done!
data_list <- import_list("example.xlsx", setclass = "tbl", rbind = TRUE)
#If you want, you can create and additional column to keep trace of the sheet of origin.
#Make a sheet list
excel_file<-"example.xlsx"
excel_sheets(excel_file)## [1] "sheet1" "sheet2" "sheet3" "sheet4" "sheet5"
sheets_list<-as.vector(excel_sheets(excel_file))
#Count lines and convert each file line to the name of the original sheet
tags_freqs<-as.data.frame(table(data_list$`_file`))
tags_freqs<-as.vector(tags_freqs$Freq)
SHEET_ID<-rep(sheets_list,tags_freqs)
SHEET_ID<-as.data.frame(SHEET_ID)
data_list<-cbind(data_list,SHEET_ID)
#Save it
write.table(data_list,"merged_example.txt", sep = "\t",row.names = FALSE)