Required Packages:

Note that library pacman is optional, I use pacman package simply to load all my installed libraries at once.

knitr::opts_chunk$set(echo = TRUE)

library(pacman)
pacman::p_load(readr,
               openxlsx,
               data.table)

Case 1: Reading multiple excel of files to merge them into a whole dataframe:

We all might have come across the situation where we were required to import a series of excel files from a folder and create master data. Instances like weekly sales data or any other scenarios where we are having a series of data with similar columns. Even then we have many conventional ways of achieving the task, however, I would like to illustrate a simple three-step process by creating a list of available files then reading them recursively to join each of them vertically.

Also, the method enables us to assign a unique identification column for better traceability. For example, we are reading four excel files with similar column (let that be in any order, R will make reorder them first before merging) with observations, idcol() will assign “1” to the set of observation coming for the first file, similarly for other files.

# Assign the folder location into a variable
file.path <- "~/file path"

# Create the list of files present in that folder location
file.list <- list.files(path = file.path, pattern = '*.xlsx')

# Create all the files from the above list
df.list <- lapply(file.list,read.xlsx)

# Merging them vertically. idcol() will create the identifier
worldhappiness.df <- rbindlist(df.list, idcol = "id",fill = TRUE)

We then can replace the id with our chosen identification method. Here I have used the famous world happiness dataset by United Nation. I have 5 years of data starting from “2015-2019”. Hence, I replace the identity column with the actual year such that I know observation are belonging from.

worldhappiness.df <- worldhappiness.df %>% mutate(id = if_else(id == "1",2015,
                                                               if_else(id == "2",2016,
                                                                       if_else(id=="3",2017,
                                                                               if_else(id=="4",2018,2019)))))

Case 2: Reading multiple sheets from an excel file to merge them into a whole dataframe:

Similarly to Case 1, in this case, we have an excel file with multiple datasheets, if I would like to read them at once without exclusively importing them one-by-one. Given the steps below is an effective way without losing any observation or data.

# Assign the folder location and file name into a variable
file.path <- "~/file path/dataframe.xlsx"

# Create a list of sheet names present in the excel file
sheet.list <- openxlsx::getSheetNames(dataframe.xlsx)

# Now, read the data sheets from the created list
file.list <- base::lapply(sheet.list, openxlsx::read.xlsx, xlsxFile = datarame.xlsx)

# This is optional, but I have assigned the file name in the list with the actual sheet name
name(sheet.list) <- sheet.list

# Merge the all the data sheets into one whole dataframe with id as an indicator of which sheets the data belongs to
list.df <- rbindlist(file.list, idcol = "id", fill = TRUE)

Thank you for going through my code.