0. Introduction

This document demonstrates a few ways to generate excel files files using R code. The main purpose is to show how it is possible to manipulate a large data set to automatically generate XLSX files based on selected criteria and how to recombine XLSX files in a folder into a single workbook.

The code provided subdivides the data set by the education field and creates an an XLSX file for each value. The code can easily be modified to subdivide the data set by another field such as occupation.

The code also demonstrates how to: - Combine multiple XLSX files into individual worksheets in a single workbook - Combine multiple XLSX files onto a single worksheet in a workbook - Split a data source into separate worksheets in a workbook - Recombine excel tabs into a single worksheet

1. Importing and Inspecting a Data Set

The first code chunk imports data from the US Census Data data set, obtained from

https://archive.ics.uci.edu/ml/machine-learning-databases/adult/

https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

The code also generates summary stats and charts on the data set based on the education field of the data set.

shell('cls')
library(tidyverse)
library(plotly)
library(readxl)
library(openxlsx)
library(formattable)
library(knitr)


#Set filepath. Create before running script.
path <- "C:/education/"

options(pillar.sigfig = 5)

# Adult Census Income:
# https://archive.ics.uci.edu/ml/machine-learning-databases/adult/
# https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data


# 1. Read data from internet source

# Download US census data
census_uci <- read.table('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', 
                         sep = ',', fill = F)
# Create a dataframe of column names
colnames(census_uci) <- c('age', 'workclass', 'fnlwgt', 'education', 
                          'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex', 
                          'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income')


# 1.1 Inspect the Data
paste("Data set dimensions:")
## [1] "Data set dimensions:"
paste(ncol(census_uci), "fields") #no of fields
## [1] "15 fields"
paste(nrow(census_uci),"records") #no of records
## [1] "32561 records"
knitr::kable(head(census_uci, n=10), caption = "Sample of the Census Data")
Sample of the Census Data
age workclass fnlwgt education education_num marital_status occupation relationship race sex capital_gain capital_loss hours_per_week native_country income
39 State-gov 77516 Bachelors 13 Never-married Adm-clerical Not-in-family White Male 2174 0 40 United-States <=50K
50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 0 0 13 United-States <=50K
38 Private 215646 HS-grad 9 Divorced Handlers-cleaners Not-in-family White Male 0 0 40 United-States <=50K
53 Private 234721 11th 7 Married-civ-spouse Handlers-cleaners Husband Black Male 0 0 40 United-States <=50K
28 Private 338409 Bachelors 13 Married-civ-spouse Prof-specialty Wife Black Female 0 0 40 Cuba <=50K
37 Private 284582 Masters 14 Married-civ-spouse Exec-managerial Wife White Female 0 0 40 United-States <=50K
49 Private 160187 9th 5 Married-spouse-absent Other-service Not-in-family Black Female 0 0 16 Jamaica <=50K
52 Self-emp-not-inc 209642 HS-grad 9 Married-civ-spouse Exec-managerial Husband White Male 0 0 45 United-States >50K
31 Private 45781 Masters 14 Never-married Prof-specialty Not-in-family White Female 14084 0 50 United-States >50K
42 Private 159449 Bachelors 13 Married-civ-spouse Exec-managerial Husband White Male 5178 0 40 United-States >50K
# Summarise the data
paste("Summary data for the fields of the data set:")
## [1] "Summary data for the fields of the data set:"
summary(census_uci)
##       age                    workclass         fnlwgt       
##  Min.   :17.00    Private         :22696   Min.   :  12285  
##  1st Qu.:28.00    Self-emp-not-inc: 2541   1st Qu.: 117827  
##  Median :37.00    Local-gov       : 2093   Median : 178356  
##  Mean   :38.58    ?               : 1836   Mean   : 189778  
##  3rd Qu.:48.00    State-gov       : 1298   3rd Qu.: 237051  
##  Max.   :90.00    Self-emp-inc    : 1116   Max.   :1484705  
##                  (Other)          :  981                    
##          education     education_num                  marital_status 
##   HS-grad     :10501   Min.   : 1.00    Divorced             : 4443  
##   Some-college: 7291   1st Qu.: 9.00    Married-AF-spouse    :   23  
##   Bachelors   : 5355   Median :10.00    Married-civ-spouse   :14976  
##   Masters     : 1723   Mean   :10.08    Married-spouse-absent:  418  
##   Assoc-voc   : 1382   3rd Qu.:12.00    Never-married        :10683  
##   11th        : 1175   Max.   :16.00    Separated            : 1025  
##  (Other)      : 5134                    Widowed              :  993  
##             occupation            relationship  
##   Prof-specialty :4140    Husband       :13193  
##   Craft-repair   :4099    Not-in-family : 8305  
##   Exec-managerial:4066    Other-relative:  981  
##   Adm-clerical   :3770    Own-child     : 5068  
##   Sales          :3650    Unmarried     : 3446  
##   Other-service  :3295    Wife          : 1568  
##  (Other)         :9541                          
##                   race            sex         capital_gain  
##   Amer-Indian-Eskimo:  311    Female:10771   Min.   :    0  
##   Asian-Pac-Islander: 1039    Male  :21790   1st Qu.:    0  
##   Black             : 3124                   Median :    0  
##   Other             :  271                   Mean   : 1078  
##   White             :27816                   3rd Qu.:    0  
##                                              Max.   :99999  
##                                                             
##   capital_loss    hours_per_week         native_country     income     
##  Min.   :   0.0   Min.   : 1.00    United-States:29170    <=50K:24720  
##  1st Qu.:   0.0   1st Qu.:40.00    Mexico       :  643    >50K : 7841  
##  Median :   0.0   Median :40.00    ?            :  583                 
##  Mean   :  87.3   Mean   :40.44    Philippines  :  198                 
##  3rd Qu.:   0.0   3rd Qu.:45.00    Germany      :  137                 
##  Max.   :4356.0   Max.   :99.00    Canada       :  121                 
##                                   (Other)       : 1709
# Record count by education level
count_ed<-census_uci %>% 
  group_by(education) %>% 
             summarise(n())

# Barplot of record count by Education
p1<-census_uci %>% 
  ggplot(aes(x=education))+geom_bar(color="grey",fill="blue")+ggtitle("Count by Education Level")+ theme(axis.text.x = element_text(angle = 90))
ggplotly(p1)
# Summary of education count
# paste("Summary of education count")
knitr::kable(count_ed, caption = "Summary of education count"  )
Summary of education count
education n()
10th 933
11th 1175
12th 433
1st-4th 168
5th-6th 333
7th-8th 646
9th 514
Assoc-acdm 1067
Assoc-voc 1382
Bachelors 5355
Doctorate 413
HS-grad 10501
Masters 1723
Preschool 51
Prof-school 576
Some-college 7291

2. Export Data Into Separate XLSX Files

The following code chunk subdivides the initial data set by the education field and creates a single tab XLSX file for each value.

Be sure to set the correct folder path in the previous code chunk, which must be run in advance of this one. The XLSX files will load to C:/education/, unless adjusted.

# 2. Split data and export to separate Excel files

sp1 <- split(census_uci,census_uci$education)

# Get unique education values
names <- census_uci$education %>% unique()

# Loop through the the different education levels
for (i in names){
  # write.xlsx(sp1[i],paste(i,".xlsx"))
  write.xlsx(sp1[i],file.path(path,paste(i,".xlsx")))
  }

Upon running this chunk, 16 files should appear in the target folder each named for a value of the education field

3. Combine All Files in a folder to a Single Worksheet

This code chunk loops through all the files in a folder and exports the data into single worksheet in a new Excel file. If executed successfully, the record set in the destination file will be the same as the initial data set.

In order to work properly, all files in the source folder must have the same data structure. (Be sure to remove unnecessary files from the source folder). This code chunk should work on any folder with a set of similarly structured files. It is not limited to the files created previously. Simply adjust the path variable in the first code chunk.

# 3. Combine Excel files in a folder into a single workbook tab

# Store file path and destination file
merge_file_name <- paste(path,"single_tab_merged_file.xlsx")

# Generate a list of all the files in path folder
filenames_list <- list.files(path= path, full.names=TRUE)

# Read the files in the folder
paste("A List of the merged files")
## [1] "A List of the merged files"
All <- lapply(filenames_list,function(filename){
  print(paste("Merging",filename,sep = " "))
  read.xlsx(filename)
})
## [1] "Merging C:/education/ 10th .xlsx"
## [1] "Merging C:/education/ 11th .xlsx"
## [1] "Merging C:/education/ 12th .xlsx"
## [1] "Merging C:/education/ 1st-4th .xlsx"
## [1] "Merging C:/education/ 5th-6th .xlsx"
## [1] "Merging C:/education/ 7th-8th .xlsx"
## [1] "Merging C:/education/ 9th .xlsx"
## [1] "Merging C:/education/ Assoc-acdm .xlsx"
## [1] "Merging C:/education/ Assoc-voc .xlsx"
## [1] "Merging C:/education/ Bachelors .xlsx"
## [1] "Merging C:/education/ Doctorate .xlsx"
## [1] "Merging C:/education/ education_tabs.xlsx"
## [1] "Merging C:/education/ HS-grad .xlsx"
## [1] "Merging C:/education/ Masters .xlsx"
## [1] "Merging C:/education/ Preschool .xlsx"
## [1] "Merging C:/education/ Prof-school .xlsx"
## [1] "Merging C:/education/ single_tab_merged_file.xlsx"
## [1] "Merging C:/education/ Some-college .xlsx"
## [1] "Merging C:/education/Occupations.xlsx"
## [1] "Merging C:/education/Single_sheet_recombined_tabs.xlsx"
# Create a dataframe combining data from all files
df <- do.call(rbind.data.frame, All)
write.xlsx(df,merge_file_name)

If executed successfully, single_tab_merged_file.XLSX should appear in your source folder. You may open it to verify the number of fields (15) and records (32,561).

4. Combine All Files in a folder to Separate Worksheet Tabs

The following code chunk works the same as the previous one except that it creates separate destination worksheet tabs from each file. You may use this code chunk as an alternative to the previous one.

# 4. Combine Excel files in a folder into separate tabs

# Only for files with a similar data structure. 
# Remove unwanted files from folder

# Store file path and destination file
tabbed_file_name <- paste(path,"education_tabs.xlsx")

# Store file names in a dataframe
filenames_list <- list.files(path= path, full.names=TRUE)

# Read the files in folder

paste("A List of the transferred files")
## [1] "A List of the transferred files"
All <- lapply(filenames_list,function(filename){
  print(paste("Merging",filename,sep = " "))
  read.xlsx(filename)
})
## [1] "Merging C:/education/ 10th .xlsx"
## [1] "Merging C:/education/ 11th .xlsx"
## [1] "Merging C:/education/ 12th .xlsx"
## [1] "Merging C:/education/ 1st-4th .xlsx"
## [1] "Merging C:/education/ 5th-6th .xlsx"
## [1] "Merging C:/education/ 7th-8th .xlsx"
## [1] "Merging C:/education/ 9th .xlsx"
## [1] "Merging C:/education/ Assoc-acdm .xlsx"
## [1] "Merging C:/education/ Assoc-voc .xlsx"
## [1] "Merging C:/education/ Bachelors .xlsx"
## [1] "Merging C:/education/ Doctorate .xlsx"
## [1] "Merging C:/education/ education_tabs.xlsx"
## [1] "Merging C:/education/ HS-grad .xlsx"
## [1] "Merging C:/education/ Masters .xlsx"
## [1] "Merging C:/education/ Preschool .xlsx"
## [1] "Merging C:/education/ Prof-school .xlsx"
## [1] "Merging C:/education/ single_tab_merged_file.xlsx"
## [1] "Merging C:/education/ Some-college .xlsx"
## [1] "Merging C:/education/Occupations.xlsx"
## [1] "Merging C:/education/Single_sheet_recombined_tabs.xlsx"
# Row bind the data and split across Education field
df <- do.call(rbind.data.frame, All)
df<-split(df,df$education)

# Create destination file
write.xlsx(df,tabbed_file_name)

The file education_tabs.XLSX should appear in the source folder having 17 separate tabs including a tab with the entire record set created by the previous code.

5. Split Initail Data Source into Separate Worksheet Tabs

This code demonstrates how to take data directly from a data set and subdivide it into separate tabs by occupation. This can be executed immediately after executing the first code chunk.

# 5. Split original data source into separate Excel tabs

# Separate data frame by marital status
sp2 <- split(census_uci,census_uci$occupation)

#Export to Excel tabs. Missing values generate warnings in Excel
tab_file <- paste("Occupations.xlsx")

# Write data to destination file
xl <- write.xlsx(sp2,file.path(path,tab_file))

# List tab names and show worksheet order
xl
## A Workbook object.
##  
## Worksheets:
##  Sheet 1: " ?"
##  
## 
##  Sheet 2: " Adm-clerical"
##  
## 
##  Sheet 3: " Armed-Forces"
##  
## 
##  Sheet 4: " Craft-repair"
##  
## 
##  Sheet 5: " Exec-managerial"
##  
## 
##  Sheet 6: " Farming-fishing"
##  
## 
##  Sheet 7: " Handlers-cleaners"
##  
## 
##  Sheet 8: " Machine-op-inspct"
##  
## 
##  Sheet 9: " Other-service"
##  
## 
##  Sheet 10: " Priv-house-serv"
##  
## 
##  Sheet 11: " Prof-specialty"
##  
## 
##  Sheet 12: " Protective-serv"
##  
## 
##  Sheet 13: " Sales"
##  
## 
##  Sheet 14: " Tech-support"
##  
## 
##  Sheet 15: " Transport-moving"
##  
## 
##  
##  Worksheet write order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
# worksheetOrder(xl)

The file Occupations.XLSX should appear with tabs for each of the 15 unique values of the occupation field.

6. Recombine Multiple Worksheets into a Single Worksheet

This last code chunk simply recombines the occupation tabs created in the previous code chunk into a single worksheet tab. Once again the data should have 15 fields with 32,561 records.

# 6. Recombine Excel tabs into a single sheet

# Create name of destination file
sheet_file <- paste("Single_sheet_recombined_tabs.xlsx")


#Concatenate file path and the source file
xl2<- paste(path,tab_file,sep= '')

#write to data frame with with bound rows
df<-xl2 %>% 
  excel_sheets() %>% 
   set_names() %>% 
  map(read_excel,path=xl2) %>% 
  bind_rows()

#Create destination File
write.xlsx(df,file.path(path,sheet_file)) 

Was this helpful? I welcome any feedback.

N Ray