Test for SPP RAF Manual Chapter 4 chunk

Chapter 4: Automated report in multiple languages from R to Excel

Excel is a very powerful tool. Many of the datasets we receive, and sometimes the outputs we need, must be in this format. However, automating processes within Excel can be challenging. For example, imagine you need to create a hundred graphs without titles, and suddenly your supervisor requires you to add a generic title to all of them. Completing this task would take a significant amount of time (approximately 30 seconds per graph if you already know the titles; 50 minutes non-stop). Moreover, what if your supervisor, after reviewing the updated graphs, decides to remove the titles again? Just by performing this task, which adds no value, you lose at least 100 minutes of your time.This chapter and the following chapter focus on coordinating Excel and R to ensure that such processes can be automated. Like the previous chapters, these changes can be made by modifying just a few lines of code.

The exercise is based on the last graph XX plotted in Chapter 3. The new package, openxlsx, simplifies the creation and editing of Excel worksheets. The function createWorkbook() is used to request R to create an Excel file. In this case, the file will appear as if it was created by “ESCWA,” but you could use any name. In the code that follows, you will see the same plot (Graph XX). However, this plot will be saved as an image in Excel format, along with a table containing the microdata used to produce it.

The updated R script introduces several key changes to automate the creation of an Excel report containing both a graph and a table. The openxlsx library is now used to create and manipulate Excel files, with functions like createWorkbook(), addWorksheet(), insertImage(), and writeDataTable() enabling the design and population of the spreadsheet. A scatterplot comparing household members by age and gender is generated using ggplot2, saved as a PNG file with ggsave(), and inserted into the Excel sheet using insertImage(). Additionally, a summary table is created using spread() and added to the same sheet via writeDataTable(). Finally, the Excel file is saved using saveWorkbook(), with the option to overwrite existing files. This process results in two output files: the PNG graph and the Excel report, both ready for further use.

library(readxl)
library(here)
library(dplyr)
library(tidyr)
library(lubridate)  
library(ggplot2)
library(scales)
library(openxlsx) # New library for Excel file manipulation

# Create a new Excel workbook
wb = openxlsx::createWorkbook(creator = 'ESCWA')  # The operator "::" calls a specific function from the `openxlsx` package.

# Add a new worksheet to the workbook
newSheet = addWorksheet(wb, sheetName = "Age vs. Household members") 

# Specify the location of the file using `here`
here = here() 
IndividualData <- read_excel((here("Input/Bases/DATA_BASES.xlsx")), sheet = "Individuals")
HH_Data        <- read_excel((here("Input/Bases/DATA_BASES.xlsx")), sheet = "Households")

# Create the column 'Age' from 'Date of birth'
IndividualData = IndividualData %>%
  mutate(Age = trunc((as.Date(`Date of birth`) %--% as.Date(Sys.Date()) / years(1) )))

# Merge the two datasets
Merged_Data = merge(IndividualData, HH_Data, by="HHID")

Merged_Data_New = Merged_Data %>% 
  group_by(Age, Gender) %>% 
  summarise(`Household members`=mean(`Household members`), na.rm = TRUE)

newPlot = Merged_Data_New %>% 
  ggplot(aes(x=Age, y=`Household members`, colour = Gender))+
  geom_point()+
  labs(
    title = "Household members vs. Age",
    subtitle = "By gender",
    x = "Age",
    y = "Household members",
    color = " ",
    caption = "Source: ESCWA (2024)"
  )+
  scale_color_brewer(palette="Paired")+
  coord_flip()+
  theme_classic()+
  scale_y_continuous(expand=expansion(mult=c(0,0.05)))

# Specify the file name and path for the graph
fileName = "Output/Part1 Examples/Scatterplot age vs household members.png"

# Adjust the dataset used to produce the graph into a table using the `spread()` function
cleanTable = Merged_Data_New %>% spread(Gender, `Household members`)

# Save the graph as a PNG file
ggsave(fileName, plot= newPlot, width= 6, height = 4, scale = 1)

# Insert the graph into the Excel sheet
insertImage(wb, file = fileName, sheet= newSheet, startRow = 1, startCol = 1, width = 6, height = 4)

# Write the table into the Excel sheet
writeDataTable(wb, sheet=newSheet, x = cleanTable, startRow = 1, startCol = 10)

# Save the Excel workbook
saveWorkbook(wb,
             file = "Output/Part1 Examples/Excel report.xlsx",
             overwrite = TRUE)

Graph Number 1 xxx Graph1

The file corresponding to the Excel report once accessed looked like Graphxx

Graph xxx. Screenshot of the Excel report

To highlight:

Notice that the starting cells of the graph and table matches those that are assigned by the code.
Notice the name of the spreadsheet matches the code.
If this data needs to go into a report, now it is easy to copy and paste it from the Excel file.

Dictionary to incorporate more than one language in the report

The previous section outlines the key elements required to print R results in Excel. This section builds on that knowledge to create an automated functional report. However, for this exercise, a new Excel file (referred to as a “dictionary”) is necessary. We define the Dictionary as an Excel file containing information about the titles and legends of different graphs, sources, and, most importantly, the translation to a second language— in this case, Arabic. For this example, you can find it in the input folder, under the “dictionary” folder, and it is named Dictionary.xlsx.

This Excel file contains three sheets, each with a table. The first sheet is called “Label” and includes all the elements of a graph (title, subtitle, X and Y axis legends, and captions), in both Arabic and English, as well as the color palette applied.

Image 3 xxxx. Spreadsheet Labels in Dictionary.xlsx

The second sheet, “Weights” includes all the weights used for the current dataset

Image xxxx. Weights Labels in Dictionary.xlsx

The third one, “Category” has the labels of categorical values. For example, Gender variable is divided into 2 values (ID) 1 and 2 variable, the sheet category helps us to understand that 1 corresponds to Male and 2 to Female as in the i xx

Image xxx

At this stage it looks confusing to understand why this work is being done, but while we advance in the code, its usefulness becomes evident.

Note that if you find a row of “XXXXXXXXX” in the last line of the Excel file, this is a practical indicator to help you avoid errors generated by the program when tables contain columns with many missing values. The Arabic script does not appear and instead shows strange codes. These codes are the way R displays characters, but once processed, the output will display the correct content.

Next, we create a variable to help identify the language of the graphs and tables: 1 for English and 0 for Arabic. Once this is done, all categorical variables need to be adjusted via factors to ensure that the labels are in the correct language. In the example below, we do this with the “Gender” variable.

In this case, as.matrix(genderCategories)[,2] tells R that the levels are those listed in the second column of the Excel table associated with categories. Then, as.matrix(genderCategories)[,4-language]tells R that if language=1, then 4-language=3. Thus, it will take column 3, which contains the new English labels.

Hence, the first task is to upload these files into R. Reading the Dictionary and Setting Up the Language

First, we load the dictionary Excel file which contains the translations and graph labels. We also set the language to English (language = 1) for this example. If you need the Arabic version, you can change it to language = 0.

# Read the new variable Dictionary
Dictionary = "Input/Dictionary/Dictionary.xlsx"
dictionaryLabels <- read_excel(here(Dictionary), sheet = "Labels")

GenderCategories = read_excel(here(Dictionary), sheet = "Category", range = "A1:D3")
# The range specifies the sheet and the range of cells to read.
# Language is set to 1 for English and 0 for Arabic
language = 1

#Specify the path using here() to point to the location of the files and read two datasets (IndividualData and HH_Data).
here = here()  # to specify the location of the file
IndividualData <- read_excel(here("Input/Bases/DATA_BASES.xlsx"), sheet = "Individuals")
HH_Data <- read_excel(here("Input/Bases/DATA_BASES.xlsx"), sheet = "Households")

#Create the Age column based on the Date of birth field by calculating the difference from the current date.

IndividualData = IndividualData %>%
  mutate(Age = trunc((as.Date(`Date of birth`) %--% as.Date(Sys.Date()) / years(1))))

#Merge the individual data and household data into one dataset using the merge() function.

Merged_Data = merge(IndividualData, HH_Data, by = "HHID")
# Summarize the data by Age and Gender and adjust the Gender variable labels according to the selected language.
Merged_Data_New = Merged_Data %>%
  group_by(Age, Gender) %>%
  summarise(`Household members` = mean(`Household members`)) %>%
  mutate(Gender = factor(Gender,
                         levels = as.matrix(GenderCategories)[, 2],
                         labels = as.matrix(GenderCategories)[, 4 - language]))

In this code, we use as.matrix(GenderCategories)[, 2] to extract the levels (the actual gender values) from the second column of the GenderCategories table. We then adjust the labels based on the language variable: if language = 1 (English), we use column 3 (the English labels); if language = 0 (Arabic), we use column 4 (the Arabic labels).

Then we set the indicatorCode to identify the graph, and we use the subset() function to extract the labels from the dictionary for the chosen graph.

library(showtext) #This packaged allow us to  use a font that supports Arabic
font_add("Arial", "arial.ttf") # Arial and Times New Roman support Arabic
showtext_auto()

language=0
Merged_Data_New = Merged_Data %>%
  group_by(Age, Gender) %>%
  summarise(`Household members` = mean(`Household members`)) %>%
  mutate(Gender = factor(Gender,
                         levels = as.matrix(GenderCategories)[, 2],
                         labels = as.matrix(GenderCategories)[, 4 - language]))
indicatorCode = '0_1'
labCodes = dictionaryLabels %>% subset(Code == indicatorCode)
# Use ggplot2 to create a scatter plot of Age vs. Household members colored by Gender. 
# The labels and colors are adjusted dynamically based on the selected language.

newPlot = Merged_Data_New %>%
  ggplot(aes(x = Age, y = `Household members`, colour = Gender)) +
  geom_point() +
  labs(
    title = paste0(as.character(labCodes[, 5 * (1 - language) + 2])),
    subtitle = as.character(labCodes[, 5 * (1 - language) + 3]),
    x = as.character(labCodes[, 5 * (1 - language) + 4]),
    y = as.character(labCodes[, 5 * (1 - language) + 5]),
    caption = as.character(labCodes[, 5 * (1 - language) + 6]),
    color = ""
  ) +
  scale_color_brewer(palette = as.character(labCodes[, 12])) +
  theme_classic() +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05)),
    position = if(language == 0) {'right'} else {'left'}
  ) +
  theme(
    legend.position = 'bottom',
    text = element_text(family = 'Arial'),  # Use a font that supports Arabic
    axis.text.x = element_text(size = 10, family = 'Arial'),
    axis.text.y = element_text(size = 10, family = 'Arial'),
    axis.title.x = element_text(hjust = 0.5, size = 10, family = 'Arial'),
    axis.title.y = element_text(hjust = 0.5, size = 10, family = 'Arial'),
    legend.text = element_text(size = 10, family = 'Arial'),
    strip.text = element_text(size = 10, family = 'Arial'),
    plot.title = element_text(face = 'bold', size = 10, family = 'Arial', hjust = if (language == 0) {1} else {0}),
    plot.subtitle = element_text(size = 10, family = 'Arial', hjust = if (language == 0) {1} else {0}),
    legend.key.size = unit(0.5, 'cm')
  )

# x-axis position is reversed for Arabic text by checking the language variable.
if(language == 0){
  newPlot = newPlot + scale_x_reverse(expand = expansion(mult = c(0.05, 0)))
} else {
  newPlot = newPlot + scale_x_continuous(expand = expansion(mult = c(0, 0.05)))
}
print(newPlot)

Having established the concepts in the previous section, avid readers will be eager to add the code needed to generate the Excel files. While this step is not covered here, as it is a repetition of the previous section, the two results are displayed so that the trainee can gain a clear understanding of the possibilities of this technique.

wb = openxlsx::createWorkbook(creator = 'ESCWA')
newSheet = addWorksheet(wb, sheetName = "Age vs. Household members")

# Export to excel:\
fileName="Output/Part1AR Examples/Scatterplot age vs household members.png"
cleanTable=Merged_Data_New %>% spread(Gender, `Household members`)
ggsave(fileName, plot= newPlot, width= 6, height = 4, scale = 1)
insertImage(wb, file = fileName, sheet= newSheet, startRow = 1, startCol = 1, width = 6, height = 4)
writeDataTable(wb, sheet=newSheet, x = cleanTable, startRow = 1, startCol = 10)

saveWorkbook(wb, file = "Output/Part1AR Examples/Excel report.xlsx", overwrite = TRUE)

Graph xxx Screenshotxx

In conclusion, by creating an organized dictionary, R can automate the generation of reports in multiple languages. As promised in the introduction to this chapter, if changes are needed across a large number of graphs, the user only needs to update the dictionary. After running the code, these changes will be automatically incorporated into the report. This approach saves time and reduces the risk of errors, as once the dictionary is corrected, there is no need to check each graph and table individually to identify where changes are required.

Test for SPP RAF Manual Chapter 4 chunk

Camila

2025-02-19

Chapter 4: Automated report in multiple languages from R to Excel

Dictionary to incorporate more than one language in the report