Automating excel outputs

R with Excel

Introduction

In this tutorial i will review some features available to send output from R (dataframes) to Excel. This might be crucial because we feel more comfortable filtering or sorting data in excel, and also because we need to provide the output in excel to end users or as an input to other processes. There is not so much material available, compared to reading excel files. The overview is not exhaustive but restricted to some practical examples which we have handled in Regacc.

Creating a temporary file

For day to day use we can create an un-named temporary file. The best is to create a function that we can add to our rstudio snippets and call it when needed instead of typing it every time we need to do it.

show_in_excel <- function(.data){
  tmp <- paste0(tempfile(), ".xlsx")
  writexl::write_xlsx(.data,tmp)
  browseURL(tmp)
}

show_in_excel(df)

Exporting as an excel table

If we want to avoid having to manually filter the first row in Excel after opening the file for sorting/filtering we can export directly the dataframe as an Excel table.

library(openxlsx)
library(tidyverse)
df<-readr::read_csv("loan_data_cleaned.csv") |> 
  mutate(default=ifelse(loan_status==1,"defaulted","non-defaulted"))

hs <- createStyle(
  textDecoration = "BOLD", fontColour = "#BBBBBB", fontSize = 12,
  fontName = "Arial Narrow", fgFill = "blue"
)

write.xlsx(df, "loandata.xlsx", overwrite = TRUE, asTable = TRUE, headerStyle = hs)

output

Further customisation options

If we want to customise more the output, add a name to the worksheet, table style, font, number format, colours, etc. there are endless possibilities (https://cran.r-project.org/web/packages/openxlsx/vignettes/Formatting.html).

wb <- createWorkbook() 
    modifyBaseFont(wb, fontSize = 12, fontName = "Calibri Light")
    options(openxlsx.numFmt = "#,##0.00")
    addWorksheet(wb, "loan_ES")
    writeDataTable(wb, "loan_ES", df, tableStyle = "TableStyleMedium13")
    
saveWorkbook(wb,"loandata.xlsx", overwrite = TRUE)

Modifying an existing file

library(tidyverse)
df<-readr::read_csv("loan_data_cleaned.csv") |> 
  mutate(default=ifelse(loan_status==1,"defaulted","non-defaulted"))

temp<- df %>% 
  group_by(default,grade,home_ownership) %>% 
  summarise(avg = round(mean(annual_inc, na.rm = TRUE)))

ggk<- ggplot()+
  geom_col(data=temp,aes(avg, reorder(default,avg), fill= home_ownership))+
  geom_point(data=df, aes(annual_inc,default))+
  facet_wrap(~grade, scales = "free_x")+
  scale_x_continuous(scales::pretty_breaks(n=4), labels = scales::label_number())+
  ggthemes::theme_fivethirtyeight()+
  ggthemes::scale_fill_tableau()+
  theme(legend.position = "none")

iv_rates <- df |>
  select(home_ownership, loan_status) |>
  mutate(home_ownership=as.factor(home_ownership)) |> 
  group_by(home_ownership) |>
  summarize(avg_div_rate = mean(loan_status, na.rm=TRUE)) |>
  ungroup() |>
  mutate(
    regions = home_ownership |>
      fct_reorder(avg_div_rate)
)

ggh<-iv_rates |>
ggplot(aes(x=regions, y=avg_div_rate, fill=regions)) + geom_col(color="black") + theme_minimal() + 
  ggthemes::scale_fill_tableau() + 
theme(legend.position="bottom") + geom_label(aes(label=scales::percent(avg_div_rate)), color="white") + 
labs(
    title = "loan status by home ownership",
    y = "loan status", 
    x = "Regions"
) + scale_y_continuous(labels = scales::percent)


wb <- loadWorkbook("loandata.xlsx")
addWorksheet(wb, "Summary")
writeDataTable(wb, sheet="Summary", temp, startCol = 3, startRow = 3,tableStyle = "TableStyleLight13")
 print(ggk)


 print(ggh)

 addWorksheet(wb, "Chart2")

wb %>% insertPlot(sheet="Chart2",startCol =1, startRow=1,width=6,height =5, dpi=600)
saveWorkbook(wb, "loandata_new.xlsx", overwrite = TRUE)

output

Split the dataframe and write each group to a different worksheet

Sometimes we may want to split a big dataframe into several worksheets using the groups of a specific column. This is helpful to organise the output and avoid the number of rows limit of 1.000.000.

df_unit <- split(df,df$default)
write.xlsx(df_unit, file = "loan_default.xlsx", overwrite = TRUE)