Batch Processing RMarkdown Reports

Alyssa Goldberg
November 10, 2021

Challenge - Annual Attorney Marketing Reports

~650 Attorney Reports incorporating:

  • Email engagement with items authored by each attorney
  • Network analysis with co-authors
  • Google Analytics for each bio page, including source and next page stats
  • Engagement with webinars presented by the attorney
  • Contacts associated with the attorney in the CRM for review
  • Business development activity for each attorney (meetings, calls, etc.)

But for This Demo We’ll Use Palmer Penguins

library("palmerpenguins")


Two Files - .R and .RMD

.R file

  • Read in libraries
  • Read in and pre-process data
  • Write a custom function that passes data, objects, variables, and parameters to the RMarkdown file
  • Create global environment objects that can be used in the custom render function
  • Run the function over a list with a for() loop, apply function or purrr::map()
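The last bullet names three ways to run the function over the list. A minimal sketch of all three, assuming a hypothetical stand-in function make_report() that only returns a file name instead of rendering anything, and made-up study names. The purrr branch runs only if purrr is installed.

```r
# Hypothetical stand-in for the real render function: returns the file name
# it would write instead of rendering anything.
make_report <- function(study) paste0("Report_", study, ".docx")

# Made-up study names, for illustration only
studies <- c("A", "B", "C")

# 1. for() loop: pre-allocate the result, then fill it by position
out_loop <- character(length(studies))
for (i in seq_along(studies)) {
  out_loop[i] <- make_report(studies[i])
}

# 2. apply family: vapply() also checks that each result is a single string
out_apply <- vapply(studies, make_report, character(1), USE.NAMES = FALSE)

# 3. purrr::map(): returns a list with one element per study
if (requireNamespace("purrr", quietly = TRUE)) {
  out_map <- purrr::map(studies, make_report)
}
```

All three produce the same file names. The choice is mostly one of style, although map() and vapply() collect the return values for you.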

.RMD file

  • Accept variables and objects from the .R loop or custom function
  • Read the YAML header params passed by rmarkdown::render()
  • Attach a reference document for custom formatting
  • Run SQL or API queries
  • Knit the report

Set up the .R file with Data

library(palmerpenguins)
library(tidyverse)
library(knitr)
library(rmarkdown)

#read in and pre-process data
pdata <- palmerpenguins::penguins_raw %>%
  janitor::clean_names() %>%
  mutate(label = str_replace(
    string = species,
    pattern = " \\(",
    replacement = "\n("
  ))

#create a list for mapping
studies <- unique(pdata$study_name)

Because this is a relatively small dataset, I load the whole thing into the environment and then subset it for each report. More typically, I run SQL or API queries from within the .RMD file.
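The label column in the pre-processing step may be easier to follow on a single value. The sketch below uses base R's sub() as a stand-in for stringr::str_replace() (both replace only the first match), applied to one species value from penguins_raw. The likely purpose, which is an assumption here, is to let the long species names wrap onto two lines in the plot legend.

```r
# One species value as it appears in penguins_raw
species <- "Adelie Penguin (Pygoscelis adeliae)"

# Base R stand-in for the str_replace() call: swap the space before the
# first "(" for a newline
label <- sub(pattern = " \\(", replacement = "\n(", x = species)

cat(label)
# Adelie Penguin
# (Pygoscelis adeliae)
```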

Set up .R File with Function and Run It

generate_reports <- function(study) {
  #Superassigning with <<- copies these objects to the global environment, which I like for troubleshooting, but it isn't necessary
  study <<- study
  p_df <<- pdata %>% filter(study_name == study)
  min_date <- min(p_df$date_egg, na.rm = TRUE)
  max_date <- max(p_df$date_egg, na.rm = TRUE)
  #paste() already separates its arguments with a space
  title <- paste("Study:", study)
  subtitle <- paste("From:", min_date, "To:", max_date)

  rmarkdown::render(
    #The RMarkdown file that will produce the report
    input = "penguin report.Rmd",

    #How should the report be produced. Can be a single value or a vector
    output_format = c("word_document", "html_document"),

    #Directory to write reports to if not the working directory  
    output_dir = "./reports",

    #Values to pass to the frontmatter.
    params = list(
      Date = Sys.Date(),
      Title = title,
      Subtitle = subtitle,
      Study = study
    ),
    #Give each file a unique name or you will accidentally overwrite the previous file in the map
    output_file = paste0("Report_", study)
  )
}

map(studies, ~ generate_reports(.x))
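The comment on output_file warns that every report needs a unique name. One way to catch a clash before any rendering starts is to build all the names up front and check them. This is a sketch only: report_file_names() is a hypothetical helper, not part of rmarkdown, and the character clean-up rule is an assumption.

```r
# Hypothetical helper: build one output file name per study and stop early
# if two studies would map to the same file.
report_file_names <- function(studies, prefix = "Report_") {
  # Replace characters that are awkward in file names with "_"
  safe <- gsub("[^A-Za-z0-9_-]", "_", studies)
  files <- paste0(prefix, safe)
  if (anyDuplicated(files) > 0) {
    stop("Duplicate output names: ",
         paste(unique(files[duplicated(files)]), collapse = ", "))
  }
  files
}

# The three study names in penguins_raw
report_file_names(c("PAL0708", "PAL0809", "PAL0910"))
```

With names such as "a b" and "a_b", which both clean up to "a_b", the helper stops with an error instead of letting the second report overwrite the first.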

The .RMD File YAML Header

author: "Alyssa Goldberg"
output:
  word_document:
    toc: yes
    reference_docx: pin-styles.docx
params:
  Date: Date
  Study: Study
  Title: Title
  Subtitle: Subtitle
date: "`r params$Date`"
title: "`r params$Title`"
subtitle: "`r params$Subtitle`"

The params in the .RMD file YAML header should match those passed to the rmarkdown::render() function in the .R file.
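A quick way to check that the two sides agree is to compare the names before rendering. The sketch below hard-codes the declared names from the header above and uses placeholder values for the list. As far as I know, render() stops with an error when it is passed a param that the YAML does not declare, and the names are case-sensitive.

```r
# Names declared under params: in the YAML header
declared <- c("Date", "Study", "Title", "Subtitle")

# The list that would be passed as render(params = ...); values are placeholders
passed <- list(
  Date = Sys.Date(),
  Title = "placeholder title",
  Subtitle = "placeholder subtitle",
  Study = "PAL0708"
)

undeclared <- setdiff(names(passed), declared)  # passed but missing from the YAML
unused     <- setdiff(declared, names(passed))  # declared but left at the YAML default

if (length(undeclared) > 0) {
  stop("Not declared in the YAML header: ", paste(undeclared, collapse = ", "))
}
```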

If you use params in other YAML header items (as date, title, and subtitle do here), list the params block above the items that call it.

There are additional arguments you can pass to the Word document, including whether to include a TOC and a reference document for styles.

The Reference Doc

The reference doc, for Word output, is a .docx file - do not use a Word template! Generate a document with the default pandoc styles first, then edit the styles in that file - this ensures that you are controlling the correct styles. RMarkdown won’t recognize custom-named styles.

[Screenshot: Word .docx file in draft mode with the style pane revealed]
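One way to get a starting file with the default styles is to ask pandoc for its built-in reference.docx, which the pandoc manual documents via --print-default-data-file. The sketch below calls pandoc from R and assumes pandoc is on the PATH (the copy bundled with RStudio may not be). The output file name is hypothetical.

```r
# Hypothetical output location for the default reference document
out_file <- file.path(tempdir(), "default-reference.docx")

# Arguments from the pandoc manual for writing out the built-in reference.docx
args <- c("-o", out_file, "--print-default-data-file", "reference.docx")

# Run pandoc only if it can be found on the PATH
pandoc <- Sys.which("pandoc")
if (nzchar(pandoc)) {
  system2(pandoc, shQuote(args))
}
```

Open the resulting file in Word, modify the existing styles, save it under the name used in reference_docx, and leave the style names unchanged.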

Reference Doc Resources

Editing Word styles is beyond the scope of this presentation, but there are several really good resources online including:

The .RMD Body

#p_df was created in generate_reports() in the .R file, which calls render() for this report
df <- p_df

#facet by island only when the study covers more than one island
if (length(unique(df$island)) > 1) {
  df %>%
    ggplot(., aes(x = date_egg)) +
    geom_point(stat = "count", aes(color = label)) +
    labs(x = "Date Egg", y = "Count of Eggs Per Day") +
    facet_wrap(. ~ island) +
    theme_bw() +
    theme(legend.position = "bottom")
} else {
  df %>%
    ggplot(., aes(x = date_egg)) +
    geom_point(stat = "count", aes(color = label)) +
    labs(x = "Date Egg", y = "Count of Eggs Per Day") +
    theme(legend.position = "bottom")
}

Any variables or objects set in the function that calls the .RMD are available to the .RMD script.
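This works because render() evaluates the document in envir = parent.frame() by default, that is, in the environment of the function that called it. The sketch below imitates only that lookup. fake_render() and generate() are hypothetical stand-ins, and no report is rendered.

```r
# Hypothetical stand-in for rmarkdown::render(): evaluates an expression in
# the caller's environment, mimicking render()'s default envir = parent.frame()
fake_render <- function(expr, envir = parent.frame()) {
  force(envir)
  eval(expr, envir = envir)
}

generate <- function(study) {
  # report_title is local to generate(); it is never assigned globally
  report_title <- paste("Study:", study)
  # The "document" can still see both report_title and study
  fake_render(quote(paste(report_title, "| id:", study)))
}

generate("PAL0708")
# "Study: PAL0708 | id: PAL0708"
```

This is also why the superassignment in generate_reports() is optional: objects local to the calling function are already visible to the document.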

Run the Function in the .R File

purrr::map(studies, ~generate_reports(.x))

[Screenshot: multiple reports]