RMarkdown for Reproducible Reports

SWWR 2023

Catalina Cañizares

Jan 13, 2023

RMarkdown and Quarto for Reproducible Reports

Overview

We will cover the following:

  • Why do we need reproducible reporting?
  • The Magic of RMarkdown and Quarto
  • What is Quarto?
  • Code Chunks and Output
  • Annotating your Work
  • Render a report
  • Inline code

Why should we care?

  • “Reproducibility is a minimum requirement for a finding to be believable” (Goodman et al., 2016)

  • Yet more than 70% of researchers report having tried and failed to reproduce another scientist's experiments.

Why is there a crisis? What can we do?

Making our research more reproducible and shareable

Source: 1,500 scientists lift the lid on reproducibility

The replication crisis hits close to home: there is evidence that psychology, and social psychology in particular, has had difficulty reproducing classic studies (Yaffe, 2019).

The Magic: RMarkdown

RMarkdown is an open-source scientific and technical publishing system built on Pandoc

  • We magically produce reproducible reports.

Future you and anyone else can pick up your .Rmd file and:

  • Reproduce the same results
  • Have your code, results and text in the same document
  • Results, figures and tables are generated automatically, so updates are not a nightmare
  • The file format is future-proof

At the end…

Source: Making your research reproducible

Quarto is the new generation of RMarkdown

So what is Quarto?

  • Quarto® is an open-source scientific and technical publishing system built on Pandoc

  • Quarto is a command line interface (CLI) that renders plain text formats (.qmd, .Rmd, .md) or mixed formats (.ipynb/Jupyter notebook) into:

    • Static PDF/Word/HTML reports
    • Books
    • Websites
    • Presentations (this presentation) and more

  • Easier to organize and structure a document and its layout
  • Features work largely across formats
  • Better ability to integrate multiple languages in a PROJECT
  • Code is evaluated by its native engine (R in knitr, Python/Julia in Jupyter)
  • HTML slides with revealjs are pandoc-compatible, so RStudio Visual Editor works with them

Source: Get Started with Quarto

No more copy-paste

No more manually rebuilding analyses from disparate components

No more dread when the data is updated and you need to re-run an analysis.

We are all set to create a .qmd file

  1. Go to Posit Cloud and create an account or log in
  2. Open a new project
  3. Install rUM
install.packages("rUM")
  4. Call the library
library(rUM)
  5. Create a research project
make_project("./", type = "Quarto (analysis.qmd)")

Inspecting the .qmd file

YAML

The Meta-Data Header: YAML

  • The first few lines of the document are the document header.
  • These lines give Quarto crucial information about how to build your report.
  • The entire header, and all of the document options in it, sit between two lines of three dashes (---), one above and one below.

Currently our header is quite basic. It includes:

  • The title of the document: title: “Hello Quarto”
  • Who wrote it: author: “Catalina Canizares”
  • Today’s date: date: “2022-05-11”
  • The output type: format: html (the Quarto counterpart of html_document in RMarkdown)
  • Bibliography
  • Type of citation and format
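
As a rough sketch, a Quarto header carrying the fields listed above could look like the following. The file names references.bib and apa.csl are hypothetical placeholders, and the execute block is an assumption added here to show how code echoing can be switched on for the whole document. The header that rUM generates may differ.

```yaml
---
title: "Hello Quarto"
author: "Catalina Canizares"
date: "2022-05-11"
format: html                    # Quarto's counterpart of html_document
bibliography: references.bib    # hypothetical file name
csl: apa.csl                    # hypothetical citation style file
execute:
  echo: true                    # assumption: show code as well as results
---
```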

Different YAML examples

Code Chunks

Anatomy of a Chunk

Chunk Options

Source: Code Chunks
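
A minimal sketch of chunk options in a .qmd file, assuming the knitr engine: each option sits on its own line at the top of the chunk, prefixed with #|. The option values are illustrative choices rather than the ones shown on the slide, and the block shows only the body of the chunk (what goes between its opening and closing fences).

```r
#| echo: true
#| warning: false
#| message: false

# Everything below the option lines is ordinary R code.
# Assumes the palmerpenguins package is installed.
library(palmerpenguins)

# Summary statistics of flipper length (min, quartiles, mean, max, NA count)
flipper_summary <- summary(penguins$flipper_length_mm)
flipper_summary
```

With these settings the code is shown in the report, while warnings and package start-up messages are hidden.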

More about Chunks

Make chunks like a pro

Name chunks like a pro with kebab-case

Code chunk example
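
One possible named chunk, as a sketch: the label is written in kebab-case (lowercase words joined by hyphens) and says what the chunk does. The label mean-flipper-by-species and the object name mean_flipper are made up for illustration; only the chunk body is shown.

```r
#| label: mean-flipper-by-species

# Assumes the palmerpenguins package is installed.
library(palmerpenguins)

# Mean flipper length (mm) per species; the formula interface of
# aggregate() drops rows with missing values by default
mean_flipper <- aggregate(
  flipper_length_mm ~ species,
  data = penguins,
  FUN = mean
)
mean_flipper
```

A descriptive label makes it easier to find the chunk in the document outline and to spot which chunk failed when rendering stops with an error.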

Markup Text

Now that we have a report with a header and some code, we need to explain what the code is doing and why.

  • This is where the plain text comes in.

  • Outside of a code chunk, type anything you want. You can even include pictures and tables.

Cross-Reference

Tables

Figures
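
A hedged sketch of a cross-referenceable figure in Quarto: a chunk whose label starts with fig- and that carries a fig-cap can be cited in the text as @fig-flipper-hist (tables work the same way with the tbl- prefix and tbl-cap). The label and caption below are made-up examples, and only the chunk body is shown.

```r
#| label: fig-flipper-hist
#| fig-cap: "Distribution of penguin flipper lengths"

# Assumes the palmerpenguins package is installed.
library(palmerpenguins)

# Base R histogram; hist() ignores missing values on its own
flipper_hist <- hist(
  penguins$flipper_length_mm,
  main = NULL,
  xlab = "Flipper length (mm)"
)
```

Writing @fig-flipper-hist in the surrounding text then renders as a numbered reference such as "Figure 1".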

Text Formatting

Source: Get Started with Quarto

Text Headings

Source: Get Started with Quarto

Exercise

  • Let’s go to the project we created

  • Modify the YAML with your information, and name your file

  • Check that the YAML is set so that both the code and the results appear in the rendered report.

  • Create a chunk and type:
install.packages("palmerpenguins")
install.packages("table1")

library(palmerpenguins)
library(table1)

Are there differences between the penguin species regarding their flipper lengths?

  • Create a chunk and load the palmerpenguins data
data("penguins")
  • Create another chunk, name it, and explore flipper length across the different penguin species with this code:
table1(~ flipper_length_mm | species, data = penguins)
  • Describe what you see in the table.

Let’s visualize our findings

  • Create another chunk and plot flipper length by species:
library(ggplot2)

ggplot(
  data = penguins,
  aes(x = species, y = flipper_length_mm)) +
  # One boxplot per species
  geom_boxplot(aes(color = species),
               width = 0.3,
               show.legend = FALSE) +
  # Individual penguins, jittered sideways; the seed keeps the plot reproducible
  geom_jitter(aes(color = species),
              alpha = 0.5,
              show.legend = FALSE,
              position = position_jitter(
                width = 0.2, seed = 0)) +
  # One colour per species
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    x = "Species",
    y = "Flipper length (mm)"
  )

We are ready to Render our report

Save your report and…

Look at your Console while it Renders
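
Besides the Render button, a report can also be rendered from R. The sketch below assumes that the quarto R package and the Quarto CLI are installed and that analysis.qmd (the file created by make_project() earlier) is in the working directory. If the package or the file is missing, it only prints a message.

```r
qmd_file <- "analysis.qmd"

# Render only when the quarto package and the file are both available
can_render <- requireNamespace("quarto", quietly = TRUE) && file.exists(qmd_file)

if (can_render) {
  # Comparable to clicking Render: writes the HTML report next to the .qmd
  quarto::quarto_render(qmd_file, output_format = "html")
} else {
  message("Nothing rendered: install 'quarto' or check the path to ", qmd_file)
}
```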

A little more of that reproducible magic

Inside your text you can include executable code with the syntax: `r expression`

For example:

There are 344 rows in the penguins data set.

The truth:

There are `r nrow(penguins)` rows in the penguins data set.
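
One common pattern, sketched here under the assumption that palmerpenguins is installed, is to compute values in a chunk and then refer to them inline, so the sentence updates whenever the data change. The object names n_penguins and mean_flipper_mm are illustrative.

```r
library(palmerpenguins)

# Number of rows in the data set
n_penguins <- nrow(penguins)

# Mean flipper length, dropping missing values and rounding to one decimal
mean_flipper_mm <- round(mean(penguins$flipper_length_mm, na.rm = TRUE), 1)

n_penguins
```

In the text, an inline expression that calls n_penguins (a backtick, the letter r, a space, the object name, and a closing backtick) is replaced by the current row count when the report renders; mean_flipper_mm can be used the same way.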

More resources

Quarto Website

R Markdown: The Definitive Guide

Questions?

This presentation was based on:

  1. Dr. Gabriel Odom’s Lesson 3: RMarkdown for Reproducible Reports
  2. Dr. Raymond Balise's class presentation on Markdown and R Markdown on January 25, 2022 at University of Miami.
  3. All the sources posted throughout the presentation