RMarkdown and Bookdown for Academic Writing in R

Thea Knowles, R-Ladies #LdnOnt

January 14, 2020

Academic writing with R Markdown and Bookdown

Agenda

  1. The tools we’ll be using
  2. Getting set up
  3. Writing a summary report
  4. Writing a manuscript
  5. Writing a thesis

1. Tools

Why R Markdown?

  • Integrate R code directly into writing using basic Markdown syntax
  • Reference management integration
  • Reproducibility
  • Accessible learning curve

https://rmarkdown.rstudio.com/

Why bookdown?

Knitting with knitr

When you run R Markdown, you are knitting together plain Markdown text and R code.

knitr is the engine.

Materials for today

We will generate the following files:

  • Helper R script (helper.R)
  • Summary report (starwars_summary.Rmd)
  • Manuscript (starwars_manuscript.Rmd)
  • Thesis (starwars_thesis.Rmd)

We will also use the following files

  • custom_reference.docx
  • apa.csl
  • starwars_refs.bib

2. Getting set up

Setting up: Software

  • R
  • RStudio
    • Recent versions of RStudio also include Pandoc, which is required to compile documents
  • Latex for Mac or Windows (if you want to compile to PDF).
    • Alternatively, install TinyTex, the Latex distribution created and recommended by Yihui Xie, creator of RMarkdown and bookdown[^139][^1032].

Setting up: R packages

  • bookdown R package.
  • Installing bookdown will automatically import knitr and rmarkdown pacakges as well. Install it by entering the following code into your R console:
  • Not necessary, but awfully helpful for data analysis in R in general, is the tidyverse R package
install.packages(c("bookdown", "tidyverse"))
# includes rmarkdown, knitr

Setting up: Git & GitHub

  • We won’t go over this in today’s presentation, but git is a very handy version control system that helps you back up your projects, and has easy integration with RStudio.
  • Recommended reading: Happy Git with R

Setting up: Recommended workflow for a new project

R Markdown document components

YAML

YAML (rhymes with camel): The header that tells R Markdown how to generate your document. Indentation and spacing are very important.

  • Permits the following to happen when you knit:
    • .Rmd -> knitr -> .md -> Pandoc -> output
    • Output can be .docx, .html, .pdf, and many others
  • YAML: “YAML Ain’t Markup Language”

Basic:

title: "Untitled"
author: "Thea Knowles"
date: '2018-02-18'
output: word_document

YAML

Gettin’ fancy

title: "A very important title"
subtitle: "A less important subtitle"
author:
- Thea Knowles^1^, Thea Knowles' Alter Ego^2^
- ^1^Western University, ^2^University of Western Ontario
date: "14 January, 2020"
output:
  bookdown::word_document2:
    fig_caption: yes
    md_extensions: +footnotes
    reference_docx: custom_reference.docx
    toc: yes
date: "Last updated: ` r format(Sys.time(), '%d %B, %Y')`"

YAML

Even fancier:

  • output extensions
  • template
  • bibliography file
  • csl (references style guide)
  • css (supreme customization!)

Different options for:

Essential parts of any R Markdown document

Chunk options

  • Chunks are sections that will include R code. By setting defaults at the beginning of your document, you can specify what you want most of your chunks to do.
  • In each chunk, you can specify options in the form tag=value in the chunk header.
    • For example, in the following, the tag include is set to FALSE, indicating that we don’t want the contents of this chunk included in the output

This is the default chunk options set when you create a new .RMD (R Markdown) file.

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Chunk options

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This chunk provides the following information for “knitting” the document:

  • setup: the name of the chunk You shouldn’t have two chunks with the same name, unless they are unnamed (in which case they just get numbered automatically during the knit process)
  • include = false: the chunk will not be included in the output after knitting.
  • knitr::opts_chunk$set(echo = TRUE): the default behavior for chunks is to “echo;” you will see the actual code printed in the final output. You can set this to false if you don’t want the actual code included by default.

Chunk options

  • You don’t have to change this unless you want to
  • You can assign different values on a chunk-by-chunk basis
  • More on this in a minute!

Essential parts of any R Markdown document

Text

Essential parts of any R Markdown document

R Code chunks

  • Insert chunks: CMD + k or click Code >> Insert chunk
  • name your chunk (myChunk)
  • specify options (echo=TRUE)

Some other chunk options :

  • echo = FALSE: don’t show the code itself in the final document
  • include = FALSE: don’t include this code or its output in the final document
  • eval = FALSE: don’t actually run this code at all
  • And more…

3. Writing a summary report

Recipe for R Markdown summary report

My suggestion

Ingredient In this example
data dplyr:: starwars
helper script helper.R
R Markdown summary document starwars_summary_report.Rmd

1. Data

Starwars

library(dplyr)
starwars %>% head()
## # A tibble: 6 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Luke…    172    77 blond      fair       blue            19   male  
## 2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>  
## 3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>  
## 4 Dart…    202   136 none       white      yellow          41.9 male  
## 5 Leia…    150    49 brown      light      brown           19   female
## 6 Owen…    178   120 brown, gr… light      blue            52   male  
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>

Recipe for R Markdown summary document

My suggestion

Ingredient In this example
data dplyr:: starwars
helper script helper.R
R Markdown summary document starwars_summary_report.Rmd

2. Helper script

helper.R

  • Use a helper .R script to do your data cleaning and basic analyses
  • Straight up R code. No markdown.
  • You can pull this .R document into future .Rmd documents using source()
  • The benefit of using a helper script is that you can keep your analyses consistant. Whether you’re writing an informal summary report to share with your supervisor or a full manuscript, you don’t have to copy and paste the analyses each time you do something new.

Note: Other options for code helpers

helper.RData: If your helper script takes a long time to run, you may instead find it useful to save its output to an .RData file:

save("helper.RData")

You can then load this .Rdata into your later reports instead of the raw .R script. You source an .R file:

source("helper.R")

You load an .Rdata file:

load("helper.RData")

Note: Other options for code helpers

Modular .R scripts: For bigger projects, I often break my R scripts down into smaller scripts, e.g.,

  • 1_tidy.R, 2_models.R, 3_plots.R

In my workflows, each of these sources the last. I.e., the first line of 2_models.R is

source("1_tidy.R")

And in my R Markdown file, I source the highest level or a final .RData file.

Recipe for R Markdown summary document

My suggestion

Ingredient In this example
data dplyr:: starwars
helper script helper.R
R Markdown summary document starwars_summary_report.Rmd

3. R Markdown document

  • starwars_summary_report.Rmd

Recall: R Markdown documents…

  • End in extension .Rmd
  • Incorporate both R code (in chunks) and plain text (in Markdown)
  • Require YAML at the beginning

Let’s get started!

Hold up! Let’s check in

Packages: Do you have all the packages installed?

install.packages(c("rmarkdown", "bookdown", "tidyverse"))

Directory structure: The following files should all be in the same directory:

  • apa.csl
  • custom_reference.docx
  • helper.R
  • rmd-bkdn.Rproj
  • starwars_refs.bib

Open helper.R

Review as group

EXERCISE 1

  • Create a new R Markdown file and save it in a subdirectory called “reports”
  • Delete everything below the YAML
  • Insert:
  1. A chunk that sources helper.R
  2. Another chunk that calls the sw_summary_human data frame
  3. A line of text with inline R code, for example, calling the average mass of all species
  • Then Knit to HTML (Knit button OR CMD + K)

EXERCISE 1: ANSWERS

1. A chunk that sources helper.R:

```{r source-helper}
source("../helper.R")
```

2. Another chunk that calls the sw_summary_human data frame

```{r summary-df}
sw_summary_human
```

3. A line of text containing inline R code, e.g:

This average weight of all species is `\r mean(starwars$mass, na.rm=TRUE)`

EXERCISE 2

  • Break helper.R into 2 smaller scripts: 1_tidy.R and 2_models.R.
  • Save these new scripts in a subdirectory called “r_scripts”
  • Run both scripts, save an .RData file
  • In your .Rmd file, comment out your source() line and replace it with load(), calling your new .RData file in.
  • Knit again

Is everything the same?

4. Writing a manuscript

Recap of recipe for a R Markdown summary report:

Ingredient In this example
data starwars
helper script helper.R
R Markdown summary document summary_report.Rmd

Recipe for R Markdown .docx manuscript

Ingredient In this example
data starwars
helper(s) helper.R, starwars.RData, etc…
R Markdown document(s) manuscript.Rmd
style reference .docx file (specify styles in your Word doc) custom_reference.docx
references in .bib format starwars_refs.bib
style bibliography (csl) file (how your bibliography will be formatted) apa.csl

Recipe for R Markdown manuscript: Caveat!

  • This is one (my) way of doing it
  • There are many other ways!
  • See resources at end of this presentation for inspiration!

Recipe for R Markdown manuscript: Caveat!

  • While you can be really specific in your formatting, we won’t get into a ton of detail here.
  • Since many journals often do a lot of the formatting from their end, what authors actually need to submit may be fairly bare-bones (sometimes)
  • If you’re submitting a .docx file for publication, you can also make your final tweaks in the actual Word document itself, rather than in R Markdown
    • Save this until the end, because whatever you do in the Word document itself will be rewritten when you knit in R Markdown.

manuscript.Rmd

We will start with the main body file

  • Open manuscript.Rmd
  • Knit it!
  • Examine it 🔍

Manuscript: YAML

We can make use of many more options!

Notice I’ve used outputs from the bookdown package here. Bookdown gives us way more functionality (an important one: cross-referencing figures and tables).

YAML: custom_reference.docx

output:
  bookdown::word_document2:
    fig_caption: yes
    md_extensions: +footnotes
    reference_docx: ../custom_reference.docx
    toc: yes
  • Allows you to set the styles for your .docx output in Microsoft Word directly
  • Contents of the .docx file are IGNORED
  • Styles and properties (margins, headers text, etc) are used in .Rmd’s .docx output
  • For easiest use, include the custom_reference.docx in the same directory as the rest of your RProject

YAML: Bibliography

bibliography: ../starwars_refs.bib

References must be in BibLaTex (.bib) format

  • Most reference managers can easily convert to .bib

  • Use keys to cite a work in the markdown text, which will look like this in your markdown text:

    “It is well known that some things are facts, and others are not [@knowles2018].”

Bibliography: citr

  • A very useful RStudio Add-in that allows you to cite your sources from a drop-down menu: citr

Bibliography info: Caveat!

Thea’s preferences here, full disclosure:

  • I use one “master” references.bib file for all my projects
  • This file stays in one location, and I include the path to it in the YAML: bibliography: /Users/thea/references.bib
  • My “keys” all have the same format in order to make it easy to remember: firstauthorYEAR (knowles2018)
    • additional keyword if multiples (knowles2018dbs)

Bibliography info: Caveat!

Thea’s preferences here, full disclosure:

  • When I come across a new paper, I immediately add its info to my .bib file
  • I get the bibtex info using Google Scholar’s citation tools (specifically, the GoogleScholar Chrome plugin)
  • I use JabRef to manage my .bib file because I find it more user friendly, but you could also edit the .bib file in any text editor (just be cautious of formatting!)

YAML: CSL file: apa.csl

CSL: Citation Style Language

csl: ../apa.csl
  • Specifies how you want citations and bibliography formatted
  • Downloadable from many sources
  • .csl files I use often:
    • apa.csl
    • biomed-central.csl
      • when I need numeric in-text citations
    • chicago-annotated-bibliography.csl
      • when I’m writing annotated bibliographies

Writing the manuscript

  • Write just as you did for the summary document:
    • Markdown-styled text, and R chunks sprinkled throughout!
  • Separate writing tasks: “story writing” versus “results reporting”

Inserting figures and tables

Not data-related!

Inserting figures and tables

  • You can insert tables and figures in R chunks, as we did in the summary document
  • You can also insert them as non-R elements that wouldn’t be accessible from the data
    • e.g., Participant demographics

Inserting tables in Markdown

i.e., not R code

  • Markdown tables have a pretty straightforward syntax:

Inserting images in Markdown

i.e., not R code

Several ways to do this! Here’s one:

  • Have a subdirectory called images; store your non-data-related images here
```{r sw-logo, fig.cap = "Starwars logo", echo=FALSE}
knitr::include_graphics("../images/starwars.png")
```

Inserting tables and figures with R code

Inserting figures in R chunks

  • Just like in the summary document, but we can add more options to the chunk
```{r fig-sw, fig.cap = "Mass by Height for all species", 
echo = FALSE, fig.align='center'}
sw_plot
```
  • We will refer to the figure by its chunk label fig-sw
  • The caption will be specified by fig.cap

Inserting tables in R chunks

Use knitr::kable()

```{r tab-sw-human, results='asis', echo = FALSE}
knitr::kable((sw_summary_human), 
             booktabs=T, 
             caption="Average height and mass for humans vs. non-humans")
```
  • knitr::kable() will allow you to cross-reference easily later, but is not the only way.
  • Table options specified as an argument to kable(), not as a chunk option
  • Consider also using flextable() with the captioner package as well.

Embedding figures/tables in R chunks

Caution:

  • Not all chunk options are available for all outputs:

Chunk options fig.align, out.width, out.height, out.extra are not supported for docx output

  • Notice cross-referencing tables worked for the knitr::kable() option, but out markdown table above did not get automatically numbered. If you want to include both R and non-R tables, I’d suggest either 1) converting markdown tables to R code or 2) using the flextable package to generate captions and cross-references.

Cross-referencing figures using bookdown

In the YAML, rather than specify outputs like word_document or html_document, we will now use the bookdown versions of these: - e.g., bookdown::html_document2, bookdown::word_document2 - This will allow us to automatically refer to Figures/Tables without having to explicitly refer to them by number - This is helpful if you wind up moving things around in your manuscript.

Cross-referencing figures

Syntax for cross-referencing with bookdown::word_document2 output requires you:

  • Refer to the environment (e.g., fig vs. tab)
  • Refer to the chunk label containing the figure or table

See Figure @ref(fig:fig-sw) below. See Table @ref(tab:tab-sw-human).

https://bookdown.org/yihui/bookdown/figures.html#figures

More to learn

  • redoc: A new package for enabling 2-way Rmarkdown and Microsoft Word workflow
  • Grammar and spelling in R Markdown
  • Make your life easier while writing by using:
  • APA manuscript templates for R Markdown using Papaja

More to learn

  • It’s okay to Google. Lots.
  • It’s okay to be hacky at times to just do the thing
    • Efficiency and elegance come with time
      • Well, sometimes the elegance lags behind

Awesome resources of using R Markdown for academic writing

Next: Using bookdown to compile multiple R Markdown documents