Reproducible data analysis using RMarkdown

Aims for today

Sorting out your RStudio working environment.
Getting you ready to write reproducible reports in RMarkdown.

How can you trust published results?

Results are communicated through reports such as peer-reviewed scientific articles.
Reported results should be reproducible!
Data and code should be made available to others.
It is difficult to relate code to figures, tables, and numbers, especially where numbers were copy-pasted from software.
Manual processing of data and manual generation of reports are also error prone.

Why do we need reproducible analyses?

Open Science Collaboration led by Brian Nosek (Open Science Collaboration 2015).
Aimed to replicate important findings in psychological research.
Main finding: 36% were replicated; only 23% for Social Psychology.
Reasons: small samples, pressure to publish sensational results.
Make Psychology Science Again: pre-registration, replications, large samples (power analysis), abandoning significance testing, transparency.

A truly reproducible report

Data, code, and report need to be one unit.
Based on the concept of literate programming (Knuth 1984): text and code are linked in a single file that generates a manual or a computer program.
This principle can be used to create reproducible reports, sometimes known as dynamic documents (Xie 2017).
RMarkdown is the best way of doing this using R.

Why RMarkdown?

Documentation of all analysis steps
Quickly updating and reproducing analysis
No copy-pasting between programmes
Producing texts, websites, supplementary materials, APA-style manuscripts, slides
Easy cross-referencing
Automatic reference lists
Easy type-setting of equations

Key concepts and tools

What is RMarkdown?

A plain-text file format; an .Rmd file is a script, much like an R script.
Open, edit, and run it in RStudio like any other script.
RMarkdown documents are a mixture of markdown and ordinary R code (and your data).
R code in RMarkdown documents occurs in R chunks, i.e. blocks of code, or as inline R code within the markdown text.
Markdown: a minimal syntax that specifies how text should be formatted.
Renders to .pdf, .html, .docx, .doc
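
To make these pieces concrete, here is a minimal sketch that writes a tiny .Rmd file from R. The file contents, the chunk label mychunk, and the use of a temporary file are illustrative assumptions, not part of the course materials.

```r
# Lines of a minimal RMarkdown document: YAML preamble, markdown text,
# inline R code, and one R chunk (all contents are illustrative)
rmd_lines <- c(
  "---",
  "title: 'A minimal example'",
  "output: html_document",
  "---",
  "",
  "# A section header",
  "",
  "Some *markdown* text with inline R: the mean is `r mean(c(1, 2, 3))`.",
  "",
  "```{r mychunk}",
  "summary(c(1, 2, 3))",
  "```"
)

# Write the document to a temporary .Rmd file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Inspect what was written
cat(readLines(rmd_file), sep = "\n")
```

Calling rmarkdown::render(rmd_file) should then knit this file to HTML, provided rmarkdown and pandoc are available.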

Set up an R project

What it is: A special folder for your work in R.
What it does: Keeps your scripts, data, and results together.
Why it helps:
- Sets the folder as your working directory automatically.
- No setwd() headaches.
- Easy to share or move your whole project.
Everything for one analysis stays neatly in one place.
Create an R project for psyc30815-stats-3.
Create sub-directories for “data”, “exercises”, and “slides”.
Download the blomkvist.csv data set from the NOW Learning Room.
Add the data file to the “data” folder.
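
The sub-directories can also be created from the R console rather than by hand. This is a minimal sketch, assuming the R project is already open so that the working directory is the project root; the variable name sub_dirs is only for illustration.

```r
# Sub-directories named above; paths are relative to the project root
sub_dirs <- c("data", "exercises", "slides")

# Create each folder; showWarnings = FALSE keeps this quiet if it already exists
for (d in sub_dirs) {
  dir.create(d, showWarnings = FALSE)
}

# Confirm that all folders are in place
dir.exists(sub_dirs)

# Check whether the downloaded data file has been moved into "data"
file.exists(file.path("data", "blomkvist.csv"))
```

The last line returns FALSE until blomkvist.csv has been added to the data folder.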

Installation

RStudio comes with the necessary R package rmarkdown.
To create pdf outputs, we also need the package tinytex.

# Install tidyverse only if it is not already installed
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")

install.packages('tinytex')
tinytex::install_tinytex()
# then restart RStudio

# Then, try
tinytex:::is_tinytex() # should return TRUE

From the RStudio menu …

Load data

Create a new R chunk: try CTRL+ALT+I or CMD+ALT+I (i.e. the letter “i”)
Create a chunk called packages and load libraries needed:

library(tidyverse)

Create a new chunk called loaddata and load Blomkvist et al. (2017) data:

blomkvist <- read_csv("data/blomkvist.csv") %>% 
  select(id, age, smoker, sex, rt = rt_hand_d) %>% 
  drop_na()
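
To see what this pipeline does without the real data, here is a small sketch on a made-up data set. The object toy and all of its values are invented for illustration; only the column names mirror the ones used above.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the raw data, with one extra column and two missing values
toy <- tibble(
  id        = 1:4,
  age       = c(25, 40, NA, 63),
  smoker    = c("no", "yes", "no", "former"),
  sex       = c("f", "m", "f", "m"),
  rt_hand_d = c(0.61, 0.72, 0.80, NA),
  other     = c("a", "b", "c", "d")
)

toy_clean <- toy %>%
  select(id, age, smoker, sex, rt = rt_hand_d) %>% # keep five columns, rename rt_hand_d to rt
  drop_na()                                        # drop rows with any missing value

toy_clean
```

Rows 3 and 4 are dropped because each has a missing value in one of the selected columns, and the column other is no longer present.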

The setup chunk: global options

Note the chunk header ```{r setup, echo=FALSE}
setup is the label of this chunk (optional; useful for cross-referencing figures and tables).
Chunk option echo = FALSE: don’t display the chunk’s code in the output; echo = TRUE: display it.

knitr::opts_chunk$set(message = FALSE, # don't return messages
                      warning = FALSE, # don't return warnings
                      comment = NA, # don't prefix output lines with ##
                      echo = TRUE, # display chunk (is default)
                      eval = TRUE, # evaluate chunk (is default)
                      out.width = '45%',  # figure width
                      fig.align='center') # figure alignment
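
Global options are defaults for every chunk that follows; an option given in an individual chunk header overrides the default for that chunk only. The sketch below, run in the console rather than in a document, shows how the defaults can be set, queried, and reset. The particular option values are only examples.

```r
library(knitr)

# Set global defaults, as the setup chunk does
opts_chunk$set(echo = FALSE, fig.align = "center")

# Query the current defaults to confirm they were applied
opts_chunk$get("echo")
opts_chunk$get("fig.align")

# Restore knitr's built-in defaults (echo is TRUE again)
opts_chunk$restore()
opts_chunk$get("echo")
```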

Section headers

# This is a section header

## This is a subsection header

# This is another section header

## This is another subsection header

### This is a subsubsection header

Figures

Create a new R chunk with label myscatterplot
Set echo = FALSE because we only need the figure.
Add a figure caption fig.cap = "A scatterplot." in the chunk configurations.

library(psyntur)
scatterplot(x = age, y = rt, data = blomkvist)

Or using ggplot2

ggplot(blomkvist, aes(x = age, y = rt)) +
    geom_point() +
    theme_classic()
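
Putting the three steps together, the body of the chunk could look like the sketch below. It assumes a recent knitr version that accepts chunk options written as #| comments at the top of the chunk; the same options can instead go in the chunk header. The data frame toy is simulated so the example runs without the Blomkvist data.

```r
#| label: myscatterplot
#| echo: false
#| fig.cap: "A scatterplot."
library(ggplot2)

# Simulated stand-in for the blomkvist data (values are made up)
set.seed(1)
toy <- data.frame(age = runif(50, min = 20, max = 99))
toy$rt <- 0.4 + 0.005 * toy$age + rnorm(50, sd = 0.05)

# Same plot as above, stored in an object and then printed
p <- ggplot(toy, aes(x = age, y = rt)) +
  geom_point() +
  theme_classic()
p
```

Outside a knitted document the #| lines are ordinary comments, so the code also runs as is in the console.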

Figures

Change the default size to out.width = '50%' (the value must be quoted).
Cross-reference figure using \ref{fig:myscatterplot} in the text.

"A scatterplot of age and reaction time can be found in Figure \ref{fig:myscatterplot}."

Cross-referencing figures requires the use of pdf_document2 as output format. In the YAML preamble, replace output with:

output: 
  bookdown::pdf_document2:
      toc: false          # removes the table of contents
      number_sections: false   # removes section numbering

You may need to install bookdown. We are also turning off the table of contents (toc) and the section numbers.

Add header options

header-includes:
- \usepackage{booktabs}

We just need booktabs to improve the typesetting of tables.

Formatted tables

Reporting results in tables formatted to a high standard.
Calculate descriptives:

library(psyntur)
(smoker_age <- describe(data = blomkvist, by = smoker, mean = mean(age), sd = sd(age)))

# A tibble: 3 × 3
  smoker  mean    sd
  <chr>  <dbl> <dbl>
1 former  65.2  16.9
2 no      53.0  21.3
3 yes     50.6  17.5

This format isn’t good enough for papers.

Formatted APA style tables

library(kableExtra)
smoker_age %>% 
  kable(format = 'latex',
        booktabs = TRUE,
        digits = 2,
        align = 'c', # centre value in each column
        caption = 'Descriptives of age by smoker.') %>% 
  kable_styling(position = 'center') # centre position of table
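
To see what kable() hands over to LaTeX, the sketch below rebuilds the descriptives by hand from the printed tibble above and prints the generated table code. It uses only knitr::kable() (no kableExtra styling), and typing in the values manually is purely for illustration.

```r
library(knitr)

# Descriptives copied from the printed tibble above
smoker_age <- data.frame(
  smoker = c("former", "no", "yes"),
  mean   = c(65.2, 53.0, 50.6),
  sd     = c(16.9, 21.3, 17.5)
)

# Generate the LaTeX code for the table
latex_table <- kable(smoker_age,
                     format = "latex",
                     booktabs = TRUE, # booktabs rules instead of plain lines
                     digits = 2,
                     align = "c",
                     caption = "Descriptives of age by smoker.")

# Print the raw LaTeX that would be inserted into the pdf
cat(as.character(latex_table), sep = "\n")
```

With booktabs = TRUE the output uses the booktabs rule commands, which is why the booktabs package has to be loaded in the document header.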

Also, label your chunk “smoker” and cross-reference the table in the text using Table \ref{tab:smoker}.

Bibliography and citations

Add to the YAML preamble:

bibliography: refs.bib
biblio-style: apalike

Create a file (in RStudio) called refs.bib (save in same working directory as your .Rmd file)

Bibliography and citations

Get the .bib entry for Blomkvist et al. (2017) from Google Scholar and paste it into refs.bib:
- Copy the title “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board” into Google Scholar
- Click cite and BibTeX
- Copy the .bib entry into refs.bib
Note the citation key blomkvist2017reference
Cite Blomkvist et al. (2017) using @blomkvist2017reference or [@blomkvist2017reference].
At the end of your document create a section “# References”
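
For orientation, a .bib entry for this paper might look roughly like the one written out below. The fields are taken from the reference list of these slides; the exact layout that Google Scholar exports may differ. The sketch writes to a temporary folder so that it cannot overwrite an existing refs.bib.

```r
# A BibTeX entry assembled from the reference list (layout is an assumption)
bib_entry <- c(
  "@article{blomkvist2017reference,",
  "  title   = {Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age},",
  "  author  = {Blomkvist, Andreas W. and Eika, Fredrik and Rahbek, Martin T. and Eikhof, Karin D. and Hansen, Mette D. and S{\\o}ndergaard, Malene and Ryg, Jesper and Andersen, Stig and J{\\o}rgensen, Martin G.},",
  "  journal = {PLoS One},",
  "  volume  = {12},",
  "  number  = {12},",
  "  pages   = {e0189598},",
  "  year    = {2017}",
  "}"
)

# Write to a temporary location; in the project this file would be refs.bib
bib_file <- file.path(tempdir(), "refs.bib")
writeLines(bib_entry, bib_file)

# The citation key is the text after the opening brace on the first line
sub("^@article\\{(.*),$", "\\1", bib_entry[1])
```

The extracted key is what follows the @ sign when citing in the text.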

Formatted inline R output

# Fit the model and get the summary
model <- lm(rt ~ sex, data = blomkvist)
model_summary <- summary(model)

# Extract R^2
r2 <- model_summary$r.squared

The $R^2$ for this model is `r round(r2, 2)`.

Renders “The $R^2$ for this model is 0.03.”

Formatted inline R output

# Extract F statistic
f_stat <- model_summary$fstatistic
p_value <- pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)

The model summary can be summarised like so: $F(`r round(f_stat[2])`, `r round(f_stat[3])`) =
`r round(f_stat[1],2)`$, $p `r format.pval(p_value, eps = 0.01)`$.

Renders “The model summary can be summarised like so: $F(1, 263) = 7.48$, $p <0.01$.”
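
The same extraction can be tried on a built-in data set. The sketch below uses mtcars as a stand-in (am is a two-level predictor, much like sex) and assembles the report string with sprintf(); the object name f_report is only illustrative.

```r
# Built-in data as a stand-in for the Blomkvist data
model <- lm(mpg ~ am, data = mtcars)
model_summary <- summary(model)

# fstatistic is a named vector with elements value, numdf, and dendf
f_stat <- model_summary$fstatistic

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
              lower.tail = FALSE)

# Assemble the string that the inline code would produce
f_report <- sprintf("F(%.0f, %.0f) = %.2f",
                    f_stat[["numdf"]], f_stat[["dendf"]], f_stat[["value"]])
f_report
p_value
```

With a single predictor, this p-value matches the one reported for the slope in the coefficient table, which is a quick way to check the extraction.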

p <- c(0.05, 0.02, 0.011, 0.005, 0.001)
format.pval(p, eps = 0.01)

[1] "0.05"  "0.02"  "0.01"  "<0.01" "<0.01"
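
One thing to watch: format.pval() only adds a comparison sign when the value falls below eps, so for larger p-values the inline expression above would render without an equals sign. A small helper can cover both cases. This is a minimal sketch; format_p is a hypothetical function written for this example, not part of any package.

```r
# Hypothetical helper: returns "< eps" for small p-values, "= value" otherwise
format_p <- function(p, eps = 0.01, digits = 2) {
  ifelse(p < eps,
         paste0("< ", eps),
         paste0("= ", formatC(p, format = "f", digits = digits)))
}

format_p(c(0.05, 0.005))
```

In the text, the inline code would then follow the letter p directly inside the maths expression, with the helper supplying the sign.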

Mathematical typesetting

Strings between `$` symbols are parsed as $\LaTeX$ and typeset accordingly in inline mode.
For example, `$\beta$` renders as $\beta$.
Subscripts: `$\beta_0$` is $\beta_0$; use `{}` for more than one symbol, as in `$\beta_{01}$`, which is $\beta_{01}$.
How would you write $\beta_{0_1}$?
Superscripts: use `^`, as in `$\sigma^2$`, which is $\sigma^2$.
How would you write $\sigma^{2^2}$?
How about $\sigma_e^2$?

Some arithmetic operations and fractions

Addition and subtraction: `$x + y$` and `$x - y$` render as $x + y$ and $x - y$.
Multiplication: use either `$\cdot$` or `$\times$` to get $\cdot$ or $\times$, respectively, as in $3 \cdot 2$.
Division: use `$/$` or `$\div$` to get $/$ or $\div$, respectively, or `$\frac{1}{2}$` for $\frac{1}{2}$.
Plus-minus: `$\pm$` renders as $\pm$.
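
These operators can be combined into a single expression. As an illustrative example, an approximate 95% confidence interval for a mean uses the plus-minus sign, multiplication, and a fraction. Note that \bar, \sqrt, and the double-dollar display mode are not introduced above and are additions for this sketch.

```latex
$$
\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}
$$
```

Written between double dollar signs, the expression is set on its own centred line rather than inline.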

For other formats …

# Install the released version from CRAN
install.packages("rmdformats")

# Or install the development version from GitHub
install.packages("remotes")
remotes::install_github("juba/rmdformats")

Create from template:
- File > New File > R Markdown > From Template (e.g. readthedown or robobook for documents)
- rmdshower::shower_presentation and ioslides_presentation for slides
Rmarkdown example
Online sharing on RPubs (“Publish”)

APA7 manuscripts using `papaja`

Example: https://osf.io/vayhq/ published in Roeser et al. (2024)
Installation: https://github.com/crsh/papaja

Task (homework)

You will use RMarkdown for your assignment (get started now).
Create a list of continuous response variables that can be found in Psychology.
Find a suitable data set (should have a mix of continuous and categorical variables).
Use the RMarkdown we started and replace data with your own, calculate descriptives, and a visualisation.

References

Blomkvist, Andreas W., Fredrik Eika, Martin T. Rahbek, Karin D. Eikhof, Mette D. Hansen, Malene Søndergaard, Jesper Ryg, Stig Andersen, and Martin G. Jørgensen. 2017. “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age.” PLoS One 12 (12): e0189598.

Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2024. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing 37 (2): 359–84. https://doi.org/10.1007/s11145-023-10489-4.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Xie, Yihui. 2017. Dynamic Documents with R and Knitr. Chapman and Hall/CRC.
