PSYC40940: Reproducible data visualisation using RMarkdown

Aims

Sorting out your RStudio working environment.
Getting you ready to produce reproducible reports in RMarkdown.

Why is reproducibility important?

How can you trust someone’s results?

Results are communicated through reports such as peer reviewed scientific articles.
Reported results should be reproducible!
Data and code should be made available to others!
It is difficult to relate code to figures, tables, numbers, etc., especially when numbers were copy-pasted from statistical software.

Why do we need reproducible analyses?

Open Science Collaboration led by Brian Nosek (Open Science Collaboration 2015).
Aimed to replicate important findings in psychological research.
Main finding: only 36% of findings were replicated; only 23% for Social Psychology.
Reasons: small samples, pressure to publish sensational results.
Solutions: pre-registration, replications, large samples, abandoning significance testing, transparency.

A truly reproducible report

Data, code and report need to be one unit.
Based on the concept of literate programming (Knuth 1984): text and code are combined in a single file from which both the documentation (manual) and the computer program are generated.
This principle can be used to create reproducible reports, sometimes known as dynamic documents (Xie 2017).
RMarkdown is the best way of doing this using R.

Why RMarkdown?

Documentation of all analysis steps
Quickly updating and reproducing analysis
No copy-pasting between programmes
Producing texts, websites, supplementary materials, APA-style manuscripts, slides
Easy cross-referencing
Automatic reference lists
Easy type-setting of equations

What is RMarkdown?

A plain-text file format, like an R script.
Open, edit and run it in RStudio like a script.
RMarkdown documents are a mixture of markdown and normal R code (and your data).
R code in RMarkdown documents appears either in R chunks (blocks of code) or as inline R code within the markdown text.
Markdown: a minimal syntax that specifies how text should be formatted.
Rendering into .pdf, .html, .docx and other formats

Installation

RStudio uses the R package rmarkdown (and offers to install it if it is missing)
RStudio environment: Console, Help, Files, Packages, Environment, History

Install these two packages:

install.packages('tidyverse')
install.packages('psyntur')

Set up an R project

What it is: A special folder for your work in R.
What it does: Keeps your scripts, data, and results together.
Why it helps:
- Sets the folder as your working directory automatically.
- No setwd() headaches.
- Easy to share or move your whole project.
Works well with Git/GitHub for teamwork and version control.
Everything for one analysis stays neatly in one place.
Create an R project called psyc40940 with the sub-directories “data”, “exercises” and “plots”.
Download blomkvist.csv from NOW and move it into the “data” folder.
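Creating the three sub-directories can also be scripted. A minimal sketch in base R, assuming the project is open in RStudio so that the working directory is the project folder:

```r
# Sub-directories used in this module
subdirs <- c("data", "exercises", "plots")

# Create each folder inside the working directory (the project folder);
# showWarnings = FALSE avoids a warning if a folder already exists
for (d in subdirs) {
  dir.create(d, showWarnings = FALSE)
}

# TRUE for every folder that now exists
dir.exists(subdirs)

# Paths are then written relative to the project folder,
# e.g. "data/blomkvist.csv" for the data file
file.path("data", "blomkvist.csv")
```

Because paths like "data/blomkvist.csv" are relative to the project folder, the same code runs on any computer that opens the project, without setwd().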

RMarkdown example

From the RStudio menu …

Click on File > New File > R Markdown
Choose Document (PDF, Word or HTML); we will use HTML (output: html_document) for simplicity.
Save your RMarkdown file as basics.Rmd in the same directory as the .Rproj file.
Compile (Knit) the document.
Inspect the features of the RMarkdown document: YAML preamble, text, markdown code (section headers, italics, bold font), chunks with R code, inline code.
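To see how these parts fit together, here is a minimal sketch that writes a small .Rmd file from R. The file name example.Rmd and its contents are illustrative assumptions, not the RStudio template:

```r
# Lines of a minimal R Markdown file: YAML preamble, markdown text,
# one R chunk and one piece of inline R code
rmd_lines <- c(
  "---",
  "title: \"Basics\"",
  "output: html_document",
  "---",
  "",
  "# A section header",
  "",
  "Some *italic* and **bold** text.",
  "",
  "```{r}",
  "x <- c(1, 2, 3)",
  "mean(x)",
  "```",
  "",
  "The mean of x is `r mean(x)`."
)

# Write the file to a temporary folder (use basics.Rmd in your project instead)
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Knitting from the console needs rmarkdown and pandoc (RStudio ships pandoc):
# rmarkdown::render(rmd_file)
```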

R / RMarkdown / `tidyverse` basics

Executing code in chunks
Comments, commands
Using R as a calculator
Vectors and combining vectors (c())
Base-R functions: mean, sum, min, max, sd
Variables and assignment (<-)
Loading libraries (tidyverse, psyntur)
Data frames:
- Reading in data using read_csv (next slide)
- Creating a data frame via tibble
- Datasets in packages: ?psyntur::faithfulfaces (many more in the datasets package)
tidyverse functions: select, filter, slice (indexing, e.g. [1]), pull, summarise
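A few of the base-R basics from this list in a single chunk. The numbers are arbitrary examples, and the comments show what each line returns:

```r
# Using R as a calculator
(2 + 3) * 4        # 20

# Assignment with <- and combining vectors with c()
x <- c(2, 4, 6)
y <- c(8, 10)
z <- c(x, y)       # 2 4 6 8 10

# Base-R summary functions
mean(z)            # 6
sum(z)             # 30
min(z)             # 2
max(z)             # 10
sd(z)              # 3.162278 (the square root of 10)

# Indexing: the first element of z
z[1]               # 2
```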

Load data

Create a new R chunk called packages (CTRL+ALT+I on Windows/Linux or CMD+OPTION+I on Mac; “I” is the letter i) and load the libraries needed:

library(tidyverse)
library(psyntur)

Create a new chunk called loaddata and load the blomkvist.csv data published in Blomkvist et al. (2017)

# Load data
bk_alldata <- read_csv("data/blomkvist.csv") 

# Select a few variables (sex is used in the model later) and rename rt_hand_d to rt
bk_selected <- select(bk_alldata, id, sex, smoker, age, rt = rt_hand_d)

# Remove missing data
blomkvist <- drop_na(bk_selected)
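To see what select() and drop_na() do without the Blomkvist file, here is a base-R sketch of the same three steps. It uses R's built-in airquality data, chosen because it contains missing values and ships with R. This is not the tidyverse code above, only an equivalent for illustration:

```r
# Keep two columns of the built-in airquality data (like select())
aq_selected <- airquality[, c("Ozone", "Temp")]

# Rename Ozone to ozone (like rt = rt_hand_d inside select())
names(aq_selected)[names(aq_selected) == "Ozone"] <- "ozone"

# Keep only rows without missing values (like drop_na())
aq <- aq_selected[complete.cases(aq_selected), ]

nrow(airquality)   # 153
nrow(aq)           # 116: the 37 rows with missing Ozone are dropped
```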

The setup chunk: global options

Note the chunk header ```{r setup, echo=FALSE}
setup is the label of this chunk (labels are optional but needed for cross-referencing figures and tables).
Chunk option echo = FALSE: don’t display the chunk’s code in the output (its results are still shown); echo = TRUE: display the code.

knitr::opts_chunk$set(message = FALSE,    # don't show messages
                      warning = FALSE,    # don't show warnings
                      comment = NA,       # don't prefix output lines with ##
                      echo = TRUE,        # display code (default)
                      eval = TRUE,        # evaluate code (default)
                      out.width = '45%',  # figure width
                      fig.align = 'center') # figure alignment

Figures

Create a new R chunk with the label myscatterplot.
Set echo = FALSE because we only need the figure, not the code.
Add a figure caption with fig.cap = "A scatterplot." in the chunk options.

library(psyntur)
scatterplot(x = age, y = rt, data = blomkvist)

Figures

Change the figure size to out.width = '75%' in the chunk options.
Cross-reference the figure in the text using \@ref(fig:myscatterplot) (figures are only numbered if the chunk has a fig.cap).

"A scatterplot of age and reaction time can be found in Figure \@ref(fig:myscatterplot)."

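In the YAML preamble change

output: html_document

to

output: bookdown::html_document2

(\@ref() cross-references only work with bookdown output formats, so the bookdown package must be installed.)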

Formatted tables

Reporting results in tables formatted to a high standard.
Calculate descriptive statistics:

(smoker_rt <- summarise(blomkvist, mean = mean(rt), sd = sd(rt), .by = smoker))

# A tibble: 3 × 3
  smoker  mean    sd
  <chr>  <dbl> <dbl>
1 former  653.  152.
2 no      635.  203.
3 yes     633.  217.

library(knitr)
kable(smoker_rt,
      booktabs = TRUE,
      digits = 2,
      align = 'c', # centre values in each column
      caption = 'Descriptives of reaction time by smoker.')

Label the chunk “smoker” and cross-reference the table in the text using Table \@ref(tab:smoker).
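The .by argument in summarise() computes one row of statistics per group. Here is the same grouping logic in base R, as a sketch on the built-in mtcars data (swapped in for illustration; rounding to two decimals mirrors digits = 2 in kable()):

```r
# Mean and SD of miles per gallon (mpg) for each number of cylinders (cyl);
# tapply() splits mpg by cyl and applies the function to each group
by_cyl <- data.frame(
  cyl  = sort(unique(mtcars$cyl)),
  mean = as.vector(tapply(mtcars$mpg, mtcars$cyl, mean)),
  sd   = as.vector(tapply(mtcars$mpg, mtcars$cyl, sd))
)

# Round all columns to two decimals for display
round(by_cyl, 2)
```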

Bibliography and citations

Create a new text file in RStudio, call it references.bib and save it in the same directory as your .Rmd file.
Get the .bib entry for Blomkvist et al. (2017) from Google Scholar and paste it into references.bib:
- Copy the title “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board” into Google Scholar
- Click Cite and then BibTeX
- Copy the entry into references.bib
Note the citation key blomkvist2017reference
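Cite Blomkvist et al. (2017) using @blomkvist2017reference for an in-text citation or [@blomkvist2017reference] for a citation in parentheses.
Create a final section “# References”; the reference list is placed at the end of the document.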
Add to the YAML preamble:

bibliography: references.bib
biblio-style: apalike
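As an alternative to Google Scholar, base R can produce a BibTeX entry itself. The following is a minimal sketch with bibentry() and toBibtex(), with the details typed in from the reference list below; the citation key matches the one Google Scholar suggests:

```r
# Build the entry; fields are copied from the Blomkvist et al. (2017) reference
entry <- bibentry(
  bibtype = "Article",
  key     = "blomkvist2017reference",
  title   = "Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age",
  author  = "Andreas W. Blomkvist and Fredrik Eika and Martin T. Rahbek and Karin D. Eikhof and Mette D. Hansen and Malene Søndergaard and Jesper Ryg and Stig Andersen and Martin G. Jørgensen",
  journal = "PLoS One",
  year    = "2017",
  volume  = "12",
  number  = "12",
  pages   = "e0189598"
)

# Convert to BibTeX and write it to a file
# (a temporary file here; add it to references.bib in your project instead)
bib_file <- file.path(tempdir(), "references.bib")
writeLines(toBibtex(entry), bib_file)
readLines(bib_file)
```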

Formatted inline R output

# Fit the model and get the summary
model <- lm(rt ~ sex, data = blomkvist)
model_summary <- summary(model)

# Extract R^2
r2 <- model_summary$r.squared

The $R^2$ for this model is `r round(r2, 2)`.

Renders “The $R^2$ for this model is 0.03.”

Formatted inline R output

# Extract F statistic
f_stat <- model_summary$fstatistic
p_value <- pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)

The model can be reported as follows: $F(`r round(f_stat[2])`, `r round(f_stat[3])`) = `r round(f_stat[1], 2)`$, $p `r format.pval(p_value, eps = 0.01)`$.

Renders “The model can be reported as follows: $F(1, 263) = 7.48$, $p <0.01$.”
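The extraction steps can be tried without the Blomkvist data. Here is a sketch on R's built-in mtcars data (an assumption, for illustration only) that also checks the p-value computed with pf() against the one reported by anova(). The object names are prefixed with mt_ so they don't overwrite the model above:

```r
# Miles per gallon (mpg) predicted by transmission type (am: 0 = automatic, 1 = manual)
mt_model <- lm(mpg ~ am, data = mtcars)
mt_summary <- summary(mt_model)

# R^2, with the element name written out in full
mt_r2 <- mt_summary$r.squared

# F statistic: a named vector with elements value, numdf and dendf
mt_f <- mt_summary$fstatistic
mt_p <- pf(mt_f[["value"]], mt_f[["numdf"]], mt_f[["dendf"]],
           lower.tail = FALSE)

# Cross-check: for a single predictor, anova() gives the same p-value
all.equal(mt_p, anova(mt_model)[["Pr(>F)"]][1])
```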

p <- c(0.05, 0.02, 0.011, 0.005, 0.001)
format.pval(p, eps = 0.01)

[1] "0.05"  "0.02"  "0.01"  "<0.01" "<0.01"
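format.pval() only adds a “<” sign. For p-values at or above eps it returns the bare number, so $p `r ...`$ would render as “p 0.05”. Below is a hypothetical helper, format_p(), which is not part of base R or any package and always adds “=” or “<”. The threshold and number of decimals are assumptions you can change:

```r
# Hypothetical helper: format p-values for inline reporting so that
# each string starts with "= " or "< "
format_p <- function(p, threshold = 0.001, digits = 3) {
  ifelse(p < threshold,
         paste0("< ", formatC(threshold, format = "f", digits = digits)),
         paste0("= ", formatC(p, format = "f", digits = digits)))
}

format_p(c(0.0474, 0.0067, 0.00002))
# "= 0.047" "= 0.007" "< 0.001"
```

In the text you could then write $p `r format_p(p_value)`$.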

Mathematical typesetting

Text between single '$' symbols is parsed as LaTeX and typeset as inline mathematics.
For example, `$\beta$` renders $\beta$.
Subscripts: `$\beta_0$` renders $\beta_0$; use '{}' for more than one symbol, as in `$\beta_{01}$`, which renders $\beta_{01}$.
How would you write $\beta_{0_1}$?
Superscripts: use '^', as in `$\sigma^2$`, which renders $\sigma^2$.
How would you write $\sigma^{2^2}$?
How about $\sigma_e^2$?

Some arithmetic operations and fractions

Addition and subtraction: `$x + y$`, `$x - y$`
Multiplication: use either `\cdot` or `\times` to get $\cdot$ or $\times$, respectively, as in `$3 \cdot 2$`
Division: use `/` or `\div` to get $/$ or $\div$, respectively, or `$\frac{1}{2}$` for $\frac{1}{2}$
`$\pm$` renders $\pm$
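These pieces can be combined, for example, to write out what mean() and sd() compute. In RMarkdown, a formula between double dollar signs ($$ ... $$) is typeset as a centred display equation. The symbols $\bar{x}$, $n$ and $x_i$ are introduced here for illustration:

```latex
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
$$
```

The $n - 1$ in the denominator matches R's sd(), which computes the sample standard deviation.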

For other formats …

install.packages("rmdformats")

Create from template:
- File > New File > R Markdown > From Template (e.g. readthedown or robobook for documents)
- rmdshower::shower_presentation and ioslides_presentation for slides
RMarkdown example
Online sharing on RPubs (“Publish”)

APA7 manuscripts using `papaja`

Example: https://osf.io/vayhq/ published in Roeser et al. (2024)
Installation: https://github.com/crsh/papaja

Homework

Next week we will start with data visualisation, so make sure you understand RMarkdown and R code.

See the materials on R programming and RMarkdown in the Week 1 folder on NOW.
For an introduction to R programming read Chapter 2 of Andrews (2021).
For more information about RMarkdown (including how to create APA7-style documents), read Chapter 7 of Andrews (2021) and Chapter 27 of Wickham and Grolemund (2016).
Check the rmarkdown website.

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Blomkvist, Andreas W., Fredrik Eika, Martin T. Rahbek, Karin D. Eikhof, Mette D. Hansen, Malene Søndergaard, Jesper Ryg, Stig Andersen, and Martin G. Jørgensen. 2017. “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age.” PLoS One 12 (12): e0189598.

Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2024. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing 37 (2): 359–84. https://doi.org/10.1007/s11145-023-10489-4.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Xie, Yihui. 2017. Dynamic Documents with R and Knitr. Chapman & Hall/CRC.
