About me: Jens Roeser

  • Senior Lecturer in Psycholinguistics @ Psychology Department
  • Language production / comprehension / acquisition often with a focus on writing (e.g. Roeser, Torrance, and Baguley 2019; Garcia, Roeser, and Kidd 2023)
  • Bayesian modelling of production time course data (Roeser, De Maeyer, et al. 2024; Roeser, Conijn, et al. 2024); keystroke logging; eyetracking
  • Teaching: advanced statistical modelling, data wrangling, data visualisation, R package (psyntur, Andrews and Roeser 2021)

Reproducible data analysis using RMarkdown

How can you trust someone’s results?

  • Results are communicated through reports such as peer reviewed scientific articles.
  • Analysis and results should be reproducible.
  • Data and code should be made publically available.
  • Often difficult to relate code to figures, tables, numbers in text.
  • Copy-pasting numbers from software into the text can be error prone.

Why do we need reproducible analyses?

  • Open Science Collaboration led by Brian Nosek (Open Science Collaboration 2015).
  • Aimed to replicate important findings in psychological research.
  • Main finding: 36% where replicated; only 23% for Social Psychology
  • Reasons: small samples, pressure to publish sensational results.
  • Make Psychology Science Again: pre-registration, replications, transparency, large samples (power analysis), abandoning significance testing.

Markdown: a truely reproducible report

  • Data, code, report need be one unit.
  • Based on the concept of literate programming (Knuth 1984): text and code are linked in one single file to generate manual or computer program.
  • This principle can be used to creating reproducible reports, sometimes known as dynamic documents (Xie 2017).
  • RMarkdown is the best way of doing this using R.

Why RMarkdown?

  • Documentation of all analysis steps
  • Easy integration between R and text
  • Quickly updating and reproducing analysis
  • No copy-pasting between programmes
  • Producing texts, website, supplementary materials, APA-style manuscripts, slides
  • Easy cross-referencing (citations)
  • Automatic reference lists
  • Easy type-setting of equations

Key concepts and tools

What is RMarkdown?

  • Text file format, a script like an R script.
  • Open, edit, run in RStudio like scripts.
  • RMarkdown are a mixture of markdown code and normal R code (and your data).
  • R code in RMarkdown documents occurs in R chunks, i.e. blocks of code, or inline R code inside of markdown code
  • Markdown: minimal syntax that instructs how text should be formatted.
  • Rendering into .pdf, .html, .docx, .doc

Key concepts and tools

  • Document preparation system for creating high quality technical and scientific documents especially when involving mathematical formulas and technical diagrams.
  • Great for cross referencing (citations, figure labels) and automatic generation of content tables, reference lists.
  • Widely used in statistics, computer science, physics
  • \(\LaTeX\) documents are written in a .tex source code file and rendered to pdf

Outline

  • Setup
  • Examples
  • YAML preamble and chunk options
  • Headers
  • Figures and tables
  • Cross-referencing
  • Referencing
  • In-line R output
  • Mathematical typesetting

Download repository

  • Go to github.com/jensroes/bristol-ws-2025
  • Click on: Code > Download ZIP > unzip directory on your machine.
  • Open project by double clicking on bristol-ws-2025.Rproj
  • Use slides for code

Installation

  • RStudio comes with necessary R package rmarkdown
  • To create pdf outputs, we require the package tinytex
# Is tidyverse installed?
'tidyverse' %in% rownames(installed.packages())
# Should return TRUE but if not run
install.packages('tidyverse')
install.packages('tinytex')
tinytex::install_tinytex()
# then restart RStudio
# Then, try
tinytex:::is_tinytex() # should return TRUE

Installation

  • Test that rmarkdown will render pdf documents:
writeLines('Hello $x^2$', 'test.Rmd')
rmarkdown::render('test.Rmd', output_format = 'pdf_document')
  • writeLines creates an .Rmd file named test.Rmd.
  • rmarkdown::render renders .Rmd as pdf named test.pdf.

Minimal RMarkdown example

  • Open rmarkdown/examples/example.Rmd
  • You’ll see an R chunk and two pieces of inline R code.
  • Remainder is plain Markdown.
  • Knit > Knit to PDF to compile .Rmd to .pdf. Wow!!!
  • Notice how the code was interpreted in pdf.
  • Important: R code is executed from top down.
  • Other demos and examples are provided.

Elaborate APA7 RMarkdown example

  • Open rmarkdown/examples/apa7_template.Rmd
  • To run this, you might need to install a few packages so instead
  • Check out the file rmarkdown/examples/apa7_template.pdf that was rendered with this .Rmd file

Example for online supp materials

  • Open rmarkdown/examples/robobook.Rmd
  • To run this, you might need to install a few packages so instead
  • Check out the file rmarkdown/examples/robobook.html that was rendered with this .Rmd file

From the RStudio menu …

  • Click on File > New File > R Markdown
  • Document (PDF, WORD, HTML)
  • output: pdf_document: use pdfs for writing manuscripts
  • Name your RMarkdown: my-practice-markdown.Rmd
  • Save in same directory as .Rproj file.
  • YAML header between '---'s
  • YAML = YAML Ain’t a Markup Language (o Yet Another Markup Language)

Load data

  • Create a new R chunk: try CTRL+ALT+I or CMD+ALT+I (i.e. the letter “i”)
  • Create a chunk called packages and load libraries needed:
library(tidyverse)
  • Create a new chunk called loaddata and load Martin et al. (2010) data:
data <- read_csv("data/martin-etal-2010-exp3a.csv") 

The setup chunk: global options

  • Note ```{r setup, echo=FALSE}
  • setup is a label of this chunk (optional; useful for cross-referencing of figures and tables).
  • Chunk configuration option echo = FALSE: don’t display chunk in output; echo = TRUE: display chunk.
knitr::opts_chunk$set(message = FALSE, # don't return messages
                      warning = FALSE, # don't return warnings
                      comment = NA, # don't comment output
                      echo = TRUE, # display chunk (is default)
                      eval = TRUE, # evaluate chunk (is default)
                      out.width = '45%',  # figure width
                      fig.align='center') # figure alignment

Section headers

# This is a section header

## This is a subsection header

# This is another section header

## This is another subsection header

### This is a subsubsection header

Figures

  • Create a new R chunk with label mydensplot
  • Set echo = F cause we only need the figure.
  • Add a figure caption fig.cap = "A density plot." in the chunk configurations.
ggplot(data, aes(x = rt, colour = nptype, fill = nptype)) +
    geom_density(alpha = .25) +
    scale_x_log10() +
    theme_classic()

Figures

  • Change the default size to out.width = 50%.
  • Cross-reference figure using \ref{fig:mydensplot} in the text.
"A density plot of reaction time visualised by NP type can be found in Figure \ref{fig:mydensplot}."

Add header options

header-includes:
- \usepackage{booktabs}
  • We just need booktabs to improve type setting.

Formatted tables

  • Reporting results in tables formatted to a high standard.
  • Calculate descriptive summary stats:
(rt_stats <- summarise(data, across(rt, list(mean = mean, sd = sd)), .by = nptype))
# A tibble: 2 × 3
  nptype    rt_mean rt_sd
  <chr>       <dbl> <dbl>
1 conjoined   1109.  296.
2 simple      1076.  267.
  • This format isn’t good enough for papers.

Formatted APA style tables

library(kableExtra)
kable(rt_stats,
      format = 'latex',
      booktabs = TRUE,
      digits = 2,
      align = 'c', # centre value in each column
      caption = 'Descriptive summary statistics of reaction time by NP type.') %>% 
  kable_styling(position = 'center') # centre position of table

Also, label your chunk “rtstats” and cross-reference the table in the text using Table \ref{tab:rtstats}.

Bibliography and citations

Add to the YAML preamble:

bibliography: refs.bib
biblio-style: apalike

Create a file (in RStudio) called refs.bib (save in same working directory as your .Rmd file)

Bibliography and citations

  • Get the .bib entry for Martin et al. (2010) from Google Scholar and paste it into refs.bib:
    • Copy the title “Planning in sentence production: Evidence for the phrase as a default planning scope” into Google Scholar
    • Click cite and BibTeX
    • Copy the .bib entry into refs.bib
  • Note the citation key martin2010planning
  • Cite Martin et al. (2010) using @martin2010planning or [@martin2010planning].
  • At the end of your document create a section “# References

Table exercise

# Fit model and get the summary
library(lmerTest)
model <- lmer(rt ~ nptype + (nptype|ppt) + (nptype|item), data = data)
model_summary <- summary(model)$coefficients

Task: Use the tab_model function of the sjPlot package to create a nicely formatted table of the model object. Use cross-referencing as before to refer to the table from the text.

Formatted inline R output

# Extract t statistic
t_val <- model_summary[2, 4]
# Extract df
df <- model_summary[2, 3]
# Extract p value
p_val <- model_summary[2, 5]
The hypothesis test for the rt effect can be summarised like so: 
*t*(`r round(df, 1)`) = `r round(t_val, 1)`, *p* `r format.pval(p_val, eps = 0.05)`.

Which renders to: “The hypothesis test for the slope coefficient can be summarised like so: t(125.7) = -2.2, p <0.05.”

Formatted inline R output

# Extract slope coefficient
est <- model_summary[2, 1]
The $\hat\beta$ coefficient of this model shows a slowdown of `r round(est, 1)` ms.

Renders “The \(\hat\beta\) coefficient of this model shows a slowdown of -33.6 ms.”

Task: Use confint(model) to obtain the 95% CI of the effect and add it to the text. Don’t copy paste the numbers but use inline R code.

Formatted inline R output

# Extract slope coefficient
est <- model_summary[2, 1]
ci <- as.numeric(confint(model, parm = "nptypesimple"))
The $\hat\beta$ coefficient of this model shows a slowdown of `r round(est, 1)` ms;
95% CI: `r ci[1]`, `r ci[2]`.

Renders “The \(\hat\beta\) coefficient of this model shows a slowdown of -33.6 ms; 95% CI: -63.83, -2.94.”

Mathematical typesetting

  • Strings are parsed using \(\LaTeX\) and typeset accordingly when used between '$' symbols for inline mode.
  • For example $\beta$ renders \(\beta\).
  • Subscripts: $\beta_0$ is \(\beta_0\) and using '{}' for more than one symbol as in $\beta_{01}$ which is \(\beta_{01}\)
  • How would you write \(\beta_{0_1}\)?
  • Superscripts: '^' as in $\sigma^2$ which is \(\sigma^2\).
  • How would you write \(\sigma^{2^2}\)?
  • How about \(\sigma_e^2\)?

Some arithmetic operations and fractions

  • $x + y$, $x - y$
  • Multiplication use either $\cdot$ or $\times$ to get \(\cdot\) or \(\times\), respectively, as in \(3 \cdot 2\)
  • Division: $/$ or $\div$ to get \(/\) or \(\div\), respectively, or $\frac{1}{2}$ for \(\frac{1}{2}\)
  • $\pm$ renders to \(\pm\)

\(\LaTeX\) mathematical typesetting

The observed variable is modeled as coming from a normal distribution $y_i \sim \mathcal{N}(\mu_i, \sigma^2)$, with $\mu_i = \beta_0 + \beta_1 \cdot x_i$ where each $i \in 1 \dots N$ .

Renders

The observed variable is modeled as coming from a normal distribution \(y_i \sim \mathcal{N}(\mu_i, \sigma^2)\), with \(\mu_i = \beta_0 + \beta_1 \cdot x_i\) where each \(i \in 1 \dots N\).

Display mode using '$$' as delimiters

$$
y_i \sim \mathcal{N}(\mu_i, \sigma^2)
$$

Renders

\[ y_i \sim \mathcal{N}(\mu_i, \sigma^2) \]

Display maths

  • Using \(\LaTeX\)’s aligned environment to align multiple mathematical statements.
  • We will get back to what this means.
$$
\begin{aligned}
  y_i &\sim N(\mu_i, \sigma^2),\\
  \mu_i &= \beta_0 + \beta_1 \cdot x_i + beta_2 \cdot z_i
\end
$$
  • The '&' is used to align the lines so that ‘\(\sim\)’ (\sim) and ‘\(=\)’ are aligned.
  • '$\\$' forces a line break.

For other formats …

install.packages("rmdformats")
  • Create from template:
    • File > New File > R Markdown (e.g. readthedown or robobook for documents)
    • rmdshower::shower_presentations and ioslides_presentation for slides
  • Online sharing on RPubs (“Publish”): RMarkdown example

APA7 manuscripts using papaja

Tips / Rules

  • Never copy numbers from output into text.
  • Don’t produce citations or cross-reference manually.
  • Be selective with which output and code you do and do not show your audience.
  • Don’t use RMarkdown instead of or like R-scripts.
  • Run more complex statistical models in a separate script, save the model output and read it into your script.

Recommended reading

References

Andrews, Mark. 2021. Doing data science in R: An Introduction for Social Scientists. London, UK: SAGE Publications Ltd.

Andrews, Mark, and Jens Roeser. 2021. psyntur: Helper Tools for Teaching Statistical Data Analysis. https://CRAN.R-project.org/package=psyntur.

Garcia, Rowena, Jens Roeser, and Evan Kidd. 2023. “Finding Your Voice: Voice-Specific Effects in Tagalog Reveal the Limits of Word Order Priming.” Cognition 236: 105424.

Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

Martin, Randi C, Jason E Crowther, Meredith Knight, Franklin P Tamborello II, and Chin-Lung Yang. 2010. “Planning in Sentence Production: Evidence for the Phrase as a Default Planning Scope.” Cognition 116 (2): 177–92.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.

Roeser, Jens, Rianne Conijn, Evgeny Chukharev, Gunn Helen Ofstad, and Mark Torrance. 2024. “Typing in Tandem: Language Planning in Multi-Sentence Text Production Is Fundamentally Parallel.” Journal of Experimental Psychology: General.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2024. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing 37 (2): 359–84.

Roeser, Jens, Mark Torrance, and Thom Baguley. 2019. “Advance Planning in Written and Spoken Sentence Production.” Journal of Experimental Psychology: Learning, Memory, and Cognition 45 (11): 1983–2009.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Xie, Yihui. 2017. Dynamic Documents with R and Knitr. Chapman; Hall/CRC.