Aims

  • Sorting out your RStudio working environment.
  • Getting you ready to produce reproducible reports in RMarkdown.

Why is reproducibility important?

How can you trust someone’s results?

  • Results are communicated through reports such as peer reviewed scientific articles.
  • Reported results should be reproducible!
  • Data and code should be made available to others!
  • It is difficult to relate code to figures, tables, numbers etc. especially where numbers were copy-pasted from software.

Why do we need reproducible analyses?

  • Open Science Collaboration led by Brian Nosek (Open Science Collaboration 2015).
  • Aimed to replicate important findings in psychological research.
  • Main finding: 36% were replicated; only 23% for Social Psychology
  • Reasons: small samples, pressure to publish sensational results.
  • Solutions: pre-registration, replications, large samples, abandoning significance testing, transparency.

A truly reproducible report

  • Data, code, and report need to be one unit.
  • Based on the concept of literate programming (Knuth 1984): text and code are linked in one single file to generate a manual or a computer program.
  • This principle can be used to create reproducible reports, sometimes known as dynamic documents (Xie 2017).
  • RMarkdown is the best way of doing this using R.

Why RMarkdown?

  • Documentation of all analysis steps
  • Quickly updating and reproducing analysis
  • No copy-pasting between programmes
  • Producing texts, websites, supplementary materials, APA-style manuscripts, slides
  • Easy cross-referencing
  • Automatic reference lists
  • Easy type-setting of equations

What is RMarkdown?

  • Text file format, a script like an R script.
  • Open, edit, run in RStudio like scripts.
  • RMarkdown documents are a mixture of markdown code and normal R code (and your data).
  • R code in RMarkdown documents occurs in R chunks, i.e. blocks of code, or as inline R code inside the markdown text.
  • Markdown: minimal syntax that instructs how text should be formatted.
  • Rendering into .pdf, .html, .docx, .doc

Installation

  • RStudio comes with the necessary R package rmarkdown
  • RStudio environment: Console, Help, Files, Packages, Environment, History
  • Install these two packages:
install.packages('tidyverse')
install.packages('psyntur')

Set up an R project

  • What it is: A special folder for your work in R.
  • What it does: Keeps your scripts, data, and results together.
  • Why it helps:
    • Sets the folder as your working directory automatically.
    • No setwd() headaches.
    • Easy to share or move your whole project.
  • Works well with Git/GitHub for teamwork and version control.
  • Everything for one analysis stays neatly in one place.
  • Create R Project for psyc40940 with sub-directories: “data”, “exercises”, “plots”
  • Download blomkvist.csv from NOW and move it into the “data” folder.
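
The folder layout described above can also be created from the R console. This is a minimal sketch that assumes a temporary directory stands in for the project folder; in a real R project the paths would be relative to the folder that holds the .Rproj file.

```r
# A temporary directory stands in for the project root (assumption for illustration).
project_root <- file.path(tempdir(), "psyc40940")

# Create the three sub-directories named on the slide.
sub_dirs <- c("data", "exercises", "plots")
for (d in sub_dirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# List the sub-directories that now exist in the project root.
created <- list.dirs(project_root, full.names = FALSE, recursive = FALSE)
print(created)
```

Inside an R project, a file in the data folder can then be addressed with a relative path such as "data/blomkvist.csv", without calling setwd().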

RMarkdown example

From the RStudio menu …

  • Click on File > New File > R Markdown
  • Document (PDF, WORD, HTML)
  • output: html_document: we will use html for simplicity
  • Name your RMarkdown: basics.Rmd
  • Save in same directory as .Rproj file.
  • Compile (Knit) document
  • Inspect features of RMarkdown document: YAML preamble, text, markdown code (section headers, italics, bold font), chunks with R code, inline code.
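
To make the listed features concrete, the sketch below writes a tiny RMarkdown file from R and knits it from the console. The file content, the chunk label mychunk, and the use of a temporary directory are illustrative assumptions; the file that RStudio generates from its template looks different.

```r
# Three backticks open and close an R chunk in an .Rmd file.
fence <- strrep("`", 3)

# Hypothetical document: YAML preamble, header, formatted text, a chunk, inline code.
rmd_lines <- c(
  "---",
  "title: \"Basics\"",
  "output: html_document",
  "---",
  "",
  "# A section header",
  "",
  "Some *italic* and **bold** text.",
  "",
  paste0(fence, "{r mychunk}"),
  "x <- c(1, 2, 3)",
  "mean(x)",
  fence,
  "",
  "The mean of x is `r mean(x)`."
)

# Save the file in a temporary directory (assumption; normally next to the .Rproj file).
rmd_file <- file.path(tempdir(), "basics.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit from the console, but only if rmarkdown and pandoc are available.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Clicking Knit in RStudio does roughly the same thing as the rmarkdown::render() call.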

R / RMarkdown / tidyverse basics

  • Executing code in chunks
  • Comments, commands
  • Using R as a calculator
  • Vectors and combining vectors (c())
  • Base-R functions: mean, sum, min, max, sd
  • Variables and assignment (<-)
  • Loading libraries (tidyverse, psyntur)
  • Data frames:
    • Reading in data using read_csv (next slide)
    • Creating a data frame via tibble
    • Datasets in packages: ?psyntur::faithfulfaces (many more in datasets)
  • tidyverse functions: select, filter, slice (indexing, e.g. [1]), pull, summarise
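
The topics above can be tried out in a single chunk. This is a small sketch with made-up values; the object names (x, y, df, scores, group_means) are hypothetical, and only tibble and dplyr are loaded here, both of which are part of the tidyverse.

```r
library(tibble)
library(dplyr)

# R as a calculator, vectors, and base-R functions
x <- c(2, 4, 6)
y <- c(x, 8)   # combine vectors: 2 4 6 8
mean(y)        # 5
sd(x)          # 2

# A small data frame with made-up values
df <- tibble(id = 1:4,
             group = c("a", "a", "b", "b"),
             score = c(10, 20, 30, 40))

select(df, id, score)       # keep two columns
filter(df, group == "b")    # keep rows where group is "b"
slice(df, 1)                # first row only
scores <- pull(df, score)   # extract one column as a vector
scores[1]                   # indexing: first element of the vector

# Mean score per group
group_means <- summarise(df, mean = mean(score), .by = group)
```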

Load data

  • Create a new R chunk: Ctrl+Alt+I (Windows/Linux) or Cmd+Option+I (macOS), i.e. the letter “i”. Call it packages and load the libraries needed:
library(tidyverse)
library(psyntur)
  • Create a new chunk called loaddata and load the blomkvist.csv data published in Blomkvist et al. (2017)
# Load data
bk_alldata <- read_csv("data/blomkvist.csv")

# Select a few variables (sex is needed for the model fitted later)
bk_selected <- select(bk_alldata, id, sex, smoker, age, rt = rt_hand_d)

# Remove missing data
blomkvist <- drop_na(bk_selected)
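
The same three steps can be tested without the real file. The sketch below writes a tiny CSV with made-up values to a temporary directory and then reads, selects, and filters it; the file name toy.csv and all values are assumptions for illustration, not data from Blomkvist et al. (2017).

```r
library(readr)
library(dplyr)
library(tidyr)

# Write a tiny made-up CSV so the example runs without blomkvist.csv.
csv_file <- file.path(tempdir(), "toy.csv")
write_csv(tibble::tibble(id = 1:3,
                         smoker = c("no", "yes", NA),
                         age = c(25, 40, 61),
                         rt_hand_d = c(550, 610, 700)),
          csv_file)

# Load data
toy_all <- read_csv(csv_file, show_col_types = FALSE)

# Select a few variables; rt = rt_hand_d renames the column while selecting
toy_selected <- select(toy_all, id, smoker, age, rt = rt_hand_d)

# Remove rows with missing values (the third row has no smoker value)
toy <- drop_na(toy_selected)
```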

The setup chunk: global options

  • Note ```{r setup, echo=FALSE}
  • setup is a label of this chunk (optional; useful for cross-referencing of figures and tables).
  • Chunk configuration option echo = FALSE: don’t display chunk in output; echo = TRUE: display chunk.
knitr::opts_chunk$set(message = FALSE, # don't return messages
                      warning = FALSE, # don't return warnings
                      comment = NA, # don't prefix output with ##
                      echo = TRUE, # display chunk (is default)
                      eval = TRUE, # evaluate chunk (is default)
                      out.width = '45%',  # figure width
                      fig.align='center') # figure alignment

Figures

  • Create a new R chunk with label myscatterplot
  • Set echo = FALSE because we only need the figure.
  • Add a figure caption fig.cap = "A scatterplot." in the chunk configurations.
library(psyntur)
scatterplot(x = age, y = rt, data = blomkvist)

Figures

  • Change the default size to out.width = '75%'.
  • Cross-reference figure using \@ref(fig:myscatterplot) in the text.
"A scatterplot of age and reaction time can be found in Figure \@ref(fig:myscatterplot)."

In the YAML preamble change

output: html_document

to

output: bookdown::html_document2

Formatted tables

  • Reporting results in tables formatted to a high standard.
  • Calculate descriptive statistics:
(smoker_rt <- summarise(blomkvist, mean = mean(rt), sd = sd(rt), .by = smoker))
# A tibble: 3 × 3
  smoker  mean    sd
  <chr>  <dbl> <dbl>
1 former  653.  152.
2 no      635.  203.
3 yes     633.  217.
library(knitr)
kable(smoker_rt,
      booktabs = TRUE,
      digits = 2,
      align = 'c', # centre value in each column
      caption = 'Descriptives of reaction time by smoker.')
  • Label the chunk “smoker” and cross-reference the table in the text using Table \@ref(tab:smoker).

Bibliography and citations

  • Create a file (in RStudio) and call it references.bib (save in same working directory as your .Rmd file)
  • Get the .bib entry for Blomkvist et al. (2017) from Google Scholar and paste it into references.bib:
    • Copy the title “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board” into Google Scholar
    • Click cite and BibTeX
    • Copy the BibTeX entry into references.bib
  • Note the citation key blomkvist2017reference
  • Cite Blomkvist et al. (2017) using @blomkvist2017reference or [@blomkvist2017reference].
  • Create a section “# References” at the end of the document.
  • Add to the YAML preamble:
bibliography: references.bib
biblio-style: apalike

Formatted inline R output

# Fit the model and get the summary
model <- lm(rt ~ sex, data = blomkvist)
model_summary <- summary(model)
# Extract R^2
r2 <- model_summary$r.squared
The $R^2$ for this model is `r round(r2, 2)`.

Renders “The \(R^2\) for this model is 0.03.”

Formatted inline R output

# Extract F statistic
f_stat <- model_summary$fstatistic
p_value <- pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)
The model summary can be summarised like so: $F(`r round(f_stat[2])`, `r round(f_stat[3])`) =
`r round(f_stat[1],2)`$, $p `r format.pval(p_value, eps = 0.01)`$.

Renders “The model summary can be summarised like so: \(F(1, 263) = 7.48\), \(p <0.01\).”

p <- c(0.05, 0.02, 0.011, 0.005, 0.001)
format.pval(p, eps = 0.01)
[1] "0.05"  "0.02"  "0.01"  "<0.01" "<0.01"
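
With format.pval() the comparison sign only appears for values below eps, so a larger p-value would be printed without an equals sign. One possible workaround is a small helper that always returns the sign together with the number. The function report_p() below is hypothetical (it is not part of base R or psyntur) and is only a sketch.

```r
# Hypothetical helper: format a p-value with its comparison sign for inline reporting.
report_p <- function(p, eps = 0.01, digits = 2) {
  ifelse(p < eps,
         paste0("< ", eps),
         paste0("= ", formatC(p, format = "f", digits = digits)))
}

report_p(0.005)  # "< 0.01"
report_p(0.034)  # "= 0.03"
```

In the text this could be used as $p `r report_p(p_value)`$, which renders the sign and the value together.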

Mathematical typesetting

  • Strings are parsed using \(\LaTeX\) and typeset accordingly when used between '$' symbols for inline mode.
  • For example $\beta$ renders \(\beta\).
  • Subscripts: $\beta_0$ is \(\beta_0\) and using '{}' for more than one symbol as in $\beta_{01}$ which is \(\beta_{01}\)
  • How would you write \(\beta_{0_1}\)?
  • Superscripts: '^' as in $\sigma^2$ which is \(\sigma^2\).
  • How would you write \(\sigma^{2^2}\)?
  • How about \(\sigma_e^2\)?

Some arithmetic operations and fractions

  • $x + y$, $x - y$
  • Multiplication: use either $\cdot$ or $\times$ to get \(\cdot\) or \(\times\), respectively, as in \(3 \cdot 2\)
  • Division: $/$ or $\div$ to get \(/\) or \(\div\), respectively, or $\frac{1}{2}$ for \(\frac{1}{2}\)
  • $\pm$ renders to \(\pm\)
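
These operators can be combined in a single expression. As an illustrative sketch (the symbols \bar{x}, s and n and the command \sqrt are not introduced above and are assumptions for this example), a 95% confidence interval for a mean could be typeset in display mode as:

```latex
$$\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}$$
```

Double dollar signs put the expression on its own centred line; single dollar signs keep it inline.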

For other formats …

install.packages("rmdformats")
  • Create from template:
    • File > New File > R Markdown (e.g. readthedown or robobook for documents)
    • rmdshower::shower_presentation and ioslides_presentation for slides
  • RMarkdown example
  • Online sharing on RPubs (“Publish”)

APA7 manuscripts using papaja

Homework

References

Andrews, Mark. 2021. Doing Data Science in R: An Introduction for Social Scientists. SAGE Publications Ltd.

Blomkvist, Andreas W., Fredrik Eika, Martin T. Rahbek, Karin D. Eikhof, Mette D. Hansen, Malene Søndergaard, Jesper Ryg, Stig Andersen, and Martin G. Jørgensen. 2017. “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age.” PLoS One 12 (12): e0189598.

Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2024. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing 37 (2): 359–84. https://doi.org/10.1007/s11145-023-10489-4.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Xie, Yihui. 2017. Dynamic Documents with R and Knitr. Chapman and Hall/CRC.