Aims for the rest of today

  • Sorting out your RStudio working environment.
  • Getting you ready to write reproducible reports in RMarkdown.

How can you trust someone’s results?

  • Results are communicated through reports such as peer reviewed scientific articles.
  • Reported results should be reproducible! – Are they?
  • Data and code should be made available to others – Is this enough though?
  • It is difficult to relate code to figures, tables, numbers etc., especially where numbers were copy-pasted from software.
  • Manual processing of data and manual generation of reports are also error-prone.

Why do we need reproducible analyses?

  • Open Science Collaboration led by Brian Nosek (Open Science Collaboration 2015).
  • Aimed to replicate important findings in psychological research.
  • Main finding: 36% were replicated; only 23% for Social Psychology
  • Reasons: small samples, pressure to publish sensational results.
  • Make Psychology Science Again: pre-registration, replications, large samples (power analysis), abandoning significance testing, transparency.

A truly reproducible report

  • Data, code, and report need to be one unit.
  • Based on the concept of literate programming (Knuth 1984): text and code are linked in one single file that generates both the manual and the computer program.
  • This principle can be used to create reproducible reports, sometimes known as dynamic documents (Xie 2017).
  • RMarkdown is a well-established way of doing this in R.

Why RMarkdown?

  • Documentation of all analysis steps
  • Quickly updating and reproducing analysis
  • No copy-pasting between programmes
  • Producing texts, websites, supplementary materials, APA-style manuscripts, slides
  • Easy cross-referencing
  • Automatic reference lists
  • Easy type-setting of equations

Key concepts and tools

What is RMarkdown?

  • Text file format, a script like an R script.
  • Open, edit, run in RStudio like scripts.
  • RMarkdown documents are a mixture of markdown code and normal R code (and your data).
  • R code in RMarkdown documents occurs in R chunks, i.e. blocks of code, or as inline R code inside of markdown code.
  • Markdown: minimal syntax that instructs how text should be formatted.
  • Rendering into .pdf, .html, .docx, .doc

Download repository

  • Go to github.com/jensroes/stats-iii
  • Click on: Code > Download ZIP > unzip directory on your machine.
  • Open project by double clicking on stats-iii.Rproj
    • xxx/slides.Rmd: slides in RMarkdown format
    • xxx/exercises: R scripts that we will work with
    • data/: my scripts will read in data from here

Installation

  • RStudio comes with necessary R package rmarkdown
  • To create pdf outputs, we require the package tinytex
# Is tidyverse installed?
'tidyverse' %in% rownames(installed.packages())
# Should return TRUE but if not run
install.packages('tidyverse')
install.packages('tinytex')
tinytex::install_tinytex()
# then restart RStudio
# Then, try
tinytex:::is_tinytex() # should return TRUE
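
The installation check above can be extended to several packages at once. The following is a minimal sketch, not part of the original slides; the helper name missing_packages is hypothetical, and the install line is left commented out so that running the sketch changes nothing on your machine.

```r
# Hypothetical helper: return those packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

needed <- c("tidyverse", "rmarkdown", "tinytex")
to_install <- missing_packages(needed)
print(to_install)  # character(0) means everything is already installed

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Remember that tinytex::install_tinytex() still has to be run once after the tinytex package itself is installed.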

Installation

  • Test that rmarkdown will render pdf documents:
writeLines('Hello $x^2$', 'test.Rmd')
rmarkdown::render('test.Rmd', output_format = 'pdf_document')
  • writeLines creates an .Rmd file named test.Rmd.
  • rmarkdown::render renders .Rmd as pdf named test.pdf.

Minimal RMarkdown example

  • Open rmarkdown/examples/example.Rmd
  • You’ll see an R chunk and two pieces of inline R code.
  • Remainder is plain Markdown.
  • Knit > Knit to PDF to compile .Rmd to .pdf. Wow!!!
  • Notice how the code was interpreted in pdf.
  • Important: R code is executed from top down.
  • Other demos and examples are provided.
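
If you do not have the example file at hand, the sketch below writes a comparable document from R. It is an illustration only and is not the content of example.Rmd: the file name minimal-example.Rmd, the chunk label mean-speed and the use of the built-in cars data are all assumptions. Note that the inline code in the last line relies on avg_speed, which is created in the chunk above it; this is what top-down execution means in practice.

```r
# A hypothetical minimal document, built as a character vector (one element per line).
rmd_lines <- c(
  '---',
  'title: "Minimal example"',
  'output: pdf_document',
  '---',
  '',
  'Some plain Markdown text with *emphasis*.',
  '',
  '```{r mean-speed}',
  'avg_speed <- mean(cars$speed)  # cars ships with R',
  '```',
  '',
  'The mean speed is `r round(avg_speed, 1)` mph.'
)

# Write the file to a temporary directory so nothing in the project is overwritten.
rmd_file <- file.path(tempdir(), 'minimal-example.Rmd')
writeLines(rmd_lines, rmd_file)

# To compile it (needs a working LaTeX installation such as TinyTeX):
# rmarkdown::render(rmd_file)
```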

From the RStudio menu …

  • Click on File > New File > R Markdown
  • Document (PDF, WORD, HTML)
  • output: pdf_document: use pdfs for writing manuscripts
  • Name your RMarkdown: fa-linear-regression.Rmd
  • Save in same directory as .Rproj file.

Load data

  • Create a new R chunk: try CTRL+ALT+I or CMD+ALT+I (i.e. the letter “i”)
  • Create a chunk called packages and load libraries needed:
library(tidyverse)
  • Create a new chunk called loaddata and load Blomkvist et al. (2017) data:
blomkvist <- read_csv("data/blomkvist.csv") %>% 
  select(id, age, smoker, sex, rt = rt_hand_d) %>% 
  drop_na()
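
To see what the select() and drop_na() steps do without needing the data file, here is a small sketch on invented toy data. Only the column names mirror the slide; the values and the extra column are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Invented toy data; only the column names mirror the slide.
toy <- tibble(
  id        = 1:3,
  age       = c(25, NA, 61),
  smoker    = c("no", "yes", "former"),
  sex       = c("f", "m", "f"),
  rt_hand_d = c(0.41, 0.52, 0.63),
  extra     = c("a", "b", "c")  # a column we do not need
)

toy_clean <- toy %>%
  select(id, age, smoker, sex, rt = rt_hand_d) %>%  # keep columns, rename rt_hand_d to rt
  drop_na()                                          # remove rows with any missing value

toy_clean
```

The row with the missing age is dropped, and the extra column is gone because it was not named in select().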

The setup chunk: global options

  • Note ```{r setup, echo=FALSE}
  • setup is a label of this chunk (optional; useful for cross-referencing of figures and tables).
  • Chunk configuration option echo = FALSE: don’t display chunk in output; echo = TRUE: display chunk.
knitr::opts_chunk$set(message = FALSE, # don't return messages
                      warning = FALSE, # don't return warnings
                      comment = NA, # don't prefix output lines with ##
                      echo = TRUE, # display chunk (is default)
                      eval = TRUE, # evaluate chunk (is default)
                      out.width = '45%',  # figure width
                      fig.align='center') # figure alignment
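
Global chunk options can also be inspected from the console, which helps when a chunk does not behave as expected. A minimal sketch, assuming the knitr package is installed; the option values are chosen only for illustration.

```r
library(knitr)

opts_chunk$set(echo = FALSE, out.width = '45%')  # change two global defaults
opts_chunk$get("echo")       # current global value of echo
opts_chunk$get("out.width")  # current global value of out.width

opts_chunk$restore()         # reset all chunk options to knitr's defaults
opts_chunk$get("echo")       # back to the default
```

Options given in an individual chunk header take precedence over these global settings for that chunk only.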

Section headers

# This is a section header

## This is a subsection header

# This is another section header

## This is another subsection header

### This is a subsubsection header

Figures

  • Create a new R chunk with label myscatterplot
  • Set echo = FALSE because we only need the figure.
  • Add a figure caption fig.cap = "A scatterplot." in the chunk configurations.
library(psyntur)
scatterplot(x = age, y = rt, data = blomkvist)
  • Or using ggplot2
ggplot(blomkvist, aes(x = age, y = rt)) +
    geom_point() +
    theme_classic()
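
Put together, the chunk for the figure might look like the lines printed by the sketch below. The chunk is held as strings so that the header can be shown literally; the out.width value is optional and included only as an example. With fig.cap set, the chunk label myscatterplot is what a cross-reference of the form fig:myscatterplot points to.

```r
# Lines of a hypothetical figure chunk, held as strings so they can be printed.
figure_chunk <- c(
  '```{r myscatterplot, echo=FALSE, fig.cap="A scatterplot.", out.width="50%"}',
  'ggplot(blomkvist, aes(x = age, y = rt)) +',
  '  geom_point() +',
  '  theme_classic()',
  '```'
)

cat(figure_chunk, sep = "\n")  # shows the chunk as it would appear in the .Rmd
```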

Figures

  • Change the default size to out.width = '50%' (the value must be quoted).
  • Cross-reference figure using \ref{fig:myscatterplot} in the text.
"A scatterplot of age and reaction time can be found in Figure \ref{fig:myscatterplot}."

Add header options

header-includes:
- \usepackage{booktabs}
  • We just need booktabs to improve the typesetting of tables.

Formatted tables

  • Reporting results in tables formatted to a high standard.
  • Calculate descriptives:
library(psyntur)
(smoker_age <- describe(data = blomkvist, by = smoker, mean = mean(age), sd = sd(age)))
# A tibble: 3 × 3
  smoker  mean    sd
  <chr>  <dbl> <dbl>
1 former  65.2  16.9
2 no      53.0  21.3
3 yes     50.6  17.5
  • This format isn’t good enough for papers.

Formatted APA style tables

library(kableExtra)
smoker_age %>% 
  kable(format = 'latex',
        booktabs = TRUE,
        digits = 2,
        align = 'c', # centre value in each column
        caption = 'Descriptives of age by smoker.') %>% 
  kable_styling(position = 'center') # centre position of table
  • Also, label your chunk “smoker” and cross-reference the table in the text using Table \ref{tab:smoker}.
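
To see why booktabs is needed in the header, it helps to look at the LaTeX that kable() generates. The sketch below uses only knitr and re-enters the descriptives printed above by hand in a data frame, so it runs without the data file; kable_styling() from kableExtra is left out here on purpose.

```r
library(knitr)

# Descriptives copied from the output shown above.
smoker_age <- data.frame(
  smoker = c("former", "no", "yes"),
  mean   = c(65.2, 53.0, 50.6),
  sd     = c(16.9, 21.3, 17.5)
)

tab <- kable(smoker_age,
             format = "latex",
             booktabs = TRUE,   # use booktabs rules such as \toprule
             digits = 2,
             align = "c",
             caption = "Descriptives of age by smoker.")

cat(tab)  # print the raw LaTeX instead of a rendered table
```

The printed code contains booktabs commands such as \toprule, which LaTeX can only interpret when the booktabs package is loaded.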

Bibliography and citations

  • Add to the YAML preamble:
bibliography: refs.bib
biblio-style: apalike
  • Create a file (in RStudio) called refs.bib (save in same working directory as your .Rmd file)

Bibliography and citations

  • Get the .bib entry for Blomkvist et al. (2017) from Google Scholar and paste it into refs.bib:
    • Copy the title “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board” into Google Scholar
    • Click cite and BibTeX
    • Copy the .bib entry into refs.bib
  • Note the citation key blomkvist2017reference
  • Cite Blomkvist et al. (2017) using @blomkvist2017reference or [@blomkvist2017reference].
  • At the end of your document create a section “# References”.

Formatted inline R output

# Fit the model and get the summary
model <- lm(rt ~ sex, data = blomkvist)
model_summary <- summary(model)
# Extract R^2
r2 <- model_summary$r.squared
The $R^2$ for this model is `r round(r2, 2)`.

Renders “The \(R^2\) for this model is 0.03.”

Formatted inline R output

# Extract F statistic
f_stat <- model_summary$fstatistic
p_value <- pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)
The model summary can be summarised like so: $F(`r round(f_stat[2])`, `r round(f_stat[3])`) =
`r round(f_stat[1],2)`$, $p `r format.pval(p_value, eps = 0.01)`$.

Renders “The model summary can be summarised like so: \(F(1, 263) = 7.48\), \(p <0.01\).”

p <- c(0.05, 0.02, 0.011, 0.005, 0.001)
format.pval(p, eps = 0.01)
[1] "0.05"  "0.02"  "0.01"  "<0.01" "<0.01"
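
The same extraction steps can be tried in the console before they go into inline code. The following is a sketch on the built-in cars data rather than the Blomkvist data, so the model and its numbers differ from the slides; the sentence is assembled with sprintf() only to preview what the inline code would produce.

```r
# Fit a simple model on a data set that ships with R.
model <- lm(dist ~ speed, data = cars)
model_summary <- summary(model)

r2 <- model_summary$r.squared
f_stat <- model_summary$fstatistic  # named vector: value, numdf, dendf
p_value <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
              lower.tail = FALSE)

# Preview of the sentence; in the .Rmd each number would be an inline `r ...`.
report <- sprintf("R^2 = %.2f, F(%d, %d) = %.2f, p %s",
                  r2,
                  as.integer(f_stat[["numdf"]]),
                  as.integer(f_stat[["dendf"]]),
                  f_stat[["value"]],
                  if (p_value < 0.01) "< 0.01" else sprintf("= %.2f", p_value))
print(report)
```

Accessing the elements of fstatistic by name rather than by position makes it harder to mix up the two degrees of freedom.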

Mathematical typesetting

  • Strings are parsed using \(\LaTeX\) and typeset accordingly when used between '$' symbols for inline mode.
  • For example $\beta$ renders \(\beta\).
  • Subscripts: $\beta_0$ is \(\beta_0\) and using '{}' for more than one symbol as in $\beta_{01}$ which is \(\beta_{01}\)
  • How would you write \(\beta_{0_1}\)?
  • Superscripts: '^' as in $\sigma^2$ which is \(\sigma^2\).
  • How would you write \(\sigma^{2^2}\)?
  • How about \(\sigma_e^2\)?

Some arithmetic operations and fractions

  • $x + y$, $x - y$
  • Multiplication: use either $\cdot$ or $\times$ to get \(\cdot\) or \(\times\), respectively, as in \(3 \cdot 2\)
  • Division: $/$ or $\div$ to get \(/\) or \(\div\), respectively, or $\frac{1}{2}$ for \(\frac{1}{2}\)
  • $\pm$ renders to \(\pm\)

For other formats …

install.packages("rmdformats")

or

install.packages("remotes")
remotes::install_github("juba/rmdformats")
  • Create from template:
    • File > New File > R Markdown (e.g. readthedown or robobook for documents)
    • rmdshower::shower_presentation and ioslides_presentation for slides
  • Rmarkdown example
  • Online sharing on RPubs (“Publish”)

APA7 manuscripts using papaja

Task (homework)

  • You will use RMarkdown for your assignment (get started now).
  • Create a list of continuous response variables that can be found in Psychology.
  • Find a suitable data set (should have a mix of continuous and categorical variables).
  • Use the RMarkdown we started and replace data with your own, calculate descriptives, and a visualisation.

Recommended reading

References

Blomkvist, Andreas W., Fredrik Eika, Martin T. Rahbek, Karin D. Eikhof, Mette D. Hansen, Malene Søndergaard, Jesper Ryg, Stig Andersen, and Martin G. Jørgensen. 2017. “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age.” PLoS One 12 (12): e0189598.

Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2021. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing, 1–26.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Xie, Yihui. 2017. Dynamic Documents with R and Knitr. Chapman and Hall/CRC.