Aims for today

  • Sorting out your RStudio working environment.
  • Getting you ready to write reproducible reports in RMarkdown.

How can you trust published results?

  • Results are communicated through reports such as peer-reviewed scientific articles.
  • Reported results should be reproducible!
  • Data and code should be made available to others.
  • It is difficult to relate code to figures, tables, and numbers, especially where numbers were copy-pasted from software.
  • Manual processing of data and manual generation of reports are also error-prone.

Why do we need reproducible analyses?

  • Open Science Collaboration led by Brian Nosek (Open Science Collaboration 2015).
  • Aimed to replicate important findings in psychological research.
  • Main finding: 36% were replicated; only 23% for Social Psychology.
  • Reasons: small samples, pressure to publish sensational results.
  • Make Psychology Science Again: pre-registration, replications, large samples (power analysis), abandoning significance testing, transparency.

A truly reproducible report

  • Data, code, and report need to be one unit.
  • Based on the concept of literate programming (Knuth 1984): text and code are linked in one single file to generate a manual or a computer program.
  • This principle can be used to create reproducible reports, sometimes known as dynamic documents (Xie 2017).
  • RMarkdown is the best way of doing this using R.

Why RMarkdown?

  • Documentation of all analysis steps
  • Quickly updating and reproducing analysis
  • No copy-pasting between programmes
  • Producing texts, websites, supplementary materials, APA-style manuscripts, slides
  • Easy cross-referencing
  • Automatic reference lists
  • Easy type-setting of equations

Key concepts and tools

What is RMarkdown?

  • Text file format, a script like an R script.
  • Open, edit, run in RStudio like scripts.
  • RMarkdown documents are a mixture of markdown code and normal R code (and your data).
  • R code in RMarkdown documents occurs in R chunks, i.e. blocks of code, or as inline R code inside the markdown text.
  • Markdown: minimal syntax that instructs how text should be formatted.
  • Rendering into .pdf, .html, .docx, .doc
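
To make the mixture of Markdown, R chunks, and inline R code concrete, here is a minimal sketch in R that writes a small .Rmd file to a temporary folder. The file name, title, and chunk label (mychunk) are made up for illustration. Rendering needs the rmarkdown package, so that line is left commented out.

```r
# Three backticks open and close an R chunk inside an .Rmd file.
fence <- strrep("`", 3)

# Lines of a minimal R Markdown document: YAML header, Markdown text,
# one R chunk, and one inline R expression.
rmd_lines <- c(
  "---",
  "title: \"A minimal example\"",
  "output: html_document",
  "---",
  "",
  "# Introduction",
  "",
  "Some *Markdown* text.",
  "",
  paste0(fence, "{r mychunk}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence,
  "",
  "The mean of x is `r mean(x)`."
)

# Write the document to a temporary folder (hypothetical file name).
rmd_file <- file.path(tempdir(), "minimal-example.Rmd")
writeLines(rmd_lines, rmd_file)

# rmarkdown::render(rmd_file)  # uncomment to render (needs rmarkdown)
```

Opening the written file in RStudio shows the three ingredients side by side: the YAML header between the two `---` lines, the chunk between the fences, and the inline expression in the last sentence.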

Set up an R project

  • What it is: A special folder for your work in R.
  • What it does: Keeps your scripts, data, and results together.
  • Why it helps:
    • Sets the folder as your working directory automatically.
    • No setwd() headaches.
    • Easy to share or move your whole project.
  • Everything for one analysis stays neatly in one place.
  • Create an R project for psyc30815-stats-3
  • Create sub-directories for “data”, “exercises”, “slides”
  • Download the blomkvist.csv data set from the NOW Learning Room
  • Add the data file to the folder “data”
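
The sub-directories can also be created from the R console instead of the file browser. The following is a sketch that assumes the project is open in RStudio, so that the working directory is the project root. The helper name create_project_dirs is made up for illustration.

```r
# Create the sub-directories used in this module inside a project folder.
# `root` is the project folder; "." is the current working directory,
# which is the project root when the .Rproj file is open in RStudio.
create_project_dirs <- function(root = ".",
                                dirs = c("data", "exercises", "slides")) {
  paths <- file.path(root, dirs)
  for (path in paths) {
    # showWarnings = FALSE: stay silent if the folder already exists
    dir.create(path, showWarnings = FALSE, recursive = TRUE)
  }
  invisible(paths)
}

create_project_dirs()

# TRUE once the downloaded data file has been moved into "data"
file.exists(file.path("data", "blomkvist.csv"))
```

Because all paths are relative to the project root, the same code works unchanged after the project folder is moved or shared.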

Installation

  • RStudio comes with the necessary R package rmarkdown.
  • To create PDF outputs, we require the package tinytex.
# Is tidyverse installed? If not run
install.packages('tidyverse')
install.packages('tinytex')
tinytex::install_tinytex()
# then restart RStudio
# Then, try
tinytex:::is_tinytex() # should return TRUE
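
To answer the "is it installed?" question without reinstalling everything, one option is to check first and install only what is missing. This is a sketch: the helper name find_missing_packages is made up, and the package list is simply the set of packages mentioned in these slides.

```r
# Return the names of packages that are not yet installed.
find_missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used across these slides
needed <- c("tidyverse", "tinytex", "bookdown", "kableExtra", "psyntur")
missing_pkgs <- find_missing_packages(needed)
missing_pkgs

# install.packages(missing_pkgs)  # uncomment to install whatever is missing
```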

From the RStudio menu …

  • Click on File > New File > R Markdown

  • Document (PDF, WORD, HTML)

  • output: pdf_document: use pdfs for writing manuscripts

  • Name your RMarkdown: fa-linear-regression.Rmd

  • Save in same directory as .Rproj file.

  • You’ll see chunks of R code and inline R code.

  • Remainder is plain Markdown.

  • Knit > Knit to PDF to compile .Rmd to .pdf. Wow!!!

  • Notice how the code was translated in the PDF.

  • Important: R code is executed from top down.

  • Other demos and examples are provided.

Load data

  • Create a new R chunk: try CTRL+ALT+I or CMD+ALT+I (i.e. the letter “i”)
  • Create a chunk called packages and load libraries needed:
library(tidyverse)
  • Create a new chunk called loaddata and load Blomkvist et al. (2017) data:
blomkvist <- read_csv("data/blomkvist.csv") %>% 
  select(id, age, smoker, sex, rt = rt_hand_d) %>% 
  drop_na()

The setup chunk: global options

  • Note ```{r setup, echo=FALSE}
  • setup is a label of this chunk (optional; useful for cross-referencing of figures and tables).
  • Chunk configuration option echo = FALSE: don’t display chunk in output; echo = TRUE: display chunk.
knitr::opts_chunk$set(message = FALSE, # don't return messages
                      warning = FALSE, # don't return warnings
                      comment = NA, # no '##' prefix on printed output
                      echo = TRUE, # display chunk (is default)
                      eval = TRUE, # evaluate chunk (is default)
                      out.width = '45%',  # figure width
                      fig.align='center') # figure alignment

Section headers

# This is a section header

## This is a subsection header

# This is another section header

## This is another subsection header

### This is a subsubsection header

Figures

  • Create a new R chunk with label myscatterplot
  • Set echo = FALSE because we only need the figure.
  • Add a figure caption fig.cap = "A scatterplot." in the chunk configurations.
library(psyntur)
scatterplot(x = age, y = rt, data = blomkvist)
  • Or using ggplot2
ggplot(blomkvist, aes(x = age, y = rt)) +
    geom_point() +
    theme_classic()

Figures

  • Change the default size to out.width = '50%'.
  • Cross-reference figure using \ref{fig:myscatterplot} in the text.
"A scatterplot of age and reaction time can be found in Figure \ref{fig:myscatterplot}."

Cross-referencing figures requires the use of pdf_document2 as output format. In the YAML preamble, replace output with:

output: 
  bookdown::pdf_document2:
      toc: false          # removes the table of contents
      number_sections: false   # removes section numbering

You may need to install bookdown. We are also turning off the table of contents (toc) and the section numbers.

Add header options

header-includes:
- \usepackage{booktabs}
  • We just need booktabs to improve typesetting.

Formatted tables

  • Reporting results in tables formatted to a high standard.
  • Calculate descriptives:
library(psyntur)
(smoker_age <- describe(data = blomkvist, by = smoker, mean = mean(age), sd = sd(age)))
# A tibble: 3 × 3
  smoker  mean    sd
  <chr>  <dbl> <dbl>
1 former  65.2  16.9
2 no      53.0  21.3
3 yes     50.6  17.5
  • This format isn’t good enough for papers.

Formatted APA style tables

library(kableExtra)
smoker_age %>% 
  kable(format = 'latex',
        booktabs = TRUE,
        digits = 2,
        align = 'c', # centre value in each column
        caption = 'Descriptives of age by smoker.') %>% 
  kable_styling(position = 'center') # centre position of table
  • Also, label your chunk “smoker” and cross-reference the table in the text using Table \ref{tab:smoker}.

Bibliography and citations

  • Add to the YAML preamble:
bibliography: refs.bib
biblio-style: apalike
  • Create a file (in RStudio) called refs.bib (save in same working directory as your .Rmd file)

Bibliography and citations

  • Get the .bib entry for Blomkvist et al. (2017) from Google Scholar and paste it into refs.bib:
    • Copy the title “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board” into Google Scholar
    • Click cite and BibTeX
    • Copy the .bib entry into refs.bib
  • Note the citation key blomkvist2017reference
  • Cite Blomkvist et al. (2017) using @blomkvist2017reference or [@blomkvist2017reference].
  • At the end of your document create a section “# References”.

Formatted inline R output

# Fit the model and get the summary
model <- lm(rt ~ sex, data = blomkvist)
model_summary <- summary(model)
# Extract R^2
r2 <- model_summary$r.squared
The $R^2$ for this model is `r round(r2, 2)`.

Renders “The \(R^2\) for this model is 0.03.”

Formatted inline R output

# Extract F statistic
f_stat <- model_summary$fstatistic
p_value <- pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)
The model summary can be summarised like so: $F(`r round(f_stat[2])`, `r round(f_stat[3])`) =
`r round(f_stat[1],2)`$, $p `r format.pval(p_value, eps = 0.01)`$.

Renders “The model summary can be summarised like so: \(F(1, 263) = 7.48\), \(p <0.01\).”

p <- c(0.05, 0.02, 0.011, 0.005, 0.001)
format.pval(p, eps = 0.01)
[1] "0.05"  "0.02"  "0.01"  "<0.01" "<0.01"
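
When the same statistics are reported several times, the extraction steps can be wrapped in a small helper and called inline. The sketch below is one way to do that. The name format_f_test is made up, the built-in mtcars data stands in for the blomkvist data, and the p-value is formatted by hand rather than with format.pval().

```r
# Turn a fitted linear model into an APA-like F-test string.
# format_f_test is a made-up helper name, not part of any package.
format_f_test <- function(model, eps = 0.01) {
  f_stat <- summary(model)$fstatistic  # named vector: value, numdf, dendf
  p_value <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
                lower.tail = FALSE)
  # Report p-values below eps as "< eps", otherwise round to 2 decimals
  p_text <- if (p_value < eps) paste0("< ", eps) else sprintf("= %.2f", p_value)
  sprintf("F(%d, %d) = %.2f, p %s",
          as.integer(f_stat[["numdf"]]), as.integer(f_stat[["dendf"]]),
          f_stat[["value"]], p_text)
}

# Example with a built-in data set (assumption: stands in for blomkvist)
example_model <- lm(mpg ~ am, data = mtcars)
f_report <- format_f_test(example_model)
f_report
```

In the text of the report, the helper could then be used as `r format_f_test(model)`. The result is plain text rather than math mode, so the surrounding dollar signs are dropped.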

Mathematical typesetting

  • Strings are parsed using \(\LaTeX\) and typeset accordingly when used between '$' symbols for inline mode.
  • For example $\beta$ renders \(\beta\).
  • Subscripts: $\beta_0$ is \(\beta_0\) and using '{}' for more than one symbol as in $\beta_{01}$ which is \(\beta_{01}\)
  • How would you write \(\beta_{0_1}\)?
  • Superscripts: '^' as in $\sigma^2$ which is \(\sigma^2\).
  • How would you write \(\sigma^{2^2}\)?
  • How about \(\sigma_e^2\)?

Some arithmetic operations and fractions

  • $x + y$, $x - y$
  • Multiplication: use either $\cdot$ or $\times$ to get \(\cdot\) or \(\times\), respectively, as in \(3 \cdot 2\)
  • Division: $/$ or $\div$ to get \(/\) or \(\div\), respectively, or $\frac{1}{2}$ for \(\frac{1}{2}\)
  • $\pm$ renders to \(\pm\)
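
Subscripts, superscripts, and operators can be combined into a display equation by placing the expression between double dollar signs on its own lines. The simple regression equation below is an illustrative sketch; the symbols \(y_i\), \(x_i\), and \(\epsilon_i\) are not defined elsewhere in these slides.

```latex
$$
y_i = \beta_0 + \beta_1 \cdot x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2)
$$
```

Display equations are centred on their own line, whereas the single-dollar inline mode keeps the expression within the running text.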

For other formats …

install.packages("rmdformats")

or

install.packages("remotes")
remotes::install_github("juba/rmdformats")
  • Create from template:
    • File > New File > R Markdown (e.g. readthedown or robobook for documents)
    • rmdshower::shower_presentation and ioslides_presentation for slides
  • Rmarkdown example
  • Online sharing on RPubs (“Publish”)

APA7 manuscripts using papaja

Task (homework)

  • You will use RMarkdown for your assignment (get started now).
  • Create a list of continuous response variables that can be found in Psychology.
  • Find a suitable data set (should have a mix of continuous and categorical variables).
  • Use the RMarkdown we started and replace data with your own, calculate descriptives, and a visualisation.

Recommended reading

References

Blomkvist, Andreas W., Fredrik Eika, Martin T. Rahbek, Karin D. Eikhof, Mette D. Hansen, Malene Søndergaard, Jesper Ryg, Stig Andersen, and Martin G. Jørgensen. 2017. “Reference Data on Reaction Time and Aging Using the Nintendo Wii Balance Board: A Cross-Sectional Study of 354 Subjects from 20 to 99 Years of Age.” PLoS One 12 (12): e0189598.

Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2024. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing 37 (2): 359–84. https://doi.org/10.1007/s11145-023-10489-4.

Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.

Xie, Yihui. 2017. Dynamic Documents with R and Knitr. Chapman and Hall/CRC.