Previously

A brief reminder.

The universe

  • R, RStudio, and GitHub are the basic tools for an integrated workflow.

  • Preferably, we work with R objects.

  • Preferably, we save code in R script files.

  • clone, push, pull, and commit are the basic git verbs.

  • Functions are called with the structure {function name}(). Writing custom functions, however, requires the structure function(){}.

  • Packages simplify the workflow but may not always be maintained.

Work routine template

  1. Back-and-forth code writing in the console

  2. Save working code in an R script file

  3. Create an Rproject to handle path dependencies

  4. Create and/or clone an online repository on GitHub

  5. git commit and git push the R script file to the online GitHub repository

  6. git pull committed updates from the online repository

Good for

Repetitive tasks can be reduced through the use of code and a designated work environment.

Among the domains where this is an asset are:

  1. Research: Data analysis and results interpretation, writing manuscripts, and adhering to open science.
  2. Applied sector: Writing of repetitive reports.
  3. Education: Transparent homework.

Elements and structure

Rmarkdown is an enhanced document type that combines plain text, code, and formatting, and from which various output file formats can be rendered; PDF, DOCX, and HTML are the most common.

  • .Rmd: the editable Rmarkdown document in R

  • knitr: the engine that runs the code (the Jupyter engine plays this role for Python)

  • .md: the simplified markdown document produced by knitr

  • pandoc: the document converter that turns the .md into the final output format
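The chain .Rmd → knitr → .md → pandoc can be traced by hand. A minimal sketch, assuming a throwaway file named pipeline-demo.Rmd in a temporary folder: only the knitr step runs here, so pandoc is not needed; rmarkdown::render() is the function that would run both steps at once.

```r
library(knitr)

# Write a tiny .Rmd file to a temporary folder (name and content are illustrative)
rmd_file <- file.path(tempdir(), "pipeline-demo.Rmd")
writeLines(c(
  "# Demo",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd_file)

# Step 1: knitr runs the code chunk and writes a plain markdown (.md) file
md_file <- knit(rmd_file, output = file.path(tempdir(), "pipeline-demo.md"), quiet = TRUE)
md <- readLines(md_file)
cat(md, sep = "\n")

# Step 2: pandoc converts the .md into HTML, PDF, or DOCX.
# rmarkdown::render() runs both steps in one call (requires pandoc):
# rmarkdown::render(rmd_file, output_format = "html_document")
```

Opening the .md shows the code and its result (## [1] 2) as ordinary markdown, which is exactly what pandoc receives.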

Elements and structure

What we need to focus on is the .Rmd file (or its successor, the .qmd, addressed in the next sessions).

flowchart LR
  A(Plain text) --> Z{.Rmd}
  B(Code) --> Z{.Rmd}
  C(Images) --> Z{.Rmd}
  D(External source file) --> Z{.Rmd}
  E(Formatting source file) --> Z{.Rmd}
  F(...) --> Z{.Rmd}

Elements that an .Rmd file can integrate

Elements and structure

When knitting to HTML, nothing else needs to be installed.

When knitting to PDF, however, we need a LaTeX distribution!

LaTeX distribution needed

Install the tinytex package and the TinyTeX distribution now.

# to install tinytex distribution 
install.packages('tinytex')
tinytex::install_tinytex()
# to uninstall TinyTeX, run tinytex::uninstall_tinytex() 
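To check whether a LaTeX engine is already available before knitting to PDF, a quick sketch (the object name has_latex is arbitrary; looking for pdflatex or xelatex on the PATH is an assumption about which engine you use):

```r
# TRUE if pdflatex or xelatex is found on the system PATH
has_latex <- nzchar(Sys.which("pdflatex")) || nzchar(Sys.which("xelatex"))
print(has_latex)

# If the tinytex package is installed, also report whether that LaTeX is TinyTeX
if (requireNamespace("tinytex", quietly = TRUE)) {
  print(tinytex::is_tinytex())
}
```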

Live/enhanced documents

Documents that are coded to retrieve data and/or information from external source material (e.g., datasets, or metadata such as Excel sheets).

This is the building block for creating all sorts of automated reports.

Path dependencies

Working with an .Rproj takes care of that.

Otherwise, paths to external source files must be specified explicitly and correctly.

Path dependencies inside projects

To benefit from established path dependencies, work with subfolders inside your project repository.

When calling a file stored in a subfolder, write the subfolder name first, followed by the file name:

SUBFOLDER/FILENAME.EXTENSION

When calling a file stored in the root folder of the repository (where the .Rproj file is stored), simply call the file itself.
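Both cases can be simulated in a temporary folder. In this sketch, the folder myproject, the two .csv files, and the use of setwd() as a stand-in for opening the .Rproj are all assumptions for illustration:

```r
# Build a hypothetical project: a root folder with a data/ subfolder
project <- file.path(tempdir(), "myproject")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(project, "data", "example.csv"), row.names = FALSE)
write.csv(data.frame(y = 4:6), file.path(project, "notes.csv"), row.names = FALSE)

# Opening an .Rproj sets the working directory to the project root;
# setwd() stands in for that step here
old_wd <- setwd(project)

in_subfolder <- read.csv("data/example.csv")  # SUBFOLDER/FILE
in_root <- read.csv("notes.csv")              # file in the root folder

# Restore the previous working directory
setwd(old_wd)
```

Because both calls are relative to the project root, the same code keeps working when the whole project folder is moved or cloned to another computer.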

The setup

Let us set it all up.

.Rmd inside an .Rproj

Create project

Create an empty .Rmd
Figure 2: Create an empty RMarkdown file inside an Rproject

.Rmd elements

The YAML header defines the parameters of the entire Rmarkdown document.

The pandoc document converter uses these parameters.

For example, the output file format, the title, the author, or the date of the document version.

See lines 1-5.

Plain text is simple text: no formatting, no hyperlinks, no enhanced fields.

Special formatting is possible; see the next slide for a brief guide.

See lines 13-15.

Code chunks are enhanced fields where code is integrated. Various programming languages can be used: R, Python, SQL, Julia, and so on.

Useful when the goal is, for example, to integrate output figures and tables.

See lines 17-19.

Inline code is an enhanced field where code is integrated seamlessly with plain text.

Keep inline code simple. Use code chunks to prepare the output before integrating it with plain text.

`r code here`

See the next slides.
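A minimal .Rmd combining the four elements (YAML header, plain text, a code chunk, and inline code) can be written and knitted from R. The file name, the chunk label cars-chunk, and the use of R's built-in cars dataset are illustrative choices; only the knitr step runs, so pandoc is not needed:

```r
library(knitr)

rmd_lines <- c(
  "---",                                     # YAML header
  "title: \"Elements demo\"",
  "output: html_document",
  "---",
  "",
  "Plain text with **bold** formatting.",    # plain text
  "",
  "```{r cars-chunk}",                       # code chunk
  "n_cars <- nrow(cars)",
  "```",
  "",
  "The cars dataset has `r n_cars` rows."    # inline code
)
rmd_file <- file.path(tempdir(), "elements-demo.Rmd")
writeLines(rmd_lines, rmd_file)

md_file <- knit(rmd_file, output = file.path(tempdir(), "elements-demo.md"), quiet = TRUE)
md <- readLines(md_file)

# The inline code has been replaced by its value
md[grepl("rows", md)]
```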

Rmarkdown basics

*abc* renders as italics: abc

**abc** renders as bold: abc

`backtick` renders as code: backtick

H~2~O renders as subscript: H2O

R^2^ renders as superscript: R2

[https://www.r-project.org/](https://www.r-project.org/) renders as a hyperlink: https://www.r-project.org/

![Image from local repository](img/logo.jpg) renders as an image from the local repository, captioned "Image from local repository"

![Image from the Internet](https://www.r-project.org/Rlogo.png) renders as an image from the Internet, captioned "Image from the Internet"

# First level.

## Second level.

### Third level.

Watch the indentation!

Unordered

* Item 1
* Item 2
    + Item 2a
    + Item 2b

renders as

  • Item 1
  • Item 2
    • Item 2a
    • Item 2b

Ordered

1. Item 1
2. Item 2
3. Item 3
    + Item 3a
    + Item 3b

renders as

  1. Item 1
  2. Item 2
  3. Item 3
    • Item 3a
    • Item 3b

Mathematics

  • Surround by $ for inline mathematics and by $$ for displayed equations, with no empty space between the dollar signs and the expression!

$x = y$ becomes \(x = y\)

$\left(\int_{a}^{b} f(x) \; dx\right)$ becomes \(\left(\int_{a}^{b} f(x) \; dx\right)\)

  • Include Greek letters too.

$\alpha A$ becomes \(\alpha A\)

  • Equations

$$\sum_{n=1}^{10} n^2$$ becomes \[\sum_{n=1}^{10} n^2\]

Some YAML parameters

Copied from the YAML header of Stanciu et al. (2024)

title: "Can human values explain one’s interest in cryptocurrencies? An explorative study in Germany"
author: 
  - name: "Adrian Stanciu"
    affiliation_number: 1
  - name: "Melanie Partsch"
    affiliation_number: 1
  - name: "Clemens Lechner"
    affiliation_number: 1
affiliations:
  - "GESIS-Leibniz Institute for the Social Sciences, Mannheim"
  - "University of Bremen, Bremen"
shorttitle: "Values and cryptocurrencies"
authors_note: "For correspondence contact Dr. Adrian Stanciu, Data and Research on Society, GESIS-Leibniz Institute for the Social Sciences, PO Box 12215, 68072 Mannheim, Germany. Email: adrian.stanciu[at]gesis.org"
abstract: "Write abstract here"
keywords: "Values, Cryptocurrencies, Germany"
date: "`r format(Sys.time(), '%d. %B, %Y')`"
doctype: doc
header-includes:
  - \usepackage{subfig}
output: 
  bookdown::pdf_document2:
    toc: False
    number_sections: False
    template: "style/template.tex"
csl: style/apa.csl
bibliography: reference.bib
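These header fields are plain YAML, which is parsed into a nested R list before rendering. A sketch using rmarkdown::yaml_front_matter() on a shortened, hypothetical copy of the header above (the file name is illustrative):

```r
rmd_file <- file.path(tempdir(), "yaml-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Values and cryptocurrencies\"",
  "author:",
  "  - name: \"Adrian Stanciu\"",
  "    affiliation_number: 1",
  "output:",
  "  bookdown::pdf_document2:",
  "    toc: false",
  "---",
  "",
  "Body text."
), rmd_file)

# Read only the YAML header into a list
meta <- rmarkdown::yaml_front_matter(rmd_file)

meta$title               # a single string
meta$author[[1]]$name    # authors become a list of lists
names(meta$output)       # the output format is itself a named list
```

The nesting (indentation) in the header is what determines the list structure, which is why a misplaced space in the YAML header often breaks knitting.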

Some R code chunk attributes

# attributes are written in the chunk header, e.g. {r, echo=FALSE}
echo=TRUE # whether the code is displayed in the output file
eval=TRUE # whether the code is run and its outcome generated
include=TRUE # whether the code and its outcome are included in the output document
# sets the attributes for the entire document
knitr::opts_chunk$set(echo = TRUE, eval = FALSE, warning = FALSE, message = FALSE)

Code chunk attributes

It may be simpler to set code chunk attributes for the entire document at the beginning of the document.

In this first code chunk, also install and load all the relevant packages.
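A minimal sketch of this mechanism, with illustrative chunk labels: the first chunk sets a document-wide default with knitr::opts_chunk$set(), and a later chunk overrides that default in its own header. Only the knitr step runs, so pandoc is not needed:

```r
library(knitr)

rmd_file <- file.path(tempdir(), "options-demo.Rmd")
writeLines(c(
  "```{r setup, include=FALSE}",
  "# document-wide default: hide code, show results",
  "knitr::opts_chunk$set(echo = FALSE)",
  "```",
  "",
  "```{r shown-code, echo=TRUE}",   # the chunk header overrides the default
  "x <- 21 * 2",
  "```",
  "",
  "```{r hidden-code}",             # the default applies
  "x",
  "```"
), rmd_file)

md <- readLines(knit(rmd_file, output = file.path(tempdir(), "options-demo.md"), quiet = TRUE))
cat(md, sep = "\n")
```

In the .md, the setup chunk leaves no trace (include=FALSE), the second chunk shows its code, and the third shows only its result.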

Familiarize yourself

Inspect the newly created .Rmd document and identify the discussed elements.

Play with the attributes and/or use online search engines to identify new attributes of code chunks and YAML parameters.

Packages

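Using pacman, we install and load the packages tidyverse, readxl (for reading Excel sheets), haven (for reading SPSS files), sjlabelled (for dealing with labelled dataframes), knitr (whose kable() function creates tables), and kableExtra (for styling tables).

# install pacman only if it is not installed yet
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
# kable() is a function of the knitr package, not a package itself
pacman::p_load(tidyverse, readxl, haven, sjlabelled, knitr, kableExtra)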

Illustrative example

From here on, we build automated reports, websites and books, and Shiny apps using real data.

Data

We use the subsample data from Stanciu et al. (2017) and the movies.xlsx metadata.

Download both files from the R beyond data analysis book.

# create a dataframe object `dfex` and assign to it the .sav file `sample.sav` introduced previously
dfex <- haven::read_sav("data/sample.sav")

# create a movies metadata object `dfmv` and assign to it the .xlsx file `movies.xlsx`
# note the different paths to these two files
# note that we also specify which sheet to read; here only sheet 1 is imported
dfmv <- readxl::read_excel("mat/movies.xlsx", sheet = 1)
# check if the source material was imported successfully 
# by observing the first lines in the tables
head(dfex)
# A tibble: 6 × 9
    ppn gen          age res       res_other men_warm men_comp wom_warm wom_comp
  <dbl> <dbl+lbl>  <dbl> <dbl+lbl> <chr>     <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb>
1   459 1 [Female]    24  5 [Iasi] -99        3 [Und…  4 [Agr… 3 [Unde… 4 [Agre…
2   592 2 [Male]      21  5 [Iasi] -99        3 [Und…  4 [Agr… 3 [Unde… 3 [Unde…
3   634 2 [Male]      21 NA        petrosani  4 [Agr…  5 [Str… 4 [Agre… 4 [Agre…
4   369 1 [Female]    30  8 [Gala… -99       NA       NA       4 [Agre… 4 [Agre…
5   121 1 [Female]    21  4 [Timi… -99        4 [Agr…  3 [Und… 3 [Unde… 4 [Agre…
6   127 1 [Female]    20  4 [Timi… -99        4 [Agr…  4 [Agr… 4 [Agre… 2 [Disa…
head(dfmv)
# A tibble: 4 × 6
  Movie                       Actor                 Like  Why     Grade Wikilink
  <chr>                       <chr>                 <chr> <chr>   <dbl> <chr>   
1 John Wick                   Keanu Reeves          Yes   Fight …    10 https:/…
2 Call me by your name        Timothee Chalamet     Yes   Beauti…    10 https:/…
3 Terminator                  Arnold Schwarzenegger Yes   Arnold      9 https:/…
4 4 months 3 weeks and 2 days <NA>                  Yes   Portra…     8 https:/…
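The <dbl+lbl> columns in dfex are labelled vectors: SPSS numeric codes with value labels attached. A minimal sketch with a hypothetical gender vector contrasts two ways of dropping the labels: haven::as_factor() keeps the label text as factor levels, whereas removing the labels first and then calling factor() keeps only the numeric codes.

```r
library(haven)

# Hypothetical gender codes with SPSS-style value labels
gen <- labelled(c(1, 2, 2, 1), labels = c(Female = 1, Male = 2))

# Keep the label text as factor levels
as_factor(gen)

# Drop the labels first: the factor levels become the numeric codes
factor(zap_labels(gen))
```

Which one to use depends on whether the output (e.g., a graph axis) should show "Female/Male" or "1/2".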

Plain vs. enhanced text

The .Rmd is already a step toward automation in that it retrieves external source material.

On its own, however, this is not too helpful, because those contents are displayed statically, as plain information.

Inline coding can integrate enhanced text with plain text through the knitr engine.

This is powerful for repetitive reports or for quickly inspecting data collection progress.

Plain vs. enhanced text

What to look for:

  • what is repetitive

  • what can be integrated from external source material

  • what vector contains the desired information (character strings and numeric vectors behave differently)
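Inline code works best when it returns a single, readable value. A sketch with hypothetical vectors copied from movies.xlsx: a character vector can be joined into one phrase with knitr::combine_words(), and a numeric vector can be reduced to one number before it goes into the text.

```r
library(knitr)

titles <- c("John Wick", "Call me by your name", "Terminator")
grades <- c(10, 10, 9)

# Character vector: join the strings into one phrase for the text
combine_words(titles)

# Numeric vector: reduce it to a single rounded number for the text
round(mean(grades), 1)
```

In the .Rmd itself, the same calls would sit in inline code, e.g. `r combine_words(dfmv$Movie)`.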

Plain vs. enhanced text

This is an example of how automation can be implemented in the workflow. 
My list of movies includes `r nrow(dfmv)` entries. 
The titles of those movies are `r dfmv$Movie`. 
Is there a movie on that list that I actually don't like? Well, I dislike exactly 
`r dfmv %>% filter(Like %in% c("No","no","NO")) %>% nrow()`
movies on that list.
nrow(dfmv) # My list of movies includes...entries
[1] 4
dfmv$Movie # The titles of those movies are...
[1] "John Wick"                   "Call me by your name"       
[3] "Terminator"                  "4 months 3 weeks and 2 days"
dfmv %>% filter(Like %in% c("No","no","NO")) %>% nrow() # ...I dislike exactly...
[1] 0

This is an example of how automation can be implemented in the workflow. My list of movies includes 4 entries. The titles of those movies are John Wick, Call me by your name, Terminator, 4 months 3 weeks and 2 days. Is there a movie on that list that I actually don't like? Well, I dislike exactly 0 movies on that list.

DIY – Enhanced text

Modify the movies.xlsx or create your own metadata (.xlsx sheet) and then write an enhanced text in Rmarkdown.

Import .xlsx

Remember to import the .xlsx file using readxl::read_excel().

Make sure the path to the file is correct.

Automated graphs and tables

Tables and graphs can be automatically updated with new data.

Graphs

dfex_n15<-haven::read_sav("data/tmpdf1.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  mutate(gen=factor(gen),
         res=factor(res))

ggplot(dfex_n15, aes(x=gen, y=wom_warm)) + 
  labs(x="Gender",
       y="Stereotype of warmth") +
  geom_boxplot() + 
  theme_light()

dfex_n60<-haven::read_sav("data/tmpdf2.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  mutate(gen=factor(gen),
         res=factor(res))

ggplot(dfex_n60, aes(x=gen, y=wom_warm)) + 
  labs(x="Gender",
       y="Stereotype of warmth") +
  geom_boxplot() + 
  theme_light()

dfex<-haven::read_sav("data/sample.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  mutate(gen=factor(gen),
         res=factor(res))

ggplot(dfex, aes(x=gen, y=wom_warm)) + 
  labs(x="Gender",
       y="Stereotype of warmth") +
  geom_boxplot() + 
  theme_light()
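The three graph chunks differ only in the data they use. A minimal sketch of wrapping the shared steps in a function; plot_warmth() is a hypothetical helper name, and the toy data frame (made-up values) stands in for the .sav files:

```r
library(ggplot2)

# Hypothetical helper: boxplot of one stereotype variable by gender
plot_warmth <- function(df, yvar = "wom_warm") {
  df$gen <- factor(df$gen)
  ggplot(df, aes(x = gen, y = .data[[yvar]])) +
    labs(x = "Gender", y = "Stereotype of warmth") +
    geom_boxplot() +
    theme_light()
}

# Toy data standing in for data/tmpdf1.sav, data/tmpdf2.sav, or data/sample.sav
toy <- data.frame(gen = c(1, 1, 2, 2, 2), wom_warm = c(3, 4, 4, 5, 3))
p <- plot_warmth(toy)
p
```

With the real files, a call could look like plot_warmth(sjlabelled::remove_all_labels(haven::read_sav("data/tmpdf2.sav"))), so that updating the data only changes one argument.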

Tables

dfmv %>% knitr::kable(caption="Simple table using knitr::kable()",format = "pipe")
Simple table using knitr::kable()

| Movie | Actor | Like | Why | Grade | Wikilink |
|:---|:---|:---|:---|---:|:---|
| John Wick | Keanu Reeves | Yes | Fight scenes | 10 | https://en.wikipedia.org/wiki/John_Wick_(film) |
| Call me by your name | Timothee Chalamet | Yes | Beautiful love story | 10 | https://en.wikipedia.org/wiki/Call_Me_by_Your_Name_(film) |
| Terminator | Arnold Schwarzenegger | Yes | Arnold | 9 | https://en.wikipedia.org/wiki/The_Terminator |
| 4 months 3 weeks and 2 days | NA | Yes | Portrayal of life in communist Romania | 8 | https://en.wikipedia.org/wiki/4_Months%2C_3_Weeks_and_2_Days |
# does some data manipulation to retrieve the required information
tmptbl<-dfmv %>% 
  filter(Actor %in% c("Keanu Reeves", "Alec Baldwin"))

# builds the summary table that we want to include in the final output document:
# movies graded 8 or more, with a Yes/No answer in the Like column
extbl <- tmptbl %>% 
  filter(Grade >= 8, Like %in% c("Yes", "No")) %>% 
  select(like = Like, name = Actor, movie = Movie, wiki = Wikilink)
  
extbl %>% knitr::kable(caption="Movies graded 8 or more from liked and least liked actors", format="pipe")
Movies graded 8 or more from liked and least liked actors

| like | name | movie | wiki |
|:---|:---|:---|:---|
| Yes | Keanu Reeves | John Wick | https://en.wikipedia.org/wiki/John_Wick_(film) |

DIY – Edit the .xlsx sheet

Open movies.xlsx in Microsoft Excel and add one or more movies featuring Alec Baldwin, pretending that you dislike the actor.

Alternatively, modify the table code by replacing the two actors with actors you like and dislike, and update the Excel sheet accordingly, making sure you keep the sheet structure.

Re-knit the tables.

Knit with parameters

Knitting with parameters simplifies the work routine even further.

It offers a user-friendly interface built with Shiny.

Parameters

Define the parameters in the YAML header.

Parameters

Parameters are characteristics of the document that repeat both within the document and across its successive versions.

Illustrative example, continued

The names of the actors in the Excel sheet movies.xlsx.

Which of the stereotype evaluations from the Stanciu et al. (2017) subsample we want to use to create the graph.

Also, which dataset we use.

Setup – YAML header

title: "example"
output: html_document
date: "2025-03-31"
params:
  actor:
    label: "Actor"
    value: "Keanu Reeves"
    input: select
    choices: ["Keanu Reeves", "Alec Baldwin","Arnold Schwarzenegger"]
    multiple: yes
  stereotype:
    label: "Stereotype evaluation"
    value: wom_warm
    input: select
    choices: [wom_warm,wom_comp,men_warm,men_comp]
    multiple: no
  sampledf:
    label: "Dataset version"
    value: sample.sav
    input: select
    choices: [sample.sav,tmpdf1.sav,tmpdf2.sav]
    multiple: no
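The params section can be read back into R to check what the document will receive by default. A sketch using knitr::knit_params() on a shortened copy of the header above; the object names defs and param_values are illustrative, and building the named list by hand only approximates the params object created at knit time:

```r
library(knitr)

# A shortened copy of the YAML header above
header <- c(
  "---",
  "title: \"example\"",
  "params:",
  "  actor:",
  "    label: \"Actor\"",
  "    value: \"Keanu Reeves\"",
  "    input: select",
  "    choices: [\"Keanu Reeves\", \"Alec Baldwin\", \"Arnold Schwarzenegger\"]",
  "  stereotype:",
  "    label: \"Stereotype evaluation\"",
  "    value: wom_warm",
  "    input: select",
  "    choices: [wom_warm, wom_comp, men_warm, men_comp]",
  "---"
)

# knit_params() returns one list per parameter (name, value, label, input, ...)
defs <- knit_params(header)

# Collect the default values into a named list, similar to `params` at knit time
param_values <- setNames(
  lapply(defs, `[[`, "value"),
  vapply(defs, `[[`, character(1), "name")
)
str(param_values)
```

Note that the names (actor, stereotype) are what the code uses; the labels only appear in the Shiny form.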

Using parameters in code

We can either use parameters directly or assign them to objects.

actor<-params$actor
st<-params$stereotype

Calling parameters

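Remember to always call parameters by the name defined in the YAML header: params${parameter name}, e.g., params$actor (the label, such as "Actor", is only what the interface displays).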
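Parameters can also be set from the console instead of the Shiny form. A hedged sketch: the file param-demo.Rmd, its single actor parameter, and the output file names are assumptions for illustration. rmarkdown::render() accepts a params list that overrides the YAML defaults, and params = "ask" opens a Shiny form similar to RStudio's Knit with Parameters. Rendering needs pandoc, so the loop only runs when rmarkdown::pandoc_available() is TRUE.

```r
library(rmarkdown)

# Write a small parameterized .Rmd (file name and content are illustrative)
rmd_file <- file.path(tempdir(), "param-demo.Rmd")
writeLines(c(
  "---",
  "title: \"Parameter demo\"",
  "output: html_document",
  "params:",
  "  actor: \"Keanu Reeves\"",
  "---",
  "",
  "This report is about `r params$actor`."
), rmd_file)

# Interactive use: opens the Shiny form to pick parameter values
# render(rmd_file, params = "ask")

# Non-interactive use: one report per parameter value
actors <- c("Keanu Reeves", "Alec Baldwin")
outputs <- character(0)
if (pandoc_available()) {
  for (a in actors) {
    out <- render(
      rmd_file,
      params = list(actor = a),          # overrides the YAML default
      output_file = paste0("report-", gsub(" ", "-", a), ".html"),
      output_dir = tempdir(),
      envir = new.env(),                 # fresh environment for each report
      quiet = TRUE
    )
    outputs <- c(outputs, out)
  }
}
outputs
```

The fresh environment avoids clashes with a params object that may already exist in the global environment.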

Using parameters in code

# 1 - imports dataset into object tempdf
tempdf<-haven::read_sav("data/tmpdf1.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  pivot_longer(contains("warm") | contains("comp")) %>% 
  filter(name %in% st)

# 2 - applies the ggplot to the dataset
ggplot(tempdf, aes(x=factor(gen), y=value)) + 
  labs(title=paste0("Evaluation based on ",st), 
       x="Gender",
       y="Stereotype") +
  geom_boxplot() + 
  theme_light()

Using parameters in code

# does some data manipulation to retrieve the required information
tmptbl<-dfmv %>% 
  filter(Actor %in% actor)

# builds the summary table that we want to include in the final output document:
# movies graded 8 or more, with a Yes/No answer in the Like column
extbl <- tmptbl %>% 
  filter(Grade >= 8, Like %in% c("Yes", "No")) %>% 
  select(like = Like, name = Actor, movie = Movie, wiki = Wikilink)

Where is the difference?

Without parameter

tmptbl<-dfmv %>% 
  filter(Actor %in% c("Keanu Reeves", "Alec Baldwin"))

Parameter defined

tmptbl<-dfmv %>% 
  filter(Actor %in% actor)

Knit with parameters

Even more…

Another way to work with parameterized reports is to code the document so that it creates tables (or anything else, for that matter) from a dataset selected through a parameter.

sampledf<-paste0("data/",params$sampledf) # assigns parameter
abc<-haven::read_sav(sampledf) # uses parameter in code
abc %>% 
  sjlabelled::remove_all_labels() %>% 
  pivot_longer(contains("warm") | contains("comp")) %>% 
  group_by(name) %>% 
  summarise(mean=mean(value, na.rm = TRUE), # na.rm = TRUE drops missing values; without it, any NA in a group makes the result NA
            sd=sd(value, na.rm = TRUE),
            min=min(value, na.rm = TRUE),
            max=max(value, na.rm = TRUE))
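To see why na.rm = TRUE matters, the same pipeline can be run on a small hypothetical data frame with one missing value (the column names mirror the stereotype items; the values are made up):

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  men_warm = c(3, 4, NA),
  wom_comp = c(4, 4, 5)
)

res <- toy %>%
  pivot_longer(contains("warm") | contains("comp")) %>%
  group_by(name) %>%
  summarise(mean_keep_na = mean(value),              # NA propagates
            mean = mean(value, na.rm = TRUE),       # NA dropped first
            sd = sd(value, na.rm = TRUE))
res
```

For men_warm, mean_keep_na is NA while mean is 3.5; wom_comp has no missing values, so both columns agree.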

DIY – parameterized reports

Download the Examples .Rmd from the R beyond data analysis book and think of new parameters to add to the document.

https://adrian-stanciu.quarto.pub/r-beyond-data-analysis/

Reference list

Stanciu, A., Cohrs, C. J., Hanke, K., & Gavreliuc, A. (2017). Within-culture variation in the content of stereotypes: Application and development of the stereotype content model in an Eastern European culture. The Journal of Social Psychology, 157(5), 611–628. https://doi.org/10.1080/00224545.2016.1262812
Stanciu, A., Partsch, M. V., & Lechner, C. M. (2024). Basic human values and the adoption of cryptocurrency. Frontiers in Psychology, 15. https://doi.org/10.3389/fpsyg.2024.1395674