Introduction to automatization and online application development using R

Automatization

Adrian Stanciu

Faculty of Humanities, Education and Social Sciences (FHSE), University of Luxembourg

Previously

A brief reminder.

The universe

r, RStudio, and GitHub are the basic tools for an integrated workflow.
Preferably, we work with r objects.
Preferably, we save code in r script files.
clone, push, pull, and commit are the basic git verbs.
Functions are called with the structure {function name}(). Writing a custom function, however, needs the structure function(){}.
Packages simplify the workflow but may not always be maintained.
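The two function structures named above can be sketched as follows. The function name `describe_scores` and the example vector are hypothetical and serve only as an illustration.

```r
# Calling an existing function: {function name}()
mean(c(1, 2, 3))

# Writing a custom function: function(){}
# `describe_scores` is a hypothetical name used for illustration.
describe_scores <- function(x) {
  # returns the mean and standard deviation, ignoring missing values
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# Calling the custom function works like calling any other function
scores <- c(3, 4, NA, 5)
describe_scores(scores)
```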

Work routine template

Back-and-forth code writing in the console
Save working code in an r script file
Create an Rproject to ensure path dependencies
Create and/ or clone an online repository on GitHub
git commit and git push the r script file to the GitHub online repository
git pull committed updates from the online repository

Automatization

Towards an integrated and reproducible workflow.

Good for

Repetitive tasks can be reduced through the use of code and a designated work environment.

Among the domains where this is an asset are:

Research: Data analysis and results interpretation, writing manuscripts, and adhering to open science.
Applied sector: Writing of repetitive reports.
Education: Transparent homework.

Elements and structure

Rmarkdown is an enhanced document type that supports plain text, code and formatting, and from which output file formats can be rendered: PDF, DOCX, HTML are the most common formats.

.Rmd editable markdown document in R
knitr engine (jupyter engine for python)
md simplified markdown document in a markup language
pandoc document converter

Elements and structure

What we need to focus on is the .Rmd (or the new generation .qmd addressed in the next sessions).

flowchart LR
  A(Plain text) --> Z{.Rmd}
  B(Code) --> Z{.Rmd}
  C(Images) --> Z{.Rmd}
  D(External source file) --> Z{.Rmd}
  E(Formatting source file) --> Z{.Rmd}
  F(...) --> Z{.Rmd}

Elements that an .Rmd file can integrate

Elements and structure

When knitting to an HTML output format we don’t need to install anything else.

But when knitting to a PDF output format we need a LaTeX distribution!

latex distribution needed

Install the tinytex package now.

# to install tinytex distribution 
install.packages('tinytex')
tinytex::install_tinytex()
# to uninstall TinyTeX, run tinytex::uninstall_tinytex()

In this seminar we knit to the HTML and PDF output formats.

Live/ enhanced documents

Documents that are coded to retrieve data and/ or information from external source material (e.g., datasets or meta-data such as from Excel sheets).

This is the building block for creating all sorts of automatized reports.

Path dependencies

Working with .Rproj takes care of that.

Otherwise, paths to external source files need to be called adequately.

Path dependencies inside projects

To benefit from established path dependencies, work with subfolders inside your project repository.

When calling from inside a subfolder make sure to call the subfolder first followed by the file itself.

SUBFOLDER/FILE.FORMAT TYPE

When calling a file from the root repository folder (where the .Rproj extension is stored) simply call the file itself.
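A minimal sketch of the two cases, assuming the project layout used in this session (a `data` subfolder inside the project). The file name `example.Rmd` is hypothetical.

```r
# Paths are written relative to the project root (the folder with the .Rproj file).
path_in_subfolder <- file.path("data", "sample.sav")  # SUBFOLDER/FILE
path_in_root <- "example.Rmd"                         # hypothetical file stored in the root folder

path_in_subfolder

# Checking that a file exists before importing it makes path problems easier to spot
if (!file.exists(path_in_subfolder)) {
  message("File not found: ", path_in_subfolder,
          " (working directory: ", getwd(), ")")
}
```

`file.path()` joins the folder and file name with a forward slash, which r accepts on all common operating systems.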

The set up

Let us set it all up.

.Rmd inside an .Rproj

.Rmd elements

yaml header

Defines the parameters of the entire rmarkdown document.

The pandoc document converter uses these parameters.

For example, the output file format, title, author, or date of the document version.

See at lines 1 – 5.

plain text

Simple text. No editing, no hyperlinks, no enhanced fields.

Special formatting is possible. See next slide for a brief guide.

See at lines 13 – 15.

code chunk

Enhanced field where code is integrated. Various programming languages can be integrated: r, python, SQL, Julia, and so on.

Useful when the goal is, for example, integration of output figures and tables.

See at lines 17 – 19.

inline code

Enhanced field where code is integrated seamlessly with plain text.

Keep inline code simple. Use code chunks to prepare output before integrating code with plain text.

`r code here`
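To see the four elements side by side, the following sketch writes a very small .Rmd file to a temporary folder. The file name, title, and object name `n_items` are hypothetical. The value is prepared in a code chunk so that the inline code stays simple.

```r
# Builds a minimal .Rmd in a temporary folder to show the four elements together.
fence <- strrep("`", 3)  # three backticks that open and close a code chunk

rmd_lines <- c(
  "---",                                      # yaml header starts
  'title: "Minimal example"',
  "output: html_document",
  "---",                                      # yaml header ends
  "",
  "Some plain text with *italics*.",          # plain text
  "",
  paste0(fence, "{r prepare, echo=FALSE}"),   # code chunk starts
  "n_items <- length(c('a', 'b', 'c'))",      # the value is prepared in the chunk
  fence,                                      # code chunk ends
  "",
  "The list has `r n_items` items."           # inline code only prints the value
)

rmd_file <- file.path(tempdir(), "minimal-example.Rmd")
writeLines(rmd_lines, rmd_file)

# Inspect the file that was written
cat(readLines(rmd_file), sep = "\n")
```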

Rmarkdown basics

See this guide https://rmarkdown.rstudio.com/authoring_basics.html

*abc* italics abc

**abc** bold abc

`backtick`

code backtick

H~2~O subscript H₂O

R^2^ superscript R²

Add Links

[https://www.r-project.org/](https://www.r-project.org/)

https://www.r-project.org/

![Image from local repository](img/logo.jpg)

![Image from the Internet](https://www.r-project.org/Rlogo.png)

Use headings

# First level.

## Second level.

### Third level.

Work with lists

Watch the indentation!

Unordered

* Item 1
* Item 2
    + Item 2a
    + Item 2b

Item 1
Item 2
- Item 2a
- Item 2b

Ordered

1. Item 1
2. Item 2
3. Item 3
    + Item 3a
    + Item 3b

Item 1
Item 2
Item 3
- Item 3a
- Item 3b

Add math symbols

See this guide https://rpruim.github.io/s341/S19/from-class/MathinRmd.html

Surround by $ for inline mathematics and $$ for displayed equations. Without empty space!

$x = y$ becomes $x = y$

$\left(\int_{a}^{b} f(x) \; dx\right)$ becomes $\left(\int_{a}^{b} f(x) \; dx\right)$

Include Greek letters too.

$\alpha A$ becomes $\alpha A$

Equations

$$\sum_{n=1}^{10} n^2$$ becomes \[\sum_{n=1}^{10} n^2\]

Some yaml parameters

Copy-paste from Stanciu et al. (2024)

title: "Can human values explain one’s interest in cryptocurrencies? An explorative study in Germany"
author: 
  - name: "Adrian Stanciu"
    affiliation_number: 1
  - name: "Melanie Partsch"
    affiliation_number: 1
  - name: "Clemens Lechner"
    affiliation_number: 1
affiliations:
  - "GESIS-Leibniz Institute for the Social Sciences, Mannheim"
  - "University of Bremen, Bremen"
shorttitle: "Values and cryptocurrencies"
authors_note: "For correspondence contact Dr. Adrian Stanciu, Data and Research on Society, GESIS-Leibniz Institute for the Social Sciences, PO Box 12215, 68072 Mannheim, Germany. Email: adrian.stanciu[at]gesis.org"
abstract: "Write abstract here"
keywords: "Values, Cryptocurrencies, Germany"
date: "`r format(Sys.time(), '%d. %B, %Y')`"
doctype: doc
header-includes:
  - \usepackage{subfig}
output: 
  bookdown::pdf_document2:
    toc: False
    number_sections: False
    template: "style/template.tex"
csl: style/apa.csl
bibliography: reference.bib

Some r code chunk attributes

echo=TRUE # whether the code is displayed in the output file

eval=TRUE # whether the code is run and the outcome generated

include=TRUE # whether the code and its outcome are included in the output document

# sets attributes for entire document
knitr::opts_chunk$set(echo = TRUE,eval=FALSE,warning = FALSE,message = FALSE)

Code chunk attributes

It may be simpler to set code chunk attributes for the entire document at the beginning of the document.

In this first code chunk also install all the relevant packages.

Familiarize yourself

Inspect the newly created .Rmd document and identify the discussed elements.

Play with the attributes and/ or use online search engines to identify new attributes of code chunks and yaml parameters.

Packages

We use pacman to install and load the packages tidyverse, readxl (for reading Excel sheets), haven (for reading SPSS files), sjlabelled (for dealing with labelled dataframes), knitr and kableExtra (for creating tables; kable() is a function of the knitr package).

install.packages("pacman")
pacman::p_load(tidyverse,readxl,haven,sjlabelled,knitr,kableExtra)

Illustrative example

From here on, we build automatized reports, websites and books, and shiny apps using real data.

Data

We use the subsample data from Stanciu et al. (2017) and the movies.xlsx metadata.

Download from the R beyond data analysis book. https://adrian-stanciu.quarto.pub/r-beyond-data-analysis/

Import

Once the data is downloaded, make sure it is stored in the project folder. Then import into the r environment.

# create an object dataframe example `dfex` and assign to it the .sav file `sample.sav` that was introduced previously
dfex<-haven::read_sav("data/sample.sav")

# create an object movies metadata `dfmv` and assign to it the .xlsx file `movies.xlsx`
# note the different paths to these files
# note that we specify which sheet to read too; here only sheet 1 is imported
dfmv<-readxl::read_excel("mat/movies.xlsx",1)

Inspect sample.sav

Once available for use in the r environment, we can perform actions on the data.

# check if the source material was imported successfully 
# by observing the first lines in the tables
head(dfex)

# A tibble: 6 × 9
    ppn gen          age res       res_other men_warm men_comp wom_warm wom_comp
  <dbl> <dbl+lbl>  <dbl> <dbl+lbl> <chr>     <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb>
1   459 1 [Female]    24  5 [Iasi] -99        3 [Und…  4 [Agr… 3 [Unde… 4 [Agre…
2   592 2 [Male]      21  5 [Iasi] -99        3 [Und…  4 [Agr… 3 [Unde… 3 [Unde…
3   634 2 [Male]      21 NA        petrosani  4 [Agr…  5 [Str… 4 [Agre… 4 [Agre…
4   369 1 [Female]    30  8 [Gala… -99       NA       NA       4 [Agre… 4 [Agre…
5   121 1 [Female]    21  4 [Timi… -99        4 [Agr…  3 [Und… 3 [Unde… 4 [Agre…
6   127 1 [Female]    20  4 [Timi… -99        4 [Agr…  4 [Agr… 4 [Agre… 2 [Disa…

Inspect movies.xlsx

head(dfmv)

# A tibble: 4 × 6
  Movie                       Actor                 Like  Why     Grade Wikilink
  <chr>                       <chr>                 <chr> <chr>   <dbl> <chr>   
1 John Wick                   Keanu Reeves          Yes   Fight …    10 https:/…
2 Call me by your name        Timothee Chalamet     Yes   Beauti…    10 https:/…
3 Terminator                  Arnold Schwarzenegger Yes   Arnold      9 https:/…
4 4 months 3 weeks and 2 days <NA>                  Yes   Portra…     8 https:/…

Plain vs. enhanced text

The .Rmd is already a step forward toward automatization in that it retrieves external source material.

On its own this is not too helpful, because those contents are displayed statically, as plain information.

Inline coding can integrate enhanced text with plain text through the knitr engine.

This is powerful for repetitive reports or for a quick inspection of data collection progress.

Plain vs. enhanced text

What to look for:

what is repetitive
what can be integrated from external source material
what vector contains the desired information (character strings and numeric vectors behave differently)
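The last point can be illustrated with a short sketch. The two toy vectors are made up and stand in for columns of an imported sheet. A character vector is collapsed into one string, and a numeric vector is summarised and rounded before it is placed in a sentence.

```r
# Toy vectors standing in for columns of an imported sheet (values are made up)
titles <- c("Movie A", "Movie B", "Movie C")
grades <- c(10, 9, 9)

# Character vector: collapse into one string for use in a sentence
title_text <- paste(titles, collapse = ", ")

# Numeric vector: summarise and round before inserting into text
mean_grade <- round(mean(grades), 1)

# The sentence as it could be assembled for an inline field
sentence <- paste0("The list has ", length(titles), " movies (", title_text,
                   ") with a mean grade of ", mean_grade, ".")
sentence
```

Preparing `title_text` and `mean_grade` in a code chunk keeps the inline code itself down to a single object name.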

Enhancing plain text

This is an example of how automatization can be implemented in the workflow. 
My list of movies includes `r nrow(dfmv)` entries. 
The titles of those movies are `r dfmv$Movie`. 
Is there a movie on that list that I actually don't like? Well, the answer is that I dislike exactly 
`r dfmv %>% filter(Like %in% c("No","no","NO")) %>% nrow()`
movies on that list.

…happening in the background

nrow(dfmv) # My list of movies includes...entries

[1] 4

dfmv$Movie # Title of those movies are...

[1] "John Wick"                   "Call me by your name"       
[3] "Terminator"                  "4 months 3 weeks and 2 days"

dfmv %>% filter(Like %in% c("No","no","NO")) %>% nrow() # ...I dislike exactly...

[1] 0

The output

This is an example of how automatization can be implemented in the workflow. My list of movies includes 4 entries. The titles of those movies are John Wick, Call me by your name, Terminator, 4 months 3 weeks and 2 days. Is there a movie on that list that I actually don’t like? Well, the answer is that I dislike exactly 0 movies on that list.
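The count of disliked movies above matches three spellings of "No". One possible variation, shown here as a sketch on a made-up toy table rather than on movies.xlsx, is to normalise the entries first so that other spellings are also caught.

```r
# Toy stand-in for the movies sheet (entries are made up for illustration)
movies_toy <- data.frame(
  Movie = c("Movie A", "Movie B", "Movie C"),
  Like  = c("Yes", "no ", "NO"),
  stringsAsFactors = FALSE
)

# Normalise case and surrounding spaces once, then compare against a single value
like_clean <- tolower(trimws(movies_toy$Like))
n_disliked <- sum(like_clean == "no", na.rm = TRUE)
n_disliked
```

The object `n_disliked` can then be used in an inline field in place of the longer filter expression.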

DIY – Enhanced text

Modify the movies.xlsx or create your own metadata (.xlsx sheet) and then write an enhanced text in Rmarkdown.

Import .xlsx

Remember to import the .xlsx file using readxl::read_excel().

Watch out for the right path dependency.

Automated graphs and tables

Tables and graphs can be automatically updated with new data.

Graphs

A series of three graphs follows.

This series reproduces a scenario whereby a dataset is progressively updated during fieldwork.

Each week there are new observations collected, and for each week we’d need to prepare a field report.

n = 15

dfex_n15<-haven::read_sav("data/tmpdf1.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  mutate(gen=factor(gen),
         res=factor(res))

ggplot(dfex_n15, aes(x=gen, y=wom_warm)) + 
  labs(x="Gender",
       y="Stereotype of warmth") +
  geom_boxplot() + 
  theme_light()

n = 60

dfex_n60<-haven::read_sav("data/tmpdf2.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  mutate(gen=factor(gen),
         res=factor(res))

ggplot(dfex_n60, aes(x=gen, y=wom_warm)) + 
  labs(x="Gender",
       y="Stereotype of warmth") +
  geom_boxplot() + 
  theme_light()

n = 100

dfex<-haven::read_sav("data/sample.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  mutate(gen=factor(gen),
         res=factor(res))

ggplot(dfex, aes(x=gen, y=wom_warm)) + 
  labs(x="Gender",
       y="Stereotype of warmth") +
  geom_boxplot() + 
  theme_light()
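Because the three graphs differ only in the dataset, the plotting code could be wrapped in a custom function and reused each week. The sketch below assumes ggplot2 is installed. The helper name `plot_warmth` and the toy data are hypothetical and are not part of the original materials.

```r
library(ggplot2)

# `plot_warmth` is a hypothetical helper; it expects a data frame with the
# columns `gen` and `wom_warm`, as in the examples above.
plot_warmth <- function(df) {
  ggplot(df, aes(x = factor(gen), y = wom_warm)) +
    labs(title = paste0("n = ", nrow(df)),
         x = "Gender",
         y = "Stereotype of warmth") +
    geom_boxplot() +
    theme_light()
}

# Toy data standing in for one weekly export (values are made up)
toy_week <- data.frame(gen = c(1, 1, 2, 2, 2),
                       wom_warm = c(3, 4, 4, 5, 3))

p <- plot_warmth(toy_week)
p
```

With the real data, the same function could be applied to each imported dataset in turn, for example `plot_warmth(dfex_n15)`.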

Tables

Knit .xlsx sheet directly. No modifications made to the original Excel sheet.

dfmv %>% knitr::kable(caption="Simple table using knitr::kable()",format = "pipe")

Table: Simple table using knitr::kable()

|Movie                       |Actor                 |Like |Why                                    | Grade|Wikilink                                                     |
|:---------------------------|:---------------------|:----|:--------------------------------------|-----:|:------------------------------------------------------------|
|John Wick                   |Keanu Reeves          |Yes  |Fight scenes                           |    10|https://en.wikipedia.org/wiki/John_Wick_(film)               |
|Call me by your name        |Timothee Chalamet     |Yes  |Beautiful love story                   |    10|https://en.wikipedia.org/wiki/Call_Me_by_Your_Name_(film)    |
|Terminator                  |Arnold Schwarzenegger |Yes  |Arnold                                 |     9|https://en.wikipedia.org/wiki/The_Terminator                 |
|4 months 3 weeks and 2 days |NA                    |Yes  |Portrayal of life in communist Romania |     8|https://en.wikipedia.org/wiki/4_Months%2C_3_Weeks_and_2_Days |

Tables

Adjustments and modifications made before the final table is reported.

# keeps only the movies of the two actors of interest
tmptbl<-dfmv %>% 
  filter(Actor %in% c("Keanu Reeves", "Alec Baldwin"))

# marks the rows graded 8 or more that have a valid Like entry
keep<-tmptbl$Grade >= 8 & tmptbl$Like %in% c("Yes","No")

# creates the summary table that we'd
# want to include in the final output document
extbl<-tibble(
  like=tmptbl$Like[keep],
  name=tmptbl$Actor[keep],
  movie=tmptbl$Movie[keep],
  wiki=tmptbl$Wikilink[keep]
)

extbl %>% knitr::kable(caption="Movies graded 8 or more from liked and least liked actors", format="pipe")

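Table: Movies graded 8 or more from liked and least liked actors

|like |name         |movie     |wiki                                           |
|:----|:------------|:---------|:----------------------------------------------|
|Yes  |Keanu Reeves |John Wick |https://en.wikipedia.org/wiki/John_Wick_(film) |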

DIY – Edit the `.xlsx` sheet

Open movies.xlsx in Microsoft Excel and add one or more movies by the actor Alec Baldwin, while pretending you dislike the actor.

Alternatively, modify the table code: replace the two actors with one actor you like and one you dislike, then update the Excel sheet accordingly. Make sure you maintain the sheet structure.

Re-knit the tables.

Knit with parameters

Knitting with parameters simplifies the work routine even further.

It uses a user-friendly interface, the shiny interface.

Parameters

Define the parameters in the yaml header.

Parameters

Characteristics of the document that are repetitive both throughout the document and across successive versions of the document.

Progress illustrative example

Names of actors in the Excel sheet movies.xlsx.

Which of the stereotype evaluations from the Stanciu et al. (2017) subsample we’d want to use for graph creation.

Also, which dataset we use.

Set up – yaml header

title: "example"
output: html_document
date: "2025-03-31"
params:
  actor:
    label: "Actor"
    value: "Keanu Reeves"
    input: select
    choices: ["Keanu Reeves", "Alec Baldwin","Arnold Schwarzenegger"]
    multiple: yes
  stereotype:
    label: "Stereotype evaluation"
    value: wom_warm
    input: select
    choices: [wom_warm,wom_comp,men_warm,men_comp]
    multiple: no
  sampledf:
    label: "Dataset version"
    value: sample.sav
    input: select
    choices: [sample.sav,tmpdf1.sav,tmpdf2.sav]
    multiple: no

Using parameters in code

We can either use parameters directly or assign them to an object.

actor<-params$actor
st<-params$stereotype

Calling parameters

Remember to always call parameters as such: params${parameter name}, where the name is the one defined under params: in the yaml header (for example, params$actor).
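When trying out code in the console, the `params` object does not exist yet, because it is only created while knitting. A minimal sketch for interactive testing is to build it by hand as a list. The toy table and its entries are made up for illustration.

```r
# Inside a knitted document `params` is created from the yaml header.
# Here it is built by hand to try code interactively; values are illustrative.
params <- list(actor = c("Keanu Reeves", "Alec Baldwin"),
               stereotype = "wom_warm")

# Toy stand-in for the movies sheet (entries are made up)
movies_toy <- data.frame(
  Movie = c("Movie A", "Movie B", "Movie C"),
  Actor = c("Keanu Reeves", "Arnold Schwarzenegger", "Alec Baldwin"),
  stringsAsFactors = FALSE
)

# The parameter is called by the name written under `params:` (here `actor`)
selected <- movies_toy[movies_toy$Actor %in% params$actor, ]
selected$Movie
```

This hand-made list should stay in the console and not be pasted into the .Rmd itself, since knitting creates `params` from the yaml header.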

Using parameters in code

# 1 - imports dataset into object tempdf
tempdf<-haven::read_sav("data/tmpdf1.sav") %>% 
  sjlabelled::remove_all_labels() %>% 
  pivot_longer(contains("warm") | contains("comp")) %>% 
  filter(name %in% st)

# 2 - applies the ggplot to the dataset
ggplot(tempdf, aes(x=factor(gen), y=value)) + 
  labs(title=paste0("Evaluation based on ",st), 
       x="Gender",
       y="Stereotype") +
  geom_boxplot() + 
  theme_light()

Using parameters in code

# keeps only the movies of the actors selected through the parameter
tmptbl<-dfmv %>% 
  filter(Actor %in% actor)

# marks the rows graded 8 or more that have a valid Like entry
keep<-tmptbl$Grade >= 8 & tmptbl$Like %in% c("Yes","No")

# creates the summary table that we'd
# want to include in the final output document
extbl<-tibble(
  like=tmptbl$Like[keep],
  name=tmptbl$Actor[keep],
  movie=tmptbl$Movie[keep],
  wiki=tmptbl$Wikilink[keep]
)

Where is the difference

Without parameter

tmptbl<-dfmv %>% 
  filter(Actor %in% c("Keanu Reeves", "Alec Baldwin"))

Parameter defined

tmptbl<-dfmv %>% 
  filter(Actor %in% actor)

Knit with parameters

Even more…

One other way to work with parameterized reports is to code the document such that it creates tables (or anything else for that matter) using a specific dataset.

sampledf<-paste0("data/",params$sampledf) # assigns parameter
abc<-haven::read_sav(sampledf) # uses parameter in code

abc %>% 
  sjlabelled::remove_all_labels() %>% 
  pivot_longer(contains("warm") | contains("comp")) %>% 
  group_by(name) %>% 
  summarise(mean=mean(value, na.rm = TRUE), # we use missing remove TRUE (na.rm=TRUE) to make sure r gives an output
            sd=sd(value, na.rm = TRUE),
            min=min(value, na.rm = TRUE),
            max=max(value, na.rm = TRUE))
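Parameter values can also be passed from code instead of the shiny interface, which makes it possible to produce one report per dataset version in a loop. The sketch below is self-contained and rests on several assumptions: it writes a tiny hypothetical .Rmd to a temporary folder, that file only prints the parameter value and does not read any data, and the rendering step runs only if rmarkdown and pandoc are available.

```r
# Writes a tiny parameterized .Rmd to a temporary folder and renders it twice.
# File names and content are hypothetical and used for illustration only.
rmd_lines <- c(
  "---",
  'title: "Parameter demo"',
  "output: html_document",
  "params:",
  "  sampledf: sample.sav",
  "---",
  "",
  "This report uses the dataset `r params$sampledf`."
)
rmd_file <- file.path(tempdir(), "param-demo.Rmd")
writeLines(rmd_lines, rmd_file)

# Render only if rmarkdown and pandoc are available on this machine
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  for (version in c("tmpdf1.sav", "tmpdf2.sav")) {
    rmarkdown::render(
      rmd_file,
      params = list(sampledf = version),  # overrides the default in the yaml header
      output_file = paste0("report-", tools::file_path_sans_ext(version), ".html"),
      output_dir = tempdir(),
      quiet = TRUE
    )
  }
}
```

Calling `rmarkdown::render()` with `params = "ask"` instead opens the shiny interface for choosing the values.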

DIY – parameterized reports

Download from the R beyond data analysis book the Examples .rmd and think of new parameters to add to the document.

https://adrian-stanciu.quarto.pub/r-beyond-data-analysis/

Reference list

Stanciu, A., Cohrs, C. J., Hanke, K., & Gavreliuc, A. (2017). Within-culture variation in the content of stereotypes: Application and development of the stereotype content model in an Eastern European culture. The Journal of Social Psychology, 157(5), 611–628. https://doi.org/10.1080/00224545.2016.1262812

Stanciu, A., Partsch, M. V., & Lechner, C. M. (2024). Basic human values and the adoption of cryptocurrency. Frontiers in Psychology, 15. https://doi.org/10.3389/fpsyg.2024.1395674
