You might know R from data analysis. But R can do much more for you than that.
Faculty of Humanities, Education and Social Sciences (FHSE), University of Luxembourg
A brief reminder.
R, RStudio, and GitHub are the basic tools for an integrated workflow.
Preferably, we work with R objects.
Preferably, we save code in R script files.
clone, push, pull, and commit are the basic git verbs.
Functions are called with the structure {function name}(). Custom functions, however, are written with the structure function(){}.
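A minimal sketch of both structures: `round()` is an existing base R function called as {function name}(), while `grade_label()` is a hypothetical custom function, made up for this example, that is defined with function(){}.

```r
# Calling an existing function: {function name}()
round(3.14159, digits = 2)

# Writing a custom function: function(){}
# grade_label() is a hypothetical example, not part of any package
grade_label <- function(grade, threshold = 8) {
  # "high" for grades at or above the threshold, "low" otherwise
  ifelse(grade >= threshold, "high", "low")
}

# Using the custom function exactly like any other function
grade_label(c(10, 9, 8, 5))
```

The default value `threshold = 8` can be overridden when calling, e.g. `grade_label(7, threshold = 6)`.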
Packages simplify the workflow but may not always be maintained.
Back-and-forth code writing in the console
Save working code in an R script file
Create an Rproject to manage path dependencies
Create and/or clone an online repository on GitHub
git commit and git push the R script file to the online GitHub repository
git pull committed updates from the online repository
Towards an integrated and reproducible workflow.
Repetitive tasks can be reduced through the use of code and a designated work environment.
Among the domains where this is an asset are:
Rmarkdown is an enhanced document type that combines plain text, code, and formatting, and from which output files can be rendered; PDF, DOCX, and HTML are the most common formats.
.Rmd: editable markdown document in R
knitr: engine that executes the code (the jupyter engine plays this role for python)
md: simplified markdown document in a markup language
pandoc: document converter
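The chain .Rmd → knitr → .md → pandoc can be sketched step by step. This is a minimal sketch, assuming the knitr and rmarkdown packages are installed and pandoc is available (RStudio ships with it); the file name `pipeline_demo.Rmd` is hypothetical. In daily work, `rmarkdown::render()` runs both steps in one call.

```r
# Hypothetical minimal .Rmd written to a temporary folder
rmd  <- file.path(tempdir(), "pipeline_demo.Rmd")
md   <- sub("\\.Rmd$", ".md", rmd)
html <- sub("\\.Rmd$", ".html", rmd)
writeLines(c(
  "---",
  "title: \"Pipeline demo\"",
  "---",
  "",
  "Two plus two is `r 2 + 2`."
), rmd)

# Step 1: knitr runs the R code and writes a plain .md file
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit(rmd, output = md, quiet = TRUE)
}

# Step 2: pandoc converts the .md file into an output format (here HTML)
if (file.exists(md) && requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md, to = "html", output = html)
}

# In practice: rmarkdown::render(rmd) performs both steps
```

After step 1, the intermediate .md file no longer contains R code, only its result ("Two plus two is 4.").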
What we need to focus on is the .Rmd file (or the new-generation .qmd file, addressed in the next sessions).
flowchart LR
    A(Plain text) --> Z{.Rmd}
    B(Code) --> Z{.Rmd}
    C(Images) --> Z{.Rmd}
    D(External source file) --> Z{.Rmd}
    E(Formatting source file) --> Z{.Rmd}
    F(...) --> Z{.Rmd}
When knitting to an HTML output format, nothing else needs to be installed.
But when knitting to a PDF output format, a LaTeX distribution is needed!
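Before knitting to PDF, it can help to check whether a LaTeX engine is installed at all. A minimal sketch, assuming that looking for the common engines `pdflatex`, `xelatex`, and `lualatex` on the system PATH is a good enough test; TinyTeX, installed through the tinytex package, is one possible distribution.

```r
# Look for common LaTeX engines on the system PATH
engines <- c("pdflatex", "xelatex", "lualatex")
found <- Sys.which(engines)           # "" for every engine that is not found
has_latex <- any(nzchar(found))

if (!has_latex) {
  message("No LaTeX engine found, so knitting to PDF will fail. ",
          "One option is TinyTeX: install.packages(\"tinytex\"); tinytex::install_tinytex()")
}
```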
Documents can be coded to retrieve data and/or information from external source material (e.g., datasets, or metadata such as Excel sheets).
This is the building block for creating all sorts of automated reports.
Working with an .Rproj project takes care of the paths to these files.
Otherwise, the paths to external source files need to be specified correctly.
Path dependencies inside projects
To benefit from established path dependencies, work with subfolders inside your project repository.
When calling a file stored in a subfolder, write the subfolder first, followed by the file itself (e.g., "data/sample.sav").
When calling a file from the root folder of the repository (where the .Rproj file is stored), simply call the file itself.
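A minimal sketch of both cases, assuming a hypothetical project folder with a `data` subfolder, simulated here in a temporary directory. Opening an .Rproj in RStudio sets the working directory to the project root; `setwd()` only imitates that for the example.

```r
# Simulated project folder in a temporary directory (hypothetical layout)
project <- file.path(tempdir(), "myproject")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(project, "data", "sub.csv"), row.names = FALSE)
write.csv(data.frame(y = 4:5), file.path(project, "root.csv"), row.names = FALSE)

# Imitate what opening the .Rproj does: working directory = project root
old_wd <- setwd(project)
in_sub  <- read.csv("data/sub.csv")  # file in a subfolder: subfolder first, then the file
in_root <- read.csv("root.csv")      # file in the root folder: only the file name
setwd(old_wd)                        # restore the previous working directory
```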
Let us set it all up.
Defines the parameters of the entire Rmarkdown document.
The pandoc document converter uses these parameters.
Examples are the output file format, the title, the author, or the date of the document version.
See lines 1 – 5.
Simple text, without code or enhanced fields.
Special formatting (e.g., italics, bold, hyperlinks) is possible. See the next slide for a brief guide.
See lines 13 – 15.
Enhanced field in which code is integrated. Various programming languages can be integrated: R, python, SQL, Julia, and so on.
Useful, for example, for integrating output figures and tables.
See lines 17 – 19.
Syntax | Result |
---|---|
*abc* | italics abc |
**abc** | bold abc |
`code` | code in backticks |
H~2~O | subscript H2O |
R^2^ | superscript R2 |
[https://www.r-project.org/](https://www.r-project.org/) | hyperlink |


Syntax | Result |
---|---|
`#` | First level. |
`##` | Second level. |
`###` | Third level. |
Watch the indentation!
Use $ for inline mathematics and $$ for displayed equations, without a space between the dollar signs and the expression.
$x = y$
becomes \(x = y\)
$\left(\int_{a}^{b} f(x) \; dx\right)$
becomes \(\left(\int_{a}^{b} f(x) \; dx\right)\)
$\alpha A$
becomes \(\alpha A\)
$$\sum_{n=1}^{10} n^2$$
becomes \[\sum_{n=1}^{10} n^2\]
Copy-pasted from Stanciu et al. (2024):
---
title: "Can human values explain one’s interest in cryptocurrencies? An explorative study in Germany"
author:
  - name: "Adrian Stanciu"
    affiliation_number: 1
  - name: "Melanie Partsch"
    affiliation_number: 1
  - name: "Clemens Lechner"
    affiliation_number: 1
affiliations:
  - "GESIS-Leibniz Institute for the Social Sciences, Mannheim"
  - "University of Bremen, Bremen"
shorttitle: "Values and cryptocurrencies"
authors_note: "For correspondence contact Dr. Adrian Stanciu, Data and Research on Society, GESIS-Leibniz Institute for the Social Sciences, PO Box 12215, 68072 Mannheim, Germany. Email: adrian.stanciu[at]gesis.org"
abstract: "Write abstract here"
keywords: "Values, Cryptocurrencies, Germany"
date: "`r format(Sys.time(), '%d. %B, %Y')`"
doctype: doc
header-includes:
  - \usepackage{subfig}
output:
  bookdown::pdf_document2:
    toc: False
    number_sections: False
    template: "style/template.tex"
csl: style/apa.csl
bibliography: reference.bib
---
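The date field holds inline R code that knitr evaluates at knit time, so the document always carries the date of its latest version. A small sketch of what that expression returns, using a fixed date instead of `Sys.time()` so the result is predictable; the month name (`%B`) follows the system locale.

```r
# Same format string as in the YAML date field, applied to a fixed date
fixed <- as.POSIXct("2025-03-31 12:00:00")
stamp <- format(fixed, "%d. %B, %Y")
stamp   # "31. March, 2025" in an English locale

# At knit time, the header uses the current date and time instead
format(Sys.time(), "%d. %B, %Y")
```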
Code chunk attributes
It may be simpler to set code chunk attributes for the entire document once, at the beginning of the document.
In this first code chunk, also install and load all the relevant packages.
Inspect the newly created .Rmd
document and identify the discussed elements.
Play with the attributes and/or use online search engines to find new code chunk attributes and YAML parameters.
Using pacman, we install the packages tidyverse, readxl (for reading Excel sheets), haven (for reading SPSS files), sjlabelled (for dealing with labelled data frames), knitr (whose kable() function creates tables), and kableExtra (for creating tables).
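A sketch of such a first code chunk, assuming knitr is installed; the chunk options shown (`echo`, `message`, `warning`) are only example choices. Rather than installing silently, it reports which packages are still missing; the commented `pacman::p_load()` line, which requires pacman, installs missing packages and loads all of them in one go.

```r
# Global chunk options: apply to every following chunk unless a chunk overrides them
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
}

# Packages used in this session (kable() comes with knitr)
pkgs <- c("tidyverse", "readxl", "haven", "sjlabelled", "knitr", "kableExtra")

# Report packages that are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(utils::installed.packages()))
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
}

# With pacman available, this installs missing packages and loads all of them:
# pacman::p_load(char = pkgs)
```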
From here on, we build automated reports, websites, books, and Shiny apps using real data.
We use the subsample data from Stanciu et al. (2017) and the movies.xlsx metadata.
Download both from the R beyond data analysis book.
# create an object dataframe example `dfex` and assign to it the .sav file `sample.sav` that was introduced previously
dfex<-haven::read_sav("data/sample.sav")
# create an object movies metadata `dfmv` and assign to it the .xlsx file `movies.xlsx`
# note the different paths to these files
# note that we specify which sheet to read too; here only sheet 1 is imported
dfmv<-readxl::read_excel("mat/movies.xlsx", sheet=1)
# check if the source material was imported successfully
# by observing the first lines in the tables
head(dfex)
head(dfmv)
# A tibble: 6 × 9
ppn gen age res res_other men_warm men_comp wom_warm wom_comp
<dbl> <dbl+lbl> <dbl> <dbl+lbl> <chr> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb>
1 459 1 [Female] 24 5 [Iasi] -99 3 [Und… 4 [Agr… 3 [Unde… 4 [Agre…
2 592 2 [Male] 21 5 [Iasi] -99 3 [Und… 4 [Agr… 3 [Unde… 3 [Unde…
3 634 2 [Male] 21 NA petrosani 4 [Agr… 5 [Str… 4 [Agre… 4 [Agre…
4 369 1 [Female] 30 8 [Gala… -99 NA NA 4 [Agre… 4 [Agre…
5 121 1 [Female] 21 4 [Timi… -99 4 [Agr… 3 [Und… 3 [Unde… 4 [Agre…
6 127 1 [Female] 20 4 [Timi… -99 4 [Agr… 4 [Agr… 4 [Agre… 2 [Disa…
# A tibble: 4 × 6
Movie Actor Like Why Grade Wikilink
<chr> <chr> <chr> <chr> <dbl> <chr>
1 John Wick Keanu Reeves Yes Fight … 10 https:/…
2 Call me by your name Timothee Chalamet Yes Beauti… 10 https:/…
3 Terminator Arnold Schwarzenegger Yes Arnold 9 https:/…
4 4 months 3 weeks and 2 days <NA> Yes Portra… 8 https:/…
The .Rmd is already a step toward automation, in that it retrieves external source material.
On its own, however, this is not too helpful, because those contents are displayed statically, as plain information.
Inline code integrates computed results into plain text through the knitr engine.
This is powerful for repetitive reports or for quickly inspecting the progress of data collection.
What to look for:
what is repetitive
what can be integrated from external source material
what vector contains the desired information (character strings and numeric vectors behave differently)
This is an example of how automation can be implemented in the workflow.
My list of movies includes `r nrow(dfmv)` entries.
The titles of those movies are `r dfmv$Movie`.
Is there a movie that I actually don't like on that list? Well, the answer is that I dislike exactly
`r dfmv %>% filter(Like %in% c("No","no","NO")) %>% nrow()`
movies on that list.
[1] "John Wick" "Call me by your name"
[3] "Terminator" "4 months 3 weeks and 2 days"
This is an example of how automation can be implemented in the workflow. My list of movies includes 4 entries. The titles of those movies are John Wick, Call me by your name, Terminator, 4 months 3 weeks and 2 days. Is there a movie that I actually don’t like on that list? Well, the answer is that I dislike exactly 0 movies on that list.
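The inline expressions compute ordinary R values that knitr pastes into the text; as the rendered sentence shows, a character vector is joined with commas. A base R sketch of the same computations, assuming a hand-typed data frame with the titles and ratings printed earlier in place of `dfmv`, and a case-insensitive count of "no" in place of the dplyr filter:

```r
# Hand-typed stand-in for dfmv (values from the table printed above)
movies <- data.frame(
  Movie = c("John Wick", "Call me by your name", "Terminator", "4 months 3 weeks and 2 days"),
  Like  = c("Yes", "Yes", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

n_movies   <- nrow(movies)                          # numeric value
titles     <- paste(movies$Movie, collapse = ", ")  # character vector joined with commas
n_disliked <- sum(tolower(movies$Like) == "no")     # matches "No", "no", "NO"

sentence <- paste0("My list of movies includes ", n_movies, " entries. ",
                   "The titles of those movies are ", titles, ". ",
                   "I dislike exactly ", n_disliked, " movies on that list.")
sentence
```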
Modify movies.xlsx or create your own metadata (.xlsx sheet), and then write an enhanced text in Rmarkdown.
Import .xlsx
Remember to import the .xlsx file using readxl::read_excel().
Watch out for the correct path.
Tables and graphs can be automatically updated with new data.
Movie | Actor | Like | Why | Grade | Wikilink |
---|---|---|---|---|---|
John Wick | Keanu Reeves | Yes | Fight scenes | 10 | https://en.wikipedia.org/wiki/John_Wick_(film) |
Call me by your name | Timothee Chalamet | Yes | Beautiful love story | 10 | https://en.wikipedia.org/wiki/Call_Me_by_Your_Name_(film) |
Terminator | Arnold Schwarzenegger | Yes | Arnold | 9 | https://en.wikipedia.org/wiki/The_Terminator |
4 months 3 weeks and 2 days | NA | Yes | Portrayal of life in communist Romania | 8 | https://en.wikipedia.org/wiki/4_Months%2C_3_Weeks_and_2_Days |
# does some data manipulation to retrieve the required information
tmptbl<-dfmv %>%
  filter(Actor %in% c("Keanu Reeves", "Alec Baldwin"))
# creates the summary table that we'd want to include in the final output document:
# movies graded 8 or more and rated Yes/No, with short column names
extbl<-tmptbl %>%
  filter(Grade >= 8, Like %in% c("Yes","No")) %>%
  select(like=Like, name=Actor, movie=Movie, wiki=Wikilink)
extbl %>% knitr::kable(caption="Movies graded 8 or more from liked and least liked actors", format="pipe")
like | name | movie | wiki |
---|---|---|---|
Yes | Keanu Reeves | John Wick | https://en.wikipedia.org/wiki/John_Wick_(film) |
.xlsx sheet
Open movies.xlsx in Microsoft Excel and add one or more movies by actor Alec Baldwin, pretending that you dislike the actor.
Alternatively, modify the table code: replace the two actors with actors you like and dislike, and update the Excel sheet accordingly, making sure you keep the sheet structure.
Re-knit the document to update the tables.
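Why re-knitting is enough can be seen in a base R sketch of the table logic: the selection rule stays the same, only the data change. It assumes a hand-typed copy of the movie sheet (without the Why and Wikilink columns) and a hypothetical new Alec Baldwin row whose title and grade are made up for the exercise.

```r
# Hand-typed copy of the movie sheet printed above
movies <- data.frame(
  Movie = c("John Wick", "Call me by your name", "Terminator", "4 months 3 weeks and 2 days"),
  Actor = c("Keanu Reeves", "Timothee Chalamet", "Arnold Schwarzenegger", NA),
  Like  = c("Yes", "Yes", "Yes", "Yes"),
  Grade = c(10, 10, 9, 8),
  stringsAsFactors = FALSE
)

# Same selection rule as the table code: chosen actors, grade >= 8, rated Yes/No
make_table <- function(df, actors) {
  keep <- df$Actor %in% actors & df$Grade >= 8 & df$Like %in% c("Yes", "No")
  data.frame(like = df$Like[keep], name = df$Actor[keep], movie = df$Movie[keep])
}

before <- make_table(movies, c("Keanu Reeves", "Alec Baldwin"))

# Hypothetical new row, as in the exercise (title and grade are made up)
new_row <- data.frame(Movie = "Hypothetical movie", Actor = "Alec Baldwin",
                      Like = "No", Grade = 8, stringsAsFactors = FALSE)
movies <- rbind(movies, new_row)

# Same code, new data: the table now has one more row
after <- make_table(movies, c("Keanu Reeves", "Alec Baldwin"))
after
```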
Knitting with parameters simplifies the work routine even further.
It offers a user-friendly interface for choosing parameter values, the Shiny interface.
Define the parameters in the YAML header.
Parameters
Characteristics of the document that are repetitive, both throughout the document and across the various versions of the document.
Names of actors in the Excel sheet movies.xlsx.
Which stereotype evaluation from the Stanciu et al. (2017) subsample we want to use for creating graphs.
Also, which dataset we use.
---
title: "example"
output: html_document
date: "2025-03-31"
params:
  actor:
    label: "Actor"
    value: "Keanu Reeves"
    input: select
    choices: ["Keanu Reeves", "Alec Baldwin", "Arnold Schwarzenegger"]
    multiple: yes
  stereotype:
    label: "Stereotype evaluation"
    value: wom_warm
    input: select
    choices: [wom_warm, wom_comp, men_warm, men_comp]
    multiple: no
  sampledf:
    label: "Dataset version"
    value: sample.sav
    input: select
    choices: [sample.sav, tmpdf1.sav, tmpdf2.sav]
    multiple: no
---
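Parameter values can also be passed from the console instead of through the Shiny interface, for example to produce one report per actor. A sketch, assuming rmarkdown and pandoc are installed; the minimal parameterized file `params_demo.Rmd` is hypothetical. `rmarkdown::render()` accepts a `params` list that overrides the YAML defaults, and `params = "ask"` opens the Shiny interface instead.

```r
# Hypothetical minimal parameterized .Rmd in a temporary folder
rmd <- file.path(tempdir(), "params_demo.Rmd")
writeLines(c(
  "---",
  "title: \"Parameter demo\"",
  "output: html_document",
  "params:",
  "  actor: \"Keanu Reeves\"",
  "---",
  "",
  "Selected actor: `r params$actor`."
), rmd)

actors  <- c("Keanu Reeves", "Arnold Schwarzenegger")
reports <- character(0)

if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  for (a in actors) {
    # one output file per actor; params overrides the value in the YAML header
    out <- rmarkdown::render(
      rmd,
      params = list(actor = a),
      output_file = paste0("report_", gsub(" ", "_", a), ".html"),
      envir = new.env(),
      quiet = TRUE
    )
    reports <- c(reports, out)
  }
}

# Interactive alternative: choose the values in the Shiny interface
# rmarkdown::render(rmd, params = "ask")
```

A fresh environment (`envir = new.env()`) is used for each report so that objects from one run do not leak into the next.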
Parameters can either be used directly or be assigned to an object.
Calling parameters
Always call parameters as params$name, where name is the parameter name defined in the YAML header (e.g., params$actor), not its label.
# 0 - assigns the selected stereotype to the object st
st<-params$stereotype
# 1 - imports dataset into object tempdf
tempdf<-haven::read_sav("data/tmpdf1.sav") %>%
  sjlabelled::remove_all_labels() %>%
  pivot_longer(contains("warm") | contains("comp")) %>%
  filter(name %in% st)
# 2 - applies the ggplot to the dataset
ggplot(tempdf, aes(x=factor(gen), y=value)) +
  labs(title=paste0("Evaluation based on ",st),
       x="Gender",
       y="Stereotype") +
  geom_boxplot() +
  theme_light()
# assigns the selected actor(s) to the object actor
actor<-params$actor
# does some data manipulation to retrieve the required information
tmptbl<-dfmv %>%
  filter(Actor %in% actor)
# creates the summary table that we'd want to include in the final output document
extbl<-tmptbl %>%
  filter(Grade >= 8, Like %in% c("Yes","No")) %>%
  select(like=Like, name=Actor, movie=Movie, wiki=Wikilink)
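When chunks are run interactively instead of being knitted, `params` does not exist. A common workaround, assumed here rather than taken from the original material, is a stand-in list with the same names as in the YAML header, so that the code above can be tested line by line. The small vector of actors is a hand-typed copy of the Actor column.

```r
# Stand-in for params while testing chunks interactively; when knitting,
# rmarkdown provides the real params object built from the YAML header
if (!exists("params")) {
  params <- list(
    actor = c("Keanu Reeves", "Alec Baldwin"),
    stereotype = "wom_warm",
    sampledf = "sample.sav"
  )
}

actor <- params$actor                                # used in filter(Actor %in% actor)
st <- params$stereotype                              # used in filter(name %in% st)
dataset_path <- file.path("data", params$sampledf)  # e.g. for haven::read_sav()

# Which rows of the Actor column match the selection
sheet_actors <- c("Keanu Reeves", "Timothee Chalamet", "Arnold Schwarzenegger", NA)
sheet_actors %in% actor
```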
Another way to work with parameterized reports is to code the document so that it creates tables (or anything else, for that matter) from the dataset selected with a parameter.
abc<-haven::read_sav(file.path("data", params$sampledf))
abc %>%
  sjlabelled::remove_all_labels() %>%
  pivot_longer(contains("warm") | contains("comp")) %>%
  group_by(name) %>%
  summarise(mean=mean(value, na.rm = TRUE), # na.rm = TRUE removes missing values so that R returns a result
            sd=sd(value, na.rm = TRUE),
            min=min(value, na.rm = TRUE),
            max=max(value, na.rm = TRUE))
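The same long-format summary can be sketched in base R, without the tidyverse, to make the steps visible: `stack()` plays the role of `pivot_longer()`, and `split()` that of `group_by()`. The ratings are the stereotype columns of the six rows printed by `head(dfex)` above, typed in by hand with their labels removed.

```r
# Stereotype ratings of the six rows shown by head(dfex) (labels removed)
ratings <- data.frame(
  men_warm = c(3, 3, 4, NA, 4, 4),
  men_comp = c(4, 4, 5, NA, 3, 4),
  wom_warm = c(3, 3, 4, 4, 3, 4),
  wom_comp = c(4, 3, 3 + 1, 4, 4, 2)
)

# Long format: one row per rating, with the column name in `ind`
long <- stack(ratings)

# One row of statistics per stereotype; na.rm = TRUE skips missing values
stats <- do.call(rbind, lapply(split(long$values, long$ind), function(v) {
  c(mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE),
    min = min(v, na.rm = TRUE), max = max(v, na.rm = TRUE))
}))
stats
```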
Download the Examples .Rmd from the R beyond data analysis book and think of new parameters to add to the document.
Rmarkdown file to output file formats like PDF
Create project
Create an empty .Rmd
Figure 3: Elements of an .Rmd document
Image from local repository
Image from the Internet