1 Scripting Languages

A scripting language is any computer language that uses scripts: written programs that automate analytic tasks.

  • Scripting languages often include general-purpose languages, e.g. Ruby, Python, R
  • Scripts allow step-by-step documentation of analytical tasks
  • Scripts provide preprocessing steps for data sharing and reproducible research
  • Without scripts, we must rely on pseudocode or manually written documentation

Basic Resources: Scripting Languages (Wikipedia)


Scripts document every step in your data analytic pipeline. Styling, annotation, and formatting are critical.


2 Reproducible Research

Dr. Roger Peng of Johns Hopkins Bloomberg School of Public Health likens reproducible research to orchestras:

  • With some musical background, you can reproduce simple tunes by ear
  • Sophisticated works, e.g. the “Symphony of a Thousand”, require hundreds of pages of sheet music
  • With sheet music, scores, and a composer, symphonies may be exactly recreated anywhere in the world

Like a complex symphony, data analyses may be reproduced precisely anywhere in the world:

  • This is critical, since data analysis does not have universal conventions
  • The “Gold Standard” of scientific research is replication
  • Not all experiments can be replicated, but most analyses may be reproduced
  • The “Silver Standard” of scientific research is reproducibility
  • The greatest advantage is validation of analyses

Advanced Resources:


Reproducible research requires not only your script(s) but also variable dictionaries and a README.


3 Literate Programming

Literate programming is a technique in which language and code are combined to narrate steps in your analysis.

  • Human-readable language is used for narrative, interpretation, and other nuances
  • Machine-readable code in code chunks is used to demonstrate how your code works
  • Code also formats outputs such as data tables and visualizations for publication

In effect, we can create elaborate analyses and narratives in a single deliverable, entirely in R.


Note the gray “code chunks”. This is what a scripted publication looks like under the hood.


Here, we can see what the publication looks like once the code and narrative are compiled.


4 Markdown

Before using R Markdown, we must first understand Markdown, a lightweight markup language.

  • Markdown is a very simple, easy-to-write markup language
  • Markup languages are designed to easily distinguish between formatting and text
  • Relatively little effort needed for mastery
  • Used by websites like GitHub and Reddit


An open Markdown editor on GitHub.


5 Rmarkdown & Knitr

Package rmarkdown is used to author R Markdown documents, while knitr compiles your code.

  • RStudio prompts you to install packages knitr and rmarkdown, if they are missing, when you first create an R Markdown document
  • To open an R Markdown Script in RStudio:
    • Click “File”
    • Click “New File”
    • Click “R Markdown…”
    • Select desired medium (e.g. “Document”)
    • Include title, author, and date
    • Save with extension .rmd
  • To knit, or compile, your R Markdown document, simply select “Knit” in RStudio
    • You can knit to PDF, HTML, or Microsoft Word
    • Other media have different output options
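
As a rough sketch of what the “Knit” button does, the same compilation can be started from the R console with rmarkdown::render(). The file name my_report.Rmd and its contents are hypothetical, and the render step only runs if package rmarkdown and pandoc are available on your machine.

```r
# Write a tiny R Markdown file to a temporary folder (hypothetical file name).
rmd_path <- file.path(tempdir(), "my_report.Rmd")
writeLines(c(
  "---",
  'title: "My Report"',
  "output: html_document",
  "---",
  "",
  "# Executive Summary",
  "",
  "Significant findings include..."
), rmd_path)

# Knit only if rmarkdown and pandoc can be found; otherwise skip quietly.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
  print(out_file)  # path of the compiled HTML file
}
```

Changing output_format to "pdf_document" or "word_document" targets the other media listed above, assuming the required tools (e.g. a LaTeX installation for PDF) are present.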


Simply opening up a new R Markdown document provides a brief, instructive tutorial of possibilities.


5.1 Markdown Basics

Markdown syntax is easy to learn, and RStudio guides are valuable.


Headers are created with # and may be nested as a hierarchy:

  • # Title creates a main header
  • ## Title creates a subheader
  • ### Title creates a sub-subheader
  • It’s turtles all the way down.

For example:

# Executive Summary

Significant findings include...

# Background

The following provides...

## Motivations

The impetus behind the analysis...

## Caveats

The reader should be aware of...


Emphasis may be added to text with *.

  • Wrapping a word or sentence in * produces Italics
  • Wrapping a word or sentence in ** produces Bold

For example:

*This sentence will appear in Italics.*
  
**This sentence will appear in bold.**


Images may be added using the formula ![My caption.](Image URL or File Name). Note the leading exclamation point.

  • Images saved locally on your machine require the file name, e.g. example.jpg
  • Images used from existing websites simply require the URL
  • If using local images, use getwd() and setwd() to select the directory with your images

For example:

![*This is a caption with Italics.*](my_image.jpg)
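
Before knitting, a quick check from the R console can confirm that a local file name will resolve. This is a minimal sketch in base R; my_image.jpg is the hypothetical file name from the example, so the check returns FALSE unless such a file exists in your working directory.

```r
# Hypothetical file name, taken from the example above.
image_file <- "my_image.jpg"

# Where R is currently looking for files.
print(getwd())

# TRUE only if the image sits in that directory.
image_found <- file.exists(image_file)
print(image_found)

# The Markdown line that would embed the image, with an italic caption.
image_line <- paste0("![*This is a caption with Italics.*](", image_file, ")")
cat(image_line, "\n")
```

When knitting, relative file names are usually resolved from the folder that holds the .Rmd file, so keeping images next to the document tends to be the simplest arrangement.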


Hyperlinks may be used in-line (in the body of text) using the formula [link text](URL).

For example:

## My Subheader

This is the body of my text. Unlike a code chunk, it does not contain code, unless I want to insert a hyperlink to, e.g., [Wikipedia](https://wikipedia.org).


Block Quotes simply require a > to precede the quoted text. By contrast, wrapping text in three backticks, or “```”, produces a verbatim code block, not a quote.

For example:

> If you torture the data long enough, nature will always confess. (Coase)


Lists can be made using a series of new lines and:

  • Asterisks, *, are used as bullet points
  • Numbers and periods, e.g. 1., are used for ordered lists
  • Two indents allow for sublists, marked by + or -
  • Note that lists must be separated by empty lines

For example:

Here is what an unordered (bulleted) list looks like:
  
* Item 1
* Item 2

Here is what an ordered (numbered) list looks like:
  
1. Item 1
2. Item 2

And you can add a sublist like so:
  
1. Item 1
    - Subitem 1
    - Subitem 2


Highlighting Code In-Line allows us to emphasize specific words that are associated with code.

  • Simply wrap the code snippet in backticks, or “`”
  • These words stand apart from normal text due to their special formatting

For example:

To include code within text, I use single backticks, like `county_totals.csv`.


5.2 Code Chunks

Code Chunks are segments of your Markdown document that include machine-readable code.

  • Begin a code chunk by opening with three backticks, curly brackets, and r inside
  • End a code chunk by closing with three backticks

While it’s not possible to show this in another R Markdown document, observe the following:


Note the three surrounding backticks on either side, indicating a code chunk.
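
Since a code chunk is just text with special delimiters, a minimal sketch can assemble one in base R and print it to the console. The chunk contents, summary(mtcars$mpg), are only an illustrative assumption.

```r
# Three backticks, built with strrep() so they can be shown safely here.
fence <- strrep("`", 3)

chunk <- c(
  paste0(fence, "{r}"),    # opening line: three backticks, curly brackets, r inside
  "summary(mtcars$mpg)",   # the machine-readable code to run
  fence                    # closing line: three backticks
)

# Print the chunk exactly as it would be typed in an R Markdown document.
writeLines(chunk)
```

Typed into an .Rmd file, these three lines would run summary() on knitting and place its output directly below the code.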


Modifying Code Chunks: Options set in the opening {r} of a code chunk control how it behaves, allowing a chunk to:

  • Evaluate or execute behind-the-scenes, if unimportant to audience
  • Repeat the input code and produce desired output, e.g. visualizations
  • Repeat the input code but hide output (to show process)
  • Suppress the input code but show output (to emphasize findings)
  • Etc.

The list of possible modifiers is extensive. Some frequently used include:

  • echo = TRUE repeats the input code for the audience; FALSE suppresses it
  • include = FALSE executes the code but shows neither the code nor its output, useful for hiding setup steps or progress bars
  • warning = FALSE suppresses warning messages from evaluated code
  • message = FALSE suppresses messages from evaluated code
  • eval = FALSE shows the chunk without evaluating it, useful for demonstration

What would this look like? Note that no comma is needed between r and the first argument:

{r echo = TRUE, warning = FALSE, message = FALSE}
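
When the same options apply to most chunks, one possible shortcut is to set them once as defaults with knitr::opts_chunk$set(), typically in a first setup chunk. This sketch assumes package knitr is installed and skips quietly otherwise; the option values simply mirror the example header above.

```r
# Only run if knitr is available on this machine.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Set defaults that every later chunk inherits.
  knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

  # Look up one of the defaults to confirm it was stored.
  print(knitr::opts_chunk$get("warning"))
}
```

Options written in an individual chunk header still override these defaults for that chunk.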


Naming Code Chunks: It may be useful to name code chunks.

  • Simply place a custom name, without quotes, after r and before a comma (,) and any other arguments in {r}
  • Useful for keeping organized and debugging

For example:

{r my_code_chunk_1, echo = FALSE}


In-Line Code is the key to automating reports, because you can fill it with real-time, dynamic values.

  • To create in-line code, simply wrap r and an object name in single backticks

Observe the following code chunk and text to understand how this works:

index <- which(mtcars$hp == min(mtcars$hp))
small_car <- rownames(mtcars[index, ])
variable <- "horsepower"

“In 1972, reliance on horsepower as a key metric proved the Honda Civic to be the weakest car.”

index <- which(mtcars$disp == min(mtcars$disp))
small_car <- rownames(mtcars[index, ])
variable <- "displacement"

“In 1972, reliance on displacement as a key metric proved the Toyota Corolla to be the weakest car.”


Under the hood, we can see that these sentences changed dynamically by using in-line code.

Note how “r small_car” and “r variable” are used as placeholders for changing values.
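
A minimal base R sketch of the same idea follows. It builds the sentence as it would be typed in the R Markdown source, then uses sprintf() as a stand-in for what knitting does. The sprintf() step is only an approximation for illustration, not how knitr works internally.

```r
# Same objects as in the displacement example (mtcars ships with R).
index <- which(mtcars$disp == min(mtcars$disp))
small_car <- rownames(mtcars[index, ])
variable <- "displacement"

# The sentence as typed in the source: single backticks around "r" and an object name.
tick <- "`"
source_sentence <- paste0(
  "In 1972, reliance on ", tick, "r variable", tick,
  " as a key metric proved the ", tick, "r small_car", tick,
  " to be the weakest car."
)

# Stand-in for knitting: substitute the current value of each object.
rendered <- sprintf(
  "In 1972, reliance on %s as a key metric proved the %s to be the weakest car.",
  variable, small_car
)

cat(source_sentence, rendered, sep = "\n")
```

Changing only the objects (for example, back to horsepower) changes the rendered sentence, while the source sentence stays the same.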


5.3 Advanced Markdown

There are a few tricks in R Markdown to make for a better data product.


YAML Headers, written in YAML (“YAML Ain’t Markup Language”), dictate the style and tone of your product.

  • For example, changing theme: to “lumen” or “cayman” makes significant changes
  • This also controls your navigation pane, table of contents, and rendering

Learn more about YAML Headers in “Creating Pretty Documents from R Markdown” (Qiu, 2018).
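
As a hedged sketch, the lines below print one possible YAML header for an HTML document. The title is hypothetical; theme, toc, and toc_float are options of the html_document format in package rmarkdown.

```r
# One possible YAML header, stored as lines of text.
yaml_header <- c(
  "---",
  'title: "My Report"',      # hypothetical title
  "output:",
  "  html_document:",
  "    theme: lumen",        # visual theme
  "    toc: true",           # add a table of contents
  "    toc_float: true",     # float it as a navigation pane
  "---"
)

# Print the header as it would appear at the top of an .Rmd file.
writeLines(yaml_header)
```

In practice these lines are typed directly at the top of the .Rmd file, between the two --- markers, with indentation preserved.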


Caching Code Chunks requires a simple argument in chunk headers, “cache = TRUE”.

  • This is particularly useful when a code chunk takes a long time to execute
  • You can iterate over R Markdown products more quickly if you knit them more quickly
  • Hence, cache the most expensive processes to save time and frustration

Warning: If you change a cached chunk, R may still knit the saved version, not the updated one.

  • In this case, simply go to “Knit” and “Clear Knitr Cache…”
  • Not knowing this is extremely frustrating


Inserting HTML is useful for fine-tuning your overall presentation. Commonly used are:

  • <br> creates a blank line or space
  • <center> and </center> will center font, images, and R output
  • <style> and </style> for font alignment and other style elements

For example, the following will justify text alignment for your entire document, unless otherwise specified:

<style>
body {
  text-align: justify;
}
</style>


5.4 RPubs

RPubs is a free platform provided by RStudio to publish R Markdown documents.

  • Integrates seamlessly with RStudio
  • Create an account and sign into RPubs
  • Once you knit a document, simply click “Publish” and “Publish Document”
  • Select “RPubs” as your means of publication and it uploads automatically
  • Edit details, including title, description, and URL slug


6 Applied Practice

Instructions: Setting Up RPubs. Visit Rpubs.com and create an account with:

  • A verifiable, professional email address
  • A unique, professional username, hopefully including some semblance of your name

Instructions: Create a New R Markdown Document. Use your new knowledge to open a new document.

  • Recall “File” | “New File” | “R Markdown…”
  • Select “Document” and “HTML”
  • Provide a name and author
  • Clear the tutorial contents but keep the YAML header
  • Save to your local directory using extension .rmd

Instructions: Create a Hidden Code Chunk. Be sure to:

  • Set the options of the code chunk to hide messages, warnings, and echo
  • Copy and paste the following for the code chunk contents
# Install each package only if it is not already available
if(!require(ggplot2)){install.packages("ggplot2")}
if(!require(GGally)){install.packages("GGally")}
# Load the packages
library(ggplot2)
library(GGally)

Instructions: Assign an Object in an Invisible Code Chunk. Be sure to:

  • Make sure the only chunk options are set to include = FALSE and cache = TRUE
  • Copy and paste the following code into a new chunk
mtcars$cyl <- as.factor(mtcars$cyl)
edv_plot <- ggpairs(mtcars, aes(fill = cyl))

Instructions: Print Object with Invisible Chunk. Be sure to:

  • Make sure the only chunk option is set to echo = FALSE
  • Copy and paste the following code into a new chunk
print(edv_plot)

Publish Your Document. Log into RPubs. Click on “Knit”, then “Publish”, and choose “RPubs”.


Good job. You’re published!