A scripting language is any computer language which uses scripts, written programs that automate analytic tasks.
Basic Resources: Scripting Languages (Wikipedia)
Scripts document every step in your data analytic pipeline. Styling, annotation, and formatting are critical.
Dr. Roger Peng of Johns Hopkins Bloomberg School of Public Health likens reproducible research to orchestras:
Like a complex symphony, data analyses may be reproduced precisely anywhere in the world:
Advanced Resources:
Reproducible research requires not only your script(s) but also variable dictionaries and a README.
Literate programming is a technique in which language and code are combined to narrate steps in your analysis.
In effect, we can create elaborate analyses and narratives in a single deliverable, entirely in R.
Note the gray “code chunks”. This is what a scripted publication looks like under the hood.
Here, we can see what the publication looks like once the code and narrative are compiled.
Before using R Markdown, we must first understand Markdown, a lightweight markup language.
An open Markdown editor on GitHub.
Package rmarkdown is used to author R Markdown documents, while knitr compiles your code.
knitr and rmarkdown are installed by default in RStudio.
R Markdown documents use the file extension .rmd.
Simply opening up a new R Markdown document provides a brief, instructive tutorial of possibilities.
Markdown syntax is easy to learn and RStudio guides are valuable:
Headers are created with # and may be used as a hierarchy:
# Title, e.g., is a main header
## Title, e.g., is a subheader
### Title, e.g., is a sub-subheader
For example:
# Executive Summary
Significant findings include...
# Background
The following provides...
## Motivations
The impetus behind the analysis...
## Caveats
The reader should be aware of...
Emphasis may be added to text with *.
* produces Italics
** produces Bold
For example:
*This sentence will appear in Italics.*
**This sentence will appear in bold.**
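Images may be added using the syntax and formula: ![My caption.](Image URL or File Name). Note the leading !, which distinguishes an image from a hyperlink.
File names should include the extension, e.g., example.jpg
Use getwd() and setwd() to select the directory with your images
For example:
![*This is a caption with Italics.*](my_image.jpg)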
Hyperlinks may be used in-line (in the body of text) using the formula [link text](URL).
For example:
## My Subheader
This is the body of my text; it does not contain code like code chunks do, unless I want to insert a hyperlink to, e.g., [Wikipedia](https://www.wikipedia.org).
Quotes simply require a > to precede the quoted text.
Verbatim blocks are possible by wrapping text in three backticks on either side; note that this renders the text as a preformatted block rather than as a quote.
For example:
> If you torture the data long enough, nature will always confess. (Coase)
Lists can be made using a series of new lines and:
* is used for bullet points
1. is used for ordered lists
+ or - may be used for sublist items
For example:
Here is what an unordered (bulleted) list looks like:
* Item 1
* Item 2
Here is what an ordered (numbered) list looks like:
1. Item 1
2. Item 2
And you can add a sublist like so:
1. Item 1
    - Subitem 1
    - Subitem 2
Highlighting Code In-Line allows us to emphasize specific words that are associated with code.
Text wrapped in single backticks appears with special formatting.
For example:
To include code within text, I use single backticks, like `county_totals.csv`.
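Code Chunks are segments of your R Markdown document that include machine-readable code.
A chunk opens with three backticks followed by curly braces with r inside, i.e., {r}, and closes with three backticks.
While it’s not possible to show this in another R Markdown document, observe the following: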
Note the three surrounding backticks on either side, indicating a code chunk.
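Because a chunk's backticks are hard to display inside a rendered document, one way to see the raw source is to write it out from R. The sketch below is illustrative only: the file name and the chunk contents are hypothetical, and the optional rendering step assumes the rmarkdown package and pandoc are available.

```r
# Raw lines of a small R Markdown document (hypothetical content).
# A chunk opens with three backticks plus {r} and closes with three backticks.
rmd_lines <- c(
  "# Example",
  "",
  "The chunk below is run when the document is knit.",
  "",
  "```{r}",
  "summary(mtcars$hp)",
  "```"
)

# Write the source to a temporary file so it can be inspected or knit.
rmd_file <- file.path(tempdir(), "chunk_example.Rmd")
writeLines(rmd_lines, rmd_file)

# Print the source exactly as it would appear in the editor.
cat(readLines(rmd_file), sep = "\n")

# Optional: render to HTML if rmarkdown and pandoc are installed.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Opening the temporary file in RStudio shows the same gray chunk background described above.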
Modifying Code Chunks is done with logical arguments in the opening {r} of a code chunk, which control how the chunk is run and what appears in the output.
The list of possible modifiers is extensive. Some frequently used include:
echo = TRUE repeats the input code for the audience; FALSE suppresses it
include = FALSE executes the code without showing the code or its output, useful for progress bars
warning = FALSE suppresses warning messages from evaluated code
message = FALSE suppresses messages from evaluated code
eval = FALSE does not evaluate the chunk, useful for demonstration
What would this look like? Note how there is no , separating r and the arguments:
{r echo = TRUE, warning = FALSE, message = FALSE}
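When most chunks share the same options, the options can also be set once for the whole document. This is a minimal sketch, assuming the knitr package is installed; it would typically sit in a first "setup" chunk, and individual chunks can still override these defaults in their own {r} header.

```r
# Set document-wide defaults for all following chunks.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = TRUE,      # show the input code
    warning = FALSE,  # hide warnings
    message = FALSE   # hide messages
  )

  # Inspect the current defaults.
  print(knitr::opts_chunk$get(c("echo", "warning", "message")))
}
```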
Naming Code Chunks: It may be useful to name code chunks.
The name goes after r and before a comma (,) and any other arguments in {r}.
For example:
{r my_code_chunk_1, echo = FALSE}
In-Line Code is the key to automating reports, because you can fill it with real-time, dynamic values.
In-line code is written with single backticks enclosing r and an object name.
Observe the following code chunk and text to understand how this works:
index <- which(mtcars$hp == min(mtcars$hp))
small_car <- rownames(mtcars[index, ])
variable <- "horsepower"
“In 1972, reliance on horsepower as a key metric proved the Honda Civic to be the weakest car.”
index <- which(mtcars$disp == min(mtcars$disp))
small_car <- rownames(mtcars[index, ])
variable <- "displacement"
“In 1972, reliance on displacement as a key metric proved the Toyota Corolla to be the weakest car.”
Under the hood, we can see that these sentences changed dynamically by using in-line code.
Note how “r small_car” and “r variable” are used as placeholders for changing values.
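The mechanism can be sketched from the console. The example below assumes knitr is installed and passes a one-line R Markdown source to knitr::knit(); the template wording is copied from the sentences above, and each back-ticked r expression is meant to be replaced by the value of the named object.

```r
# Objects the sentence will refer to.
index <- which(mtcars$hp == min(mtcars$hp))
small_car <- rownames(mtcars[index, ])
variable <- "horsepower"

# The sentence as typed in the R Markdown source: in-line code is a pair of
# single backticks enclosing r and an object name.
template <- paste(
  "In 1972, reliance on `r variable` as a key metric proved",
  "the `r small_car` to be the weakest car."
)

# Knit the one-line source; the placeholders are evaluated in this session.
if (requireNamespace("knitr", quietly = TRUE)) {
  sentence <- knitr::knit(text = template, quiet = TRUE)
  cat(sentence, "\n")
}
```

Swapping hp for disp and "horsepower" for "displacement" in the first three lines, then knitting again, changes the sentence without touching the template.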
There are a few tricks in R Markdown to make for a better data product.
YAML Headers, written in YAML (“YAML Ain’t Markup Language”), dictate the style and tone of your product.
Setting theme: to “lumen” or “cayman” makes significant changes.
Learn more about YAML Headers in “Creating Pretty Documents from R Markdown” (Qiu, 2018).
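For orientation, a header that selects a theme might look like the sketch below, written out from R so the raw text is visible. The title and file name are hypothetical; theme: lumen is one of the themes accepted by the html_document output format in the rmarkdown package.

```r
# A YAML header sits at the very top of the .Rmd file, between two --- lines.
yaml_header <- c(
  "---",
  'title: "My Report"',   # hypothetical title
  "output:",
  "  html_document:",
  "    theme: lumen",
  "---"
)

# A short body, reusing the headers from the earlier example.
body <- c("", "# Executive Summary", "", "Significant findings include...")

# Write the document to a temporary file and print its source.
rmd_file <- file.path(tempdir(), "yaml_example.Rmd")
writeLines(c(yaml_header, body), rmd_file)
cat(readLines(rmd_file), sep = "\n")
```

Indentation matters in YAML: html_document is nested under output, and theme is nested under html_document.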
Caching Code Chunks requires a simple argument in chunk headers, “cache = TRUE”.
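Warning: knitr re-runs a cached chunk only when that chunk's own code or options change; if something it depends on changes (e.g., the data or an earlier chunk), R may still knit the saved version, not the updated one.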
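One blunt way to guard against a stale cache is to delete the cache folder before knitting, which forces every chunk to run again. The helper below is a hypothetical sketch; it assumes the cache sits next to the document in a folder named after it with a _cache suffix (the layout rmarkdown typically uses), so check the actual folder name in your project before relying on it.

```r
# Hypothetical helper: remove the cache folder that belongs to an .Rmd file.
clear_knit_cache <- function(rmd_path) {
  # Assumed layout: report.Rmd -> report_cache/ in the same directory.
  doc_name <- tools::file_path_sans_ext(basename(rmd_path))
  cache_dir <- file.path(dirname(rmd_path), paste0(doc_name, "_cache"))

  if (dir.exists(cache_dir)) {
    unlink(cache_dir, recursive = TRUE)
    message("Removed cache: ", cache_dir)
  } else {
    message("No cache found at: ", cache_dir)
  }
  invisible(cache_dir)
}

# Usage with a throwaway folder standing in for a real cache.
demo_rmd <- file.path(tempdir(), "report.Rmd")
dir.create(file.path(tempdir(), "report_cache"), showWarnings = FALSE)
clear_knit_cache(demo_rmd)
```

Deleting the whole folder trades speed for certainty: the next knit is slow, but nothing saved from an earlier run can leak into the output.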
Inserting HTML is useful for fine-tuning your overall presentation. Commonly used are:
<br> creates a blank line or space
<center> and </center> will center font, images, and R output
<style> and </style> for font alignment and other style elements
For example, the following will justify text alignment for your entire document, unless otherwise specified:
<style>
body {
  text-align: justify;
}
</style>
RPubs is a free platform provided by RStudio to publish R Markdown documents.
Instructions: Setting Up RPubs. Visit RPubs.com and create an account with:
Instructions: Create a New R Markdown Document. Use your new knowledge to open a new document.
Save the document with the extension .rmd
Instructions: Create a Hidden Code Chunk. Be sure to:
# Install each package only if it is not already available
if(!require(ggplot2)){install.packages("ggplot2")}
## Loading required package: ggplot2
if(!require(GGally)){install.packages("GGally")}
## Loading required package: GGally
library(ggplot2)
library(GGally)
Instructions: Assign an Object in an Invisible Code Chunk. Be sure to:
Use include = FALSE and cache = TRUE
mtcars$cyl <- as.factor(mtcars$cyl)
edv_plot <- ggpairs(mtcars, aes(fill = cyl))
Instructions: Print Object with Invisible Chunk. Be sure to:
Use echo = FALSE
print(edv_plot)
Publish Your Document. Log in to RPubs. Click on “Knit”, then “Publish”, and choose “RPubs”.
Good job. You’re published!