Title, Author, Date, Output Format, Table of Contents

You can specify things like title, author and date in the header of your R Markdown file. This goes at the very beginning of the file, preceded and followed by lines containing three dashes.
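As a rough sketch, a header of this kind might look like the following. The title, author, and date values are placeholders, and the toc setting under html_document is one common way to request a table of contents.

```yaml
---
title: "Homework 1"
author: "Your Name"
date: "2024-01-15"
output:
  html_document:
    toc: true
---
```

Changing html_document to pdf_document or word_document selects a different output format.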

Introduction to Using R Markdown for Class Assignments

Overview

R Markdown is a low-overhead way of writing reports which include R code and the code’s automatically-generated output. It also lets you include nicely-typeset math, hyperlinks, images, and some basic formatting. The goal of this document is to explain, with examples, how to use its most essential features. It is not a comprehensive reference. (For that, see http://rmarkdown.rstudio.com.)

This guide assumes that you know at least some R.

This guide was adapted from http://www.stat.cmu.edu/~cshalizi/rmarkdown and https://scidesign.github.io/Rmarkdownforclassreports.html

What is Markdown?

Markdown is a low-overhead mark-up language invented by John Gruber. There are now many programs for translating documents written in Markdown into documents in HTML, PDF or even Word format (among others). R Markdown is an extension of Markdown to incorporate running code, in R, and including its output in the document. This document looks in turn at three aspects of R Markdown: how to include basic formatting; how to include R code and its output; and how to include mathematics.

Rendering and Editing

To write R Markdown you can use any text editor, that is, a program which lets you read and write plain text files. You will also need R, and the package rmarkdown (and all the packages it depends on). I highly recommend using R Studio, which comes with a built-in text editor and has lots of tools for working with R Markdown documents.

Rendering in R Studio

Assuming you have the document you’re working on open in the text editor, click the button that says “knit”.

Rendering in R without using R Studio

See the render command in the package rmarkdown.
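A minimal sketch of calling render from plain R follows. It writes a tiny, made-up R Markdown file to a temporary location so that the example is self-contained; the file contents are purely illustrative, and a working pandoc installation is assumed.

```r
library(rmarkdown)

# Write a tiny R Markdown file to a temporary location (illustrative content).
input <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Render test\"",
  "output: html_document",
  "---",
  "",
  "The iris data set has `r nrow(iris)` rows."
), input)

# render() knits the file and converts it to the requested format;
# it returns the path of the output file (invisibly).
output <- render(input, output_format = "html_document", quiet = TRUE)
```

For an existing file, the call would usually be just rmarkdown::render("homework1.Rmd"), where homework1.Rmd is a hypothetical file name.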

Basic Formatting in R Markdown

For the most part, text is just text. One advantage of R Markdown is that the vast majority of your document will be stuff you just type as you ordinarily would.

Paragraph Breaks and Forced Line Breaks

To insert a break between paragraphs, include a single completely blank line.

To force a line break, put two blank spaces at the end of a line.

Headers

The character # at the beginning of a line means that the rest of the line is interpreted as a section header. The number of #s at the beginning of the line indicates whether it is treated as a section, sub-section, sub-sub-section, etc. of the document. For instance, Basic Formatting in R Markdown above is preceded by a single #, but Headers at the start of this paragraph was preceded by ###. Do not interrupt these headers by line-breaks.
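For instance, the two headers mentioned above would be typed roughly as follows, each on a single line:

```markdown
# Basic Formatting in R Markdown

### Headers
```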

Italics, Boldface

Text to be italicized goes inside a single set of underscores or asterisks. Text to be boldfaced goes inside a double set of underscores or asterisks.
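A short sketch of how this looks when typed (the sentences themselves are just placeholders):

```markdown
This word is in *italics*, and so is _this one_.
This word is in **boldface**, and so is __this one__.
```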

Quotations

Set-off quoted paragraphs are indicated by an initial >:

In fact, all epistemological value of the theory of probability is based on this: that large-scale random phenomena in their collective action create strict, nonrandom regularity. [Gnedenko and Kolmogorov, Limit Distributions for Sums of Independent Random Variables, p. 1]
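In the source file, the quotation above would be typed roughly like this (a sketch; the attribution is left out for brevity):

```markdown
> In fact, all epistemological value of the theory of probability is based on this: that large-scale random phenomena in their collective action create strict, nonrandom regularity.
```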

Computer type

Text to be printed in a fixed-width font, without further interpretation, goes in paired left-single-quotes, a.k.a. “back-ticks” (`), without line breaks in your typing. (Thus `R` vs. R.) If you want to display multiple lines like this, start them with three back-ticks in a row on a line by themselves, and end them the same way.
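As an illustration, here is roughly what both forms look like in the source file. The sentence and the two displayed lines are made-up placeholders.

````markdown
Call `mean(x)` to average the values in `x`.

```
first line, shown exactly as typed
second line, also in a fixed-width font
```
````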

Bullet Lists

  • This is a list where items are marked with bullet points.
  • Each item in the list should start with a * (asterisk) character, or a single dash (-).
  • Each item should also be on a new line.
    • Indent lines and begin them with + for sub-bullets.
    • Sub-sub-bullets aren’t really a thing in R Markdown.
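Typed out, a list like the one above might look as follows. This is a sketch with placeholder items; the four-space indent for the sub-bullet is an assumption that tends to work reliably.

```markdown
* First item
* Second item
    + A sub-bullet under the second item
* Third item
```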

Images

An image begins with an exclamation mark, then the text to use if the image can’t be displayed (in square brackets), then either the file address of the image (in the same directory as your document) or a URL (in parentheses).

[Image: a local image]

[Image: a remote (online) image]
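A sketch of the syntax for the two cases. The file name r-project.png and the URL are hypothetical stand-ins for your own image locations.

```markdown
![A local image](r-project.png)

![A remote image](https://www.example.com/images/plot.png)
```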

There doesn’t seem to be a way of re-sizing images using these Markdown commands. Since you are using R Markdown, however, you can use the following hack:
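One workaround that is commonly used for this (an assumption here, since it may not be the exact hack meant above) is to include the image from a code chunk with knitr::include_graphics and to set the out.width chunk option. The file name r-project.png is hypothetical.

````markdown
```{r, echo=FALSE, out.width="50%"}
knitr::include_graphics("r-project.png")
```
````

Here out.width="50%" asks for the image to take up half the available width, and echo=FALSE keeps the R call itself out of the report.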

Steps for installing TinyTeX (for PDF documents)

First, install the tinytex package:

install.packages("tinytex", repos = "https://cloud.r-project.org/")

After installing the package, close RStudio, reopen it, and run this code:

tinytex::install_tinytex()

Including Code

The real point of R Markdown is that it lets you include your code, have the code run automatically when your document is rendered, and seamlessly include the results of that code in your document. The code comes in two varieties, code chunks and inline code.

Code Chunks and Their Results

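A code chunk is simply an off-set piece of code by itself. It begins with a line containing ```{r} and ends with a line containing just ```; the code itself goes in between.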

hist(iris$Sepal.Length)

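knitr::kable(head(iris))

| Sepal.Length| Sepal.Width| Petal.Length| Petal.Width|Species |
|------------:|-----------:|------------:|-----------:|:-------|
|          5.1|         3.5|          1.4|         0.2|setosa  |
|          4.9|         3.0|          1.4|         0.2|setosa  |
|          4.7|         3.2|          1.3|         0.2|setosa  |
|          4.6|         3.1|          1.5|         0.2|setosa  |
|          5.0|         3.6|          1.4|         0.2|setosa  |
|          5.4|         3.9|          1.7|         0.4|setosa  |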
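In the source file, the histogram chunk above would be written roughly as follows (a sketch, assuming no chunk options are set):

````markdown
```{r}
hist(iris$Sepal.Length)
```
````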

Inline Code

Code output can also be seamlessly incorporated into the text, using inline code. This is code not set off on a line by itself, but beginning with `r and ending with `. Using inline code is how this document knows that the iris data set contains 150 rows. Notice that inline code does not display the commands run, just their output.
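For example, a sentence reporting the number of rows could be typed like this (a sketch; nrow(iris) is evaluated when the document is rendered):

```markdown
The iris data set contains `r nrow(iris)` rows.
```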

Seen But Not Heard

Code chunks (but not inline code) can take a lot of options which modify how they are run, and how they appear in the document. These options go after the initial r and before the closing } that announces the start of a code chunk. One of the most common options turns off printing out the code, but leaves the results alone: ```{r, echo=FALSE}

Another runs the code, but includes neither the text of the code nor its output: ```{r, include=FALSE}. This might seem pointless, but it can be useful for code chunks which do set-up, like loading data files or computing initial model estimates.

Another option prints the code in the document, but does not run it: ```{r, eval=FALSE}. This is useful if you want to talk about the (nicely formatted) code.
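Put together, the three options might be used as in the sketch below. The R code inside each chunk is an arbitrary placeholder.

````markdown
```{r, echo=FALSE}
summary(iris$Sepal.Length)   # output shown, code hidden
```

```{r, include=FALSE}
library(knitr)               # runs silently; nothing appears
```

```{r, eval=FALSE}
plot(iris)                   # code shown, but never run
```
````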

Tables

The default print-out of matrices, tables, etc. from R Markdown is frankly ugly. The knitr package contains a very basic command, kable, which will format an array or data frame more nicely for display.

Compare:

coefficients(summary(lm(Sepal.Length ~ Sepal.Width, data = iris)))
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  6.5262226  0.4788963 13.627631 6.469702e-28
## Sepal.Width -0.2233611  0.1550809 -1.440287 1.518983e-01

with

library(knitr) # Only need this the first time!
kable(coefficients(summary(lm(Sepal.Length ~ Sepal.Width, data = iris))))

|            |   Estimate| Std. Error|   t value| Pr(>\|t\|)|
|:-----------|----------:|----------:|---------:|----------:|
|(Intercept) |  6.5262226|  0.4788963| 13.627631|  0.0000000|
|Sepal.Width | -0.2233611|  0.1550809| -1.440287|  0.1518983|
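The table can be tidied further with kable's digits and caption arguments. The following is a minimal sketch; the choice of three digits and the caption text are assumptions made for illustration.

```r
library(knitr)

# Fit the same regression as above and pull out the coefficient table
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
coef_table <- coefficients(summary(fit))

# digits rounds the numeric columns; caption adds a title to the table
tab <- kable(coef_table, digits = 3,
             caption = "Regression of sepal length on sepal width")
print(tab)
```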

Setting Defaults for All Chunks

You can tell R to set some defaults to apply to all chunks where you don’t specifically over-ride them. Here are the ones I generally use:

# Need the knitr package to set chunk options
library(knitr)

# Set knitr options for knitting code into the report:
# - Don't print out code (echo)
# - Save results so that code blocks aren't re-run unless code changes (cache),
# _or_ a relevant earlier code block changed (autodep), but don't re-run if the
# only thing that changed was the comments (cache.comments)
# - Don't clutter R output with messages or warnings (message, warning)
  # This _will_ leave error messages showing up in the knitted report
opts_chunk$set(echo=FALSE,
               cache=TRUE, autodep=TRUE, cache.comments=FALSE,
               message=FALSE, warning=FALSE)
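These defaults only take effect once the call has run, so it normally sits in a chunk near the top of the file. A minimal sketch, where the chunk label setup and the shortened option list are illustrative choices:

````markdown
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo=FALSE, message=FALSE, warning=FALSE)
```
````

Using include=FALSE keeps the set-up chunk itself out of the knitted report.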

Math in R Markdown

Since this is a data analytics class, you need to be able to write out mathematical expressions, often long series of them. R Markdown gives you the syntax to render complex mathematical formulas and derivations, and have them displayed very nicely. Like code, the math can either be inline or set off (displays).

Inline math is marked off with a pair of dollar signs ($): $\pi r^2$ produces \(\pi r^2\), and $e^{i\pi}$ produces \(e^{i\pi}\).

Mathematical displays are marked off with \[ and \], as in \[ e^{i \pi} = -1 \]
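In context, the two forms might be typed like this (a sketch; the surrounding sentences are placeholders):

```markdown
The area of a circle is $\pi r^2$.

Euler's identity reads
\[ e^{i \pi} = -1 \]
```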

Once your text has entered math mode, R Markdown turns over the job of converting your text into math to a different program, called LaTeX. This is the most common system for typesetting mathematical documents throughout the sciences, and has been for decades. It is extremely powerful, stable, available on basically every computer, and completely free. It is also, in its full power, pretty complicated. Fortunately, the most useful bits, for our purposes, are actually rather straightforward.

Elements of Math Mode

  • Most letters will be rendered in italics (compare: a vs. *a* vs. \(a\); only the last is in math mode). The spacing between letters also follows the conventions for math, so don’t treat it as just another way of getting italics. (Compare *speed*, in simple italics, with \(speed\), in math mode.)
  • Greek letters can be accessed with a backslash in front of their names, as \alpha for α. Making the first letter upper case gives the upper-case letter, as in \Gamma for Γ vs. \gamma for γ. (Upper-case alpha and beta are the same as Roman A and B, so no special commands for them.)
  • There are other “slashed” (or “escaped”) commands for other mathematical symbols:
    • \times for \(\times\)
    • \cdot for ⋅
    • \leq and \geq for ≤ and ≥
    • \subset and \subseteq for ⊂ and ⊆
    • \leftarrow, \rightarrow, \Leftarrow, \Rightarrow for ←, →, ⇐, ⇒
    • \approx, \sim, \equiv for ≈, ∼, ≡
  • See, e.g., http://web.ift.uib.no/Teori/KURS/WRK/TeX/symALL.html for a fuller listing of available symbols. (http://tug.ctan.org/info/symbols/comprehensive/symbols-a4.pdf lists all symbols available in LaTeX, including many non-mathematical special characters)
  • Subscripts go after an underscore character, _, and superscripts go after a caret, ^, as \beta_1 for \(\beta_1\) or a^2 for \(a^2\).
  • Curly braces are used to create groupings that should be kept together, e.g., a_{ij} for \(a_{ij}\) (vs. a_ij for \(a_ij\)).
  • If you need something set in ordinary (Roman) type within math mode, use \mathrm, as t_{\mathrm{in}}^2 for \(t_{\mathrm{in}}^2\).
  • If you’d like something set in an outline font (“blackboard bold”), use \mathbb, as \mathbb{R} for \(\mathbb{R}\).
  • For bold face, use \mathbf, as

(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{y}

\[(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{y} \]

  • Accents on characters work rather like changes of font: \vec{a} produces \(\vec{a}\) , \hat{a} produces \(\hat{a}\). Some accents, particularly hats, work better if they space out, as with \widehat{\mathrm{Var}} producing \(\widehat{\mathrm{Var}}\).
  • Function names are typically written in roman type, and spaced differently: thus \(\log{x}\), not \(logx\). LaTeX, and therefore R Markdown, knows about a lot of such functions, and their names all begin with \. For instance: \log, \sin, \cos, \exp, \min, etc. Follow these function names with the argument in curly braces; this helps LaTeX figure out what exactly the argument is, and keep it grouped together with the function name when it’s laying out the text. Thus \log{(x+1)} is better than \log (x+1).
  • Fractions can be created with \frac, like so:

\frac{a+b}{b} = 1 + \frac{a}{b} produces

\[ \frac{a+b}{b} = 1 + \frac{a}{b} \]

  • Sums can be written like so: \sum_{i=1}^{n}{x_i^2} will produce \[ \sum_{i=1}^{n}{x_i^2} \]

  • The lower and upper limits of summation after the \sum are both optional. Products and integrals work similarly, only with \prod and \int: \[ n! = \prod_{i=1}^{n}{i} \] \[ \log{b}-\log{a} = \int_{a}^{b}\frac{1}{x}dx \]

  • “Delimiters”, like parentheses or braces, can automatically re-size to match what they’re surrounding. To do this, you need to use \left and \right, as \left( \sum_{i=1}^{n}{i} \right)^2 = \left( \frac{n(n+1)}{2}\right)^2 = \frac{n^2(n+1)^2}{4}, which produces \[ \left( \sum_{i=1}^{n}{i} \right)^2 = \left( \frac{n(n+1)}{2}\right)^2 = \frac{n^2(n+1)^2}{4} \]

  • To use curly braces as delimiters, precede them with backslashes, as \{ and \} for { and }.

  • Multiple equations, with their equals signs lined up, can be created using eqnarray, as follows.

\[
\begin{eqnarray}
X & \sim & \mathrm{N}(0,1)\\
Y & \sim & \chi^2_{n-p}\\
R & \equiv & X/\sqrt{Y/(n-p)} \sim t_{n-p}
\end{eqnarray}
\]

produces

\[ \begin{eqnarray} X & \sim & \mathrm{N}(0,1)\\ Y & \sim & \chi^2_{n-p}\\ R & \equiv & X/\sqrt{Y/(n-p)} \sim t_{n-p} \end{eqnarray} \]

Notice that & surrounds what goes in the middle on each line, and each line (except the last) is terminated with \\. The left or right hand side of the equation can be blank, and space will be made:

\[
\begin{eqnarray}
P(|X-\mu| > k) & = & P(|X-\mu|^2 > k^2)\\
& \leq & \frac{\mathbb{E}\left[|X-\mu|^2\right]}{k^2}\\
& \leq & \frac{\mathrm{Var}[X]}{k^2}
\end{eqnarray}
\]

\[ \begin{eqnarray} P(|X-\mu| > k) & = & P(|X-\mu|^2 > k^2)\\ & \leq & \frac{\mathbb{E}\left[|X-\mu|^2\right]}{k^2}\\ & \leq & \frac{\mathrm{Var}[X]}{k^2} \end{eqnarray} \]

(In full LaTeX, \begin{eqnarray} automatically enters math mode, but R Markdown needs the hint.)
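As a worked illustration, several of the elements above (a wide hat, roman type, a fraction, a sum with limits, subscripts, superscripts, and re-sizing delimiters) can be combined in one display. The formula chosen here, the usual sample variance, is only an example, and \bar is an accent command that works like \hat.

```latex
\[
\widehat{\mathrm{Var}}[X] = \frac{1}{n-1} \sum_{i=1}^{n}{\left( x_i - \bar{x} \right)^2}
\]
```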

Some advanced math-mode stuff using \newcommand

\newcommand{\MyParameter}{\theta}
\MyParameter \\

\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\Expect{a} \\

\newcommand{\Cov}[2]{\mathrm{Cov}\left[ #1, #2\right]}
\Cov{x}{y} \\

produces

\[ \newcommand{\MyParameter}{\theta} \MyParameter \\ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \Expect{a} \\ \newcommand{\Cov}[2]{\mathrm{Cov}\left[ #1, #2\right]} \Cov{x}{y} \]
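Once defined, such commands can be reused inside longer expressions. The sketch below writes the covariance identity with the \Expect and \Cov macros from above; it assumes the macros have not already been defined earlier in the same file, since some output formats may complain about a repeated \newcommand.

```latex
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Cov}[2]{\mathrm{Cov}\left[ #1, #2\right]}
\Cov{X}{Y} = \Expect{XY} - \Expect{X}\Expect{Y}
\]
```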