R Markdown
This tutorial is based on the R Markdown chapter of R for Data Science (Grolemund & Wickham, 2017). This document is prepared for CP6521 Advanced GIS, a graduate-level city planning elective course at Georgia Tech in Spring 2019. For any questions, contact the instructor, Yongsung Lee, Ph.D., via yongsung.lee(at)gatech.edu.
So far, we have used R script files (.R) for our work on our local machines.
Although we can add #comments to R script files, html files are visually better (than R script files), and we can use several tricks to control what to show and what to hide in html files. Also, html files are better suited to non-technical users (e.g., planning commissioners or city officials, not GIS analysts).
R Markdown files (.Rmd) allow us to create html files out of our R work in (relatively) easy and intuitive ways, and they do not require an understanding of html, css, etc.
Examples:
Before we begin today’s tutorial:
blogdown package on Github
Install the knitr package, which is the core element of R Markdown.
We also need the rmarkdown package, but you don’t need to explicitly install or load it, as RStudio automatically does both when needed.
Step 1. Create a new .Rmd file
.Rmd file. To create a new .Rmd file, go to File > New File > R Markdown on the menu, accept the default settings (by clicking yes), and delete all the default text.
.Rmd file, save it.
Step 2. Check the individual elements of the sample code:
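An (optional) YAML header surrounded by ---s.
Chunks of R code surrounded by ```.
Text mixed with simple text formatting like # heading and _italics_.
In the .Rmd file, code and output are interleaved. You can run (part of) the code in a few different ways: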
Step 3. Publish an html file online
Wait, what’s going on under the hood?
When you knit the document, R Markdown sends the .Rmd file to knitr, http://yihui.name/knitr/, which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output.
The markdown file generated by knitr is then processed by pandoc, http://pandoc.org/, which is responsible for creating the finished file.
Prose in .Rmd files is written in Markdown, a lightweight set of conventions for formatting plain text files.
# this block is written inside a code chunk, to avoid actual effects
# these tricks work outside of the code chunk
Text formatting
------------------------------------------------------------
*italic* or _italic_
**bold** __bold__
`code`
superscript^2^ and subscript~2~
Headings
------------------------------------------------------------
# 1st Level Header
## 2nd Level Header
### 3rd Level Header
Lists
------------------------------------------------------------
* Bulleted list item 1
* Item 2
    * Item 2a
    * Item 2b
1. Numbered list item 1
1. Item 2. The numbers are incremented automatically in the output.
Links and images
------------------------------------------------------------
<http://example.com>
[linked phrase](http://example.com)

Tables
------------------------------------------------------------
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell
Exercise
Practice what you’ve learned by creating a brief CV. The title should be your name, and you should include headings for (at least) education or employment. Each of the sections should include a bulleted list of jobs/degrees. Highlight the year in bold.
Type the chunk delimiters ```{r} (to open a code chunk) and ``` (to close the chunk).
Chunk Name
A chunk can be given a name, e.g., by-name, in the opening line: ```{r by-name}.
Chunk Options
Knitr provides almost 60 options that you can use to customize your code chunks.
eval = FALSE prevents code from being evaluated. (And obviously if the code is not run, no results will be generated). This is useful for displaying example code, or for disabling a large block of code without commenting each line.
include = FALSE runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.
echo = FALSE prevents the code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.
message = FALSE or warning = FALSE prevents messages or warnings from appearing in the finished file.
results = 'hide' hides printed output; fig.show = 'hide' hides plots.
error = TRUE causes the render to continue even if code returns an error. This is rarely something you’ll want to include in the final version of your report, but can be very useful if you need to debug exactly what is going on inside your .Rmd. It’s also useful if you’re teaching R and want to deliberately include an error. The default, error = FALSE, causes knitting to fail if there is a single error in the document.
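As a rough illustration of how two of these options change the output, the sketch below knits a tiny in-memory document with knitr::knit(text = ...) instead of a real .Rmd file. The one-line chunk body (1 + 1) and the helper name knit_chunk are made up for this example, and it assumes the knitr package is installed.

```r
library(knitr)

# Build the three-backtick chunk delimiter programmatically,
# so that it is not interpreted as a delimiter in this document.
fence <- strrep("`", 3)

# Hypothetical helper: knit a one-line chunk with the given options
# and return the resulting markdown text.
knit_chunk <- function(options) {
  rmd <- c(paste0(fence, "{r, ", options, "}"), "1 + 1", fence)
  knit(text = rmd, quiet = TRUE)
}

md_echo_off <- knit_chunk("echo = FALSE")  # result is kept, code is hidden
md_eval_off <- knit_chunk("eval = FALSE")  # code is kept, nothing is run

cat(md_echo_off)
cat(md_eval_off)
```

Comparing the two printed markdown snippets shows which part (code or result) each option removes.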
Table
For additional formatting of tabular output, use the knitr::kable function.
knitr::kable(
  mtcars[1:5, ],
  caption = "A knitr kable."
)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
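kable() accepts further formatting arguments. The sketch below uses digits, col.names and caption on a small slice of mtcars; the choice of rows and columns, the header labels and the caption text are made up for illustration, and it assumes knitr is installed.

```r
library(knitr)

# First three cars and two columns only (an arbitrary selection)
cars_small <- mtcars[1:3, c("mpg", "wt")]

tbl <- kable(
  cars_small,
  digits = 1,                                             # round numeric columns
  col.names = c("Miles per gallon", "Weight (1000 lbs)"), # reader-friendly headers
  caption = "A smaller knitr kable."
)
tbl
```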
(Advanced) Caching
To avoid rerunning slow computations on every knit, set cache = TRUE. When set, this will save the output of the chunk to a specially named file on disk. On subsequent runs, knitr will check to see if the code has changed, and if it hasn’t, it will reuse the cached results.
#```{r raw_data} <- the header of the first chunk
rawdata <- readr::read_csv("a_very_large_file.csv")
#```
#```{r processed_data, cache = TRUE} <- the header of the second chunk
processed_data <- rawdata %>%
  filter(!is.na(import_var)) %>%
  mutate(new_variable = complicated_transformation(x, y, z))
#```
Caching the processed_data chunk means that it will get rerun if the dplyr pipeline is changed, but it won’t get rerun if the read_csv() call changes. Then, how can we avoid this problem?
dependson should contain a character vector of every chunk that the cached chunk depends on.
Knitr will update the results for the cached chunk whenever it detects that one of its dependencies has changed.
#```{r processed_data, cache = TRUE, dependson = "raw_data"} <- the new header of the second chunk
processed_data <- rawdata %>%
  filter(!is.na(import_var)) %>%
  mutate(new_variable = complicated_transformation(x, y, z))
#```
What if the a_very_large_file.csv file changes, but not the R scripts?
The cache.extra option is an R expression that will invalidate the cache whenever it changes.
Use cache.extra with file.info(): it returns a bunch of information about the file, including when it was last modified.
#```{r raw_data, cache.extra = file.info("a_very_large_file.csv")} <- the new header of the first chunk
rawdata <- readr::read_csv("a_very_large_file.csv")
#```
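To see why file.info() works as a cache.extra expression, the following base R sketch compares its value before and after a file is modified. The temporary file is a hypothetical stand-in for a_very_large_file.csv, and its contents are invented.

```r
# Hypothetical small file standing in for a_very_large_file.csv
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), path)

info_before <- file.info(path)

# Simulate an update of the data file by appending one row
cat("3,4\n", file = path, append = TRUE)

info_after <- file.info(path)

# The file size (and normally the modification time) differs, so the
# value of file.info(path) no longer matches the one stored with the cache.
info_before$size
info_after$size
identical(info_before, info_after)

unlink(path)  # remove the temporary file
```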
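It is a good idea to regularly clear out all your caches with knitr::clean_cache().
Naming each chunk after the primary object it creates makes it easier to understand the dependson specification.
Global Options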
You can set global options up front; they apply to all code chunks unless an individual chunk specifies otherwise.
When you want code and output kept close to each other:
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE
)
When you want to hide the code by default:
knitr::opts_chunk$set(
  echo = FALSE
)
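To check which global options are currently in effect, knitr also provides knitr::opts_chunk$get(). The sketch below sets two options, reads them back, and then restores the previous values; the variable names old and current are only illustrative, and it assumes knitr is installed.

```r
library(knitr)

# Remember the current values so they can be restored afterwards
old <- opts_chunk$get(c("comment", "collapse"))

# Set the global options
opts_chunk$set(comment = "#>", collapse = TRUE)

# Read back what now applies to subsequent chunks
current <- opts_chunk$get(c("comment", "collapse"))
str(current)

# Put the previous values back
do.call(opts_chunk$set, old)
```

In a real report the opts_chunk$set() call usually sits in a setup chunk at the top of the .Rmd file, and no restoring is needed.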
Inline Code
There is one other way to embed R code into an R Markdown document: directly into the text, with: `r `.
The first quote below, as written in the .Rmd file, translates to the sentence in the second quote. FYI, in the first quote, I used single quotation marks, not backticks, to avoid actual effects.
We have data about ‘r nrow(diamonds)’ diamonds. Only ‘r nrow(diamonds) - nrow(smaller)’ are larger than 2.5 carats. The distribution of the remainder is shown below:
We have data about 53940 diamonds. Only 126 are larger than 2.5 carats. The distribution of the remainder is shown below:
When inserting numbers into text, format() is your friend.
comma <- function(x) format(x, digits = 2, big.mark = ",")
comma(3452345)
#> [1] "3,452,345"
comma(.12358124331)
#> [1] "0.12"
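As a small sketch of how such a helper fits into inline text, the code below builds the diamonds sentence as a plain string. The two counts are typed in by hand from the rendered sentence above, so the ggplot2 data is not needed; the variable names n_total and n_large are made up.

```r
comma <- function(x) format(x, digits = 2, big.mark = ",")

# Counts copied from the rendered sentence above (not recomputed here)
n_total <- 53940
n_large <- 126

sentence <- paste0(
  "We have data about ", comma(n_total), " diamonds. ",
  "Only ", comma(n_large), " are larger than 2.5 carats."
)
sentence
#> [1] "We have data about 53,940 diamonds. Only 126 are larger than 2.5 carats."
```

In an .Rmd file the same effect comes from calling comma() inside the inline code, for example comma(nrow(diamonds)).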
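Troubleshooting R Markdown documents can be challenging because you are no longer in an interactive R environment.
Restart R, then “Run all chunks” (either from Code > Run region, or with the keyboard shortcut Ctrl + Alt + R). If you’re lucky, that will recreate the problem, and you can figure out what’s going on interactively.
Check that the working directory is what you expect by including getwd() in a chunk.
Check that settings are the same in your R session and your R Markdown session. The easiest way to do that is to set error = TRUE on the chunk causing the problem, then use print() and str() to check that settings are as you expect.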
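A minimal sketch of such a diagnostic chunk, using only base R, might look as follows. Which settings are worth checking depends on the problem; the list below is an assumed example, not a complete checklist.

```r
# Where is the code actually running?
wd <- getwd()
print(wd)

# A few settings that can differ between the interactive session
# and the knitting session (an illustrative selection)
settings <- list(
  working_dir = wd,
  r_version   = R.version.string,
  loaded_pkgs = sort(loadedNamespaces()),
  collation   = Sys.getlocale("LC_COLLATE")
)
str(settings)
```

Running the same chunk interactively and in the knitted document, then comparing the two outputs, helps narrow down where the environments differ.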