5 - R Markdown basics

27.2.1 Exercises

1. Create a new notebook using File > New File > R Notebook. Read the instructions. Practice running the chunks. Verify that you can modify the code, re-run it, and see modified output.

2. Create a new R Markdown document with File > New File > R Markdown… Knit it by clicking the appropriate button. Knit it by using the appropriate keyboard shortcut. Verify that you can modify the input and see the output update.

3. Compare and contrast the R notebook and R markdown files you created above. How are the outputs similar? How are they different? How are the inputs similar? How are they different? What happens if you copy the YAML header from one to the other?

The R Notebook and the R Markdown document produce similar output: both weave prose, code, and the code's results into a single document. The inputs are also similar, since both are plain-text files made of a YAML header, Markdown text, and R code chunks, and in the RStudio editor both show each chunk's output directly below the chunk. The main differences are in the header and in how the output is produced. The notebook's header specifies output: html_notebook, and its Preview button displays the results of chunks that have already been run, without re-running any code, and saves them to a .nb.html file. The R Markdown document's header specifies output: html_document, and knitting re-runs all of the code from scratch. R Markdown documents are also more versatile, since they can be knit to several formats (such as HTML, PDF, and Word), whereas a notebook is previewed as HTML. Because the header determines the output format, copying the YAML header from one file to the other makes that file behave as the other type.

4. Create one new R Markdown document for each of the three built-in formats: HTML, PDF and Word. Knit each of the three documents. How does the output differ? How does the input differ? (You may need to install LaTeX in order to build the PDF output — RStudio will prompt you if this is necessary.)

The most obvious difference between the outputs is the file type. The HTML output is a web page that opens in a browser or the RStudio viewer, the PDF opens in a PDF viewer, and the Word output is a .docx file that opens in Microsoft Word. The plots and the overall styling also look slightly different across the three formats. For the input, the only difference is in the YAML header: the “output” field is “html_document” for HTML, “word_document” for Word, and “pdf_document” for PDF.
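The same source file can also be rendered to more than one format from the console instead of editing the header each time. The following is a minimal sketch, assuming the rmarkdown package and pandoc are installed; the temporary file and its contents are hypothetical, and PDF is left out because it additionally needs a LaTeX installation.

```r
library(rmarkdown)

# Hypothetical minimal document, written to a temporary file for illustration.
rmd <- tempfile(fileext = ".Rmd")
writeLines(
  c("---", "title: \"Format demo\"", "---", "", "Some *emphasised* text."),
  rmd
)

# Output formats to try; "pdf_document" would also work where LaTeX is installed.
formats <- c("html_document", "word_document")

# render() returns the path of the file it created.
outputs <- character(0)
if (pandoc_available()) {
  for (fmt in formats) {
    outputs[fmt] <- render(rmd, output_format = fmt, quiet = TRUE)
  }
}
outputs
```

Passing output_format to render() overrides whatever the “output” field in the header says, which makes it easy to compare formats side by side.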

27.3.1 Exercises

1. Practice what you’ve learned by creating a brief CV. The title should be your name, and you should include headings for (at least) education or employment. Each of the sections should include a bulleted list of jobs/degrees. Highlight the year in bold.

Reilly Mach

Education

  • Scripps Research - 2022 entering class - PhD Student
  • Concordia College - 2022 - Bachelor of Arts
  • Davies High School - 2018 - High School Diploma

Employment

  • 2019-2020 - Research Associate, Concordia College
  • June-July 2021 - Summer Undergraduate Research Fellow, Scripps Research

2. Using the R Markdown quick reference, figure out how to: Add a footnote. 1

Add a horizontal rule.

A horizontal rule is created with three consecutive dashes (---) on a line of their own, with a blank line before it. When the document is knit, the dashes are rendered as a horizontal line across the page.

Add a block quote.

A block quote is created by starting a line with the “greater than” sign (>) followed by a space; the text is then rendered as an indented block quote.

3. Copy and paste the contents of diamond-sizes.Rmd from https://github.com/hadley/r4ds/tree/master/rmarkdown into a local R markdown document. Check that you can run it, then add text after the frequency polygon that describes its most striking features.

We have data about 53940 diamonds. Only 126 are larger than 2.5 carats. The distribution of the remainder is shown below:

This frequency polygon shows the number of diamonds by carat size. Most diamonds sit at the low end of the carat range, but there are still many spikes as carat increases. The spikes fall at whole- and half-carat values, which suggests that diamonds tend to be given a whole- or half-number carat value, though smaller spikes appear at values in between.

27.4.7 Exercises

1. Add a section that explores how diamond sizes vary by cut, colour, and clarity. Assume you’re writing a report for someone who doesn’t know R, and instead of setting echo = FALSE on each chunk, set a global option.

knitr::opts_chunk$set(echo = FALSE)

This graph shows that diamonds with lower-quality cuts tend to have higher carat values, meaning they are larger.

In the diamonds dataset, colors are ranked from D (best) to J (worst). This graph shows that diamonds with lower-quality colors also tend to be larger, with higher carat values.

As in the previous two graphs, diamonds of worse clarity tend to have higher carat values.
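The code behind the three graphs is hidden by the global echo = FALSE option. One possible way to draw comparisons like these is sketched below, assuming boxplots of carat against each attribute; the object names are illustrative and the plot type is an assumption, not necessarily the one used above.

```r
library(ggplot2)  # also provides the diamonds dataset

# One boxplot per attribute, each showing the distribution of carat by level.
p_cut     <- ggplot(diamonds, aes(x = cut, y = carat)) + geom_boxplot()
p_color   <- ggplot(diamonds, aes(x = color, y = carat)) + geom_boxplot()
p_clarity <- ggplot(diamonds, aes(x = clarity, y = carat)) + geom_boxplot()

# A numeric companion to the first plot: median carat for each cut.
median_by_cut <- tapply(diamonds$carat, diamonds$cut, median)

print(p_cut)
print(p_color)
print(p_clarity)
median_by_cut
```

Boxplots make the medians easy to compare across levels; overlaid frequency polygons, as in the original diamond-sizes.Rmd, would be a reasonable alternative.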

2. Download diamond-sizes.Rmd from https://github.com/hadley/r4ds/tree/master/rmarkdown. Add a section that describes the largest 20 diamonds, including a table that displays their most important attributes.

Here we see a table that displays the largest 20 diamonds, along with their cut, color, and clarity attributes.

carat  cut        color  clarity
5.01   Fair       J      I1
4.50   Fair       J      I1
4.13   Fair       H      I1
4.01   Premium    I      I1
4.01   Premium    J      I1
4.00   Very Good  I      I1
3.67   Premium    I      I1
3.65   Fair       H      I1
3.51   Premium    J      VS2
3.50   Ideal      H      I1
3.40   Fair       D      I1
3.24   Premium    H      I1
3.22   Ideal      I      I1
3.11   Fair       J      I1
3.05   Premium    E      I1
3.04   Very Good  I      SI2
3.04   Premium    I      SI2
3.02   Fair       I      I1
3.01   Premium    I      I1
3.01   Premium    F      I1
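A table like the one above can be produced along the following lines. This is a sketch that assumes dplyr and knitr are available; the object name largest is illustrative, and the choice of columns simply mirrors the table shown.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Sort by carat, largest first, and keep the top 20 rows and four columns.
largest <- diamonds %>%
  arrange(desc(carat)) %>%
  head(20) %>%
  select(carat, cut, color, clarity)

# kable() turns the data frame into a formatted table when the document is knit.
knitr::kable(largest)
```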

3. Modify diamond-sizes.Rmd to use comma() to produce nicely formatted output. Also include the percentage of diamonds that are larger than 2.5 carats.

The dataset includes information on 53,940 diamonds. Only 126 (0.2%) are larger than 2.5 carats.
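The numbers in that sentence can be computed and formatted as sketched below. The comma() helper follows the definition given in R for Data Science; the variable names n_total, n_large and pct_large are illustrative.

```r
library(ggplot2)  # provides the diamonds dataset

# Helper from R for Data Science: format a number with a thousands separator.
comma <- function(x) format(x, digits = 2, big.mark = ",")

n_total   <- nrow(diamonds)
n_large   <- sum(diamonds$carat > 2.5)
pct_large <- round(100 * n_large / n_total, 1)

comma(n_total)  # total number of diamonds, formatted
n_large         # number of diamonds larger than 2.5 carats
pct_large       # the same count as a percentage of the total
```

In the .Rmd source these values would be inserted as inline R expressions in the sentence, so the text updates automatically if the data changes.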

4. Set up a network of chunks where d depends on c and b, and both b and c depend on a. Have each chunk print lubridate::now(), set cache = TRUE, then verify your understanding of caching.

The chunk a has no dependencies.

print(lubridate::now())
## [1] "2022-09-24 15:35:49 PDT"
x <- 1

The chunk b depends on a.

print(lubridate::now())
## [1] "2022-09-24 15:35:49 PDT"
y <- x + 1

The chunk c depends on a.

print(lubridate::now())
## [1] "2022-09-24 15:35:49 PDT"
z <- x * 2

The chunk d depends on c and b:

print(lubridate::now())
## [1] "2022-09-24 15:35:49 PDT"
w <- y + z
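The chunk headers are not visible in the rendered output above. A minimal sketch of how the four chunks might be declared follows, assuming they are labelled a, b, c and d and linked with knitr's dependson option. Each header is shown as a comment above the chunk body so that the block remains plain R.

```r
# Chunk header: {r a, cache = TRUE}
print(lubridate::now())
x <- 1

# Chunk header: {r b, cache = TRUE, dependson = "a"}
print(lubridate::now())
y <- x + 1

# Chunk header: {r c, cache = TRUE, dependson = "a"}
print(lubridate::now())
z <- x * 2

# Chunk header: {r d, cache = TRUE, dependson = c("b", "c")}
print(lubridate::now())
w <- y + z
```

With this setup, knitting twice without changes should print the same timestamps, because the printed output is read from the cache. Editing chunk a should invalidate a and, through dependson, b, c and d, so all four timestamps update; editing only b should leave the timestamps of a and c unchanged while b and d are re-run.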

  1. This is my example of a footnote, created with a caret (^) followed by bracketed text, as in ^[footnote text].