This Data Bite session shows you how to make reproducible reports in RStudio.
Before working through the workshop materials, please do the following in preparation:
1. Open up RStudio.
2. Install and load the relevant R packages by running the following in the R console.
# Run once (remove the leading #); grid ships with base R and does not need installing
#install.packages(c("googleVis","pxR","scales","ggplot2","gridExtra","png","plotly","rmarkdown"))
library(googleVis)
library(png)
library(pxR)
library(scales)
library(ggplot2)
library(gridExtra)
library(grid)
#library(plotly)
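If you prefer not to reinstall packages every time, one option is to install only the ones that are missing. The helper `missing_packages()` below is a hypothetical name used for this sketch; it relies only on base R's `system.file()`, which returns an empty string for a package that is not installed.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1), USE.NAMES = FALSE)
  pkgs[!installed]
}

workshop_pkgs <- c("googleVis", "pxR", "scales", "ggplot2", "gridExtra",
                   "png", "plotly", "rmarkdown")
to_install <- missing_packages(workshop_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only what is missing
}
```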
The following resources were of use in making this tutorial:
1. The RMarkdown website hosted by RStudio.
2. Getting Started with RMarkdown - Garrett Grolemund: https://www.youtube.com/watch?v=MIlzQpXlJNk
3. Creating Dynamic Documents with RMarkdown and Knitr - http://rpubs.com/marschmi/RMarkdown
4. Cheatsheets released by RStudio.
Literate programming is the basic idea behind dynamic documents and was proposed by Donald Knuth in 1984. Originally, it was a way to mix the source code and documentation of software together. Today, we will create dynamic documents in which program or analysis code is run to produce output (e.g. tables, plots, models) that is then explained through narrative writing. The three steps of literate programming are:
1. Parse the source document and separate code from narratives.
2. Execute source code and return results.
3. Mix results from the source code with the original narratives.
The software carries out these three steps for us, which leaves us two things to write:
1. Analysis code
2. A narrative to explain the results from the analysis code.
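The three steps can be illustrated with a toy "weaver" written in base R. This is a deliberately simplified sketch: the function name `weave_inline()` is hypothetical, and knitr's real implementation is far more complete. It finds inline `` `r ...` `` expressions (parse), evaluates them (execute), and pastes the results back into the narrative (mix).

```r
# Toy illustration of literate programming on inline code of the form `r expr`.
# Not knitr's implementation; it only mirrors the three steps listed above.
weave_inline <- function(text, envir = parent.frame()) {
  force(envir)
  pattern <- "`r [^`]+`"

  # Step 1: parse the source and separate code from narrative
  matches <- gregexpr(pattern, text)
  code_spans <- regmatches(text, matches)

  # Step 2: execute each piece of code and turn its result into text
  results <- lapply(code_spans, function(spans) {
    vapply(spans, function(span) {
      expr <- substr(span, 4, nchar(span) - 1)  # drop the leading "`r " and trailing "`"
      paste(format(eval(parse(text = expr), envir = envir)), collapse = ", ")
    }, character(1), USE.NAMES = FALSE)
  })

  # Step 3: mix the results back into the original narrative
  regmatches(text, matches) <- results
  text
}

weave_inline("Two to the eleventh is `r 2^11`.")
```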
Reproducible research is one possible product of dynamic documents; however, it is not guaranteed! Good practices for reproducible research include:
1. Encapsulate the full project into one directory that is supported with version control.
2. Release your code and data.
3. Document everything and use code as documentation!
4. Make figures, tables, and statistics the results of scripts and inline code.
5. Write code that uses relative paths.
6. Always set your random seed (e.g. with `set.seed()`).
7. Always include session information in the code file. For example, you can use `devtools::session_info()` (or base R's `sessionInfo()`).

To read more about reproducibility and data management, check out Vince Buffalo's book [@Buffalo2015].
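Practices 5 to 7 translate into a few lines at the top and bottom of a report. This is a minimal sketch: the seed value and the `data/EP002df.rds` path are arbitrary choices for illustration.

```r
# Practice 6: set the seed so "random" results are identical every time the report is knitted
set.seed(2019)                 # any fixed integer works; 2019 is arbitrary
first_draw <- rnorm(5)
set.seed(2019)
second_draw <- rnorm(5)
identical(first_draw, second_draw)   # TRUE: same seed, same numbers

# Practice 5: build paths relative to the project directory rather than e.g. "C:/Users/..."
data_path <- file.path("data", "EP002df.rds")   # hypothetical file inside the project

# Practice 7: record the R version and loaded packages (base R alternative to devtools::session_info())
sessionInfo()
```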
To fully understand RMarkdown, we first need to cover Markdown, which is a system for writing simple, readable text that is easily converted to HTML. Markdown is essentially two things:
1. A plain text formatting syntax
2. A software tool written in Perl.
- Converts the plain text formatting into HTML.
Main goal of Markdown:
Make the syntax of the raw (pre-HTML) document as readable as possible. Would you rather read this code in HTML?
<body>
<section>
<h1>Rock Climbing Packing List</h1>
<ul>
<li>Climbing Shoes</li>
<li>Harness</li>
<li>Backpack</li>
<li>Rope</li>
<li>Belayer</li>
</ul>
</section>
</body>
Or this code in Markdown?
# Rock Climbing Packing List
* Climbing Shoes
* Harness
* Backpack
* Rope
* Belayer
If you are human, the Markdown code is definitely easier to read!
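To make "converts the plain text formatting into HTML" concrete, here is a toy converter in base R that understands only `# ` headings and `* ` bullet items, just enough for the packing list above. `md_to_html()` is a hypothetical name; real converters such as Pandoc (which RMarkdown uses) handle the full syntax.

```r
# Toy Markdown-to-HTML converter: handles "# " headings, "* " bullets and plain lines only
md_to_html <- function(lines) {
  out <- character(0)
  in_list <- FALSE
  for (line in lines) {
    is_item <- startsWith(line, "* ")
    # Open or close the <ul> element when entering or leaving a run of bullet items
    if (is_item && !in_list) { out <- c(out, "<ul>"); in_list <- TRUE }
    if (!is_item && in_list) { out <- c(out, "</ul>"); in_list <- FALSE }
    if (startsWith(line, "# ")) {
      out <- c(out, paste0("<h1>", substring(line, 3), "</h1>"))
    } else if (is_item) {
      out <- c(out, paste0("<li>", substring(line, 3), "</li>"))
    } else if (nzchar(line)) {
      out <- c(out, paste0("<p>", line, "</p>"))
    }
  }
  if (in_list) out <- c(out, "</ul>")
  out
}

packing_list <- c("# Rock Climbing Packing List", "* Climbing Shoes", "* Harness",
                  "* Backpack", "* Rope", "* Belayer")
cat(md_to_html(packing_list), sep = "\n")
```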
RMarkdown is a variant of Markdown that makes it easy to create dynamic documents, presentations and reports within RStudio. It has embedded R code chunks to be used with knitr to make it easy to create reproducible (web-based) reports, in the sense that they can be automatically regenerated when the underlying code is modified.
- RMarkdown lets you combine Markdown with images, links, tables, LaTeX, and actual code.
- RStudio makes creating documents from RMarkdown easy.
- RStudio (like R) is free and runs on Windows, macOS and Linux.

RMarkdown renders many different types of files, including:
- HTML
- PDF
- Markdown
- Microsoft Word
- Presentations:
  - Fancy HTML5 presentations:
    - ioslides
    - Slidy
    - Slidify
  - PDF presentations:
    - Beamer
- Handouts:
  - Tufte handouts
- HTML R package vignettes
- Even entire websites!
While there are a lot of different types of rendered documents in RMarkdown, today we will focus primarily on HTML output files, as I have found these files to be the most useful and flexible for my research.
A convenient tool for reproducible and dynamic reports!
- While it was created for R, it now accepts many programming languages. For simplicity, we will only work with R today.
- Execute code in two ways:
  1. Inline code: brief code that runs within the written part of the document.
  2. Code chunks: parts of the document that include several lines of program or analysis code. A chunk may render a plot or table, calculate summary statistics, load packages, etc.
- It is easy to:
  - Embed images.
  - Learn Markdown syntax.
  - Include LaTeX equations.
  - Include interactive tables.
  - Use version control with Git.
  - Add external links.
  - Make beautifully formatted documents, without needing to worry about page breaks or figure placement.
- It is even easier to share and collaborate on analyses, projects and publications!
- RMarkdown even understands some HTML code!
- Consolidate your code and write-up into a single file:
  - Slideshows, PDFs, HTML documents, Word files
## Simple Workflow
Briefly, to make a report:
1. Open a `.Rmd` file.
   - Create a YAML header (more on this in a minute!)
2. Write the content with RMarkdown syntax.
3. Embed the R code in code chunks or inline code.
4. Render the document output.
Overview of the steps RMarkdown takes to get to the rendered document:
1. Create a `.Rmd` report that includes R code chunks and Markdown narratives (as indicated in the steps above).
2. Give the `.Rmd` file to knitr, which executes the R code chunks and creates a new `.md` file.
   - Knitr is an R package that integrates R code into rendered RMarkdown documents such as HTML, LaTeX, PDF and Word, among other document types.
3. Give the `.md` file to pandoc, which creates the final rendered document (e.g. HTML, Microsoft Word, PDF).
   - Pandoc is a universal document converter that converts one document type (in this case, `.md`) to another (in this case, HTML).

While this may seem complicated, we can simply hit the “Knit” button at the top of the page and RStudio runs both steps for us.
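The two steps behind the “Knit” button can also be run by hand. This is a minimal sketch, assuming the knitr and rmarkdown packages are installed; the file name `hello.Rmd` is hypothetical, and the pandoc step only runs if pandoc is available (it ships with RStudio).

```r
# Write a tiny throwaway .Rmd file (hypothetical name) to a temporary folder
rmd_file <- file.path(tempdir(), "hello.Rmd")
writeLines(c(
  "---",
  "title: \"Hello\"",
  "---",
  "",
  "Two to the eleventh is `r 2^11`."
), rmd_file)

# Step 2: knitr executes the R code and writes a plain Markdown (.md) file
md_file <- knitr::knit(rmd_file, output = file.path(tempdir(), "hello.md"), quiet = TRUE)

# Step 3: pandoc converts the .md file into the final document (HTML here)
if (rmarkdown::pandoc_available()) {
  html_file <- file.path(tempdir(), "hello.html")
  rmarkdown::pandoc_convert(md_file, to = "html", output = html_file)
}
```

In practice `rmarkdown::render()`, which is what the “Knit” button calls, performs both steps (plus some extra processing) in one call.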
## `.Rmd` File
Let’s start working with RMarkdown!
1. In the menu bar, click File -> New File -> R Markdown...
   - Or simply click on the green plus sign in the top left corner of RStudio.
YAML stands for “YAML Ain’t Markup Language” and is basically a nested list structure that includes the metadata of the document. It is enclosed between two lines of three dashes (`---`) and, as we saw above, is automatically written by RStudio. A simple example (note the use of `params`, which defines values such as `year` that the R code can read as `params$year`):
---
title: "My Reproducible Census Report"
date: "October 18th, 2019"
params:
  year: 2016
output:
  word_document
---
The above example will create a Word document. The following output formats are also available:
- html_document
- pdf_document
- beamer_presentation (PDF slideshow)
- ioslides_presentation (HTML slideshow)
- and more…
Today, we will be focused on Word files. However, in your own time please feel free to play around with creating HTML and PDF documents. Presentation slides take on a slightly different syntax (e.g. to specify when one slide ends and the next one starts), so there is a bit of Markdown syntax specific to presentations that is beyond the scope of this tutorial.
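Indentation in YAML is meaningful: `year` must be indented under `params` (and `word_document` under `output`) to be read as nested values. One way to check how a header is parsed is `rmarkdown::yaml_front_matter()`. The sketch below writes the example header to a temporary file first; the file itself is only for illustration.

```r
# Write the example YAML header to a temporary .Rmd file
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"My Reproducible Census Report\"",
  "date: \"October 18th, 2019\"",
  "params:",
  "  year: 2016",
  "output:",
  "  word_document",
  "---",
  "",
  "Report text goes here."
), rmd_file)

# Parse only the YAML front matter into a nested R list
header <- rmarkdown::yaml_front_matter(rmd_file)
header$params$year   # the value the report code sees as params$year
header$output        # the output format
```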
Check out the RMarkdown Reference Guide, available from the Help menu in RStudio.
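Inline code is created by using a back tick (`) and the letter r, followed by the R expression and another back tick.
- For example, typing 2^11^ is `r 2^11` renders as 2¹¹ is 2048.

Imagine that you’re reporting a p-value and you do not want to go back and add it every time the statistical test is re-run. Inline code recalculates the value each time the document is knitted.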
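A sketch of the p-value case, using made-up simulated data rather than Census figures. The test result is stored in a code chunk, and the narrative then refers to it with inline code; evaluating the same expression in the console shows the text knitr would insert.

```r
# Made-up example data (not Census data): two simulated groups
set.seed(1)
group_a <- rnorm(30, mean = 10)
group_b <- rnorm(30, mean = 11)

# In a code chunk: run the test (Welch's t-test by default) and keep the result object
test_result <- t.test(group_a, group_b)

# In the narrative you would then write, for example:
#   Welch's t-test gave p = `r format.pval(test_result$p.value, digits = 3)`.
# Evaluating the same expression here shows the text knitr would insert:
p_text <- format.pval(test_result$p.value, digits = 3)
paste0("Welch's t-test gave p = ", p_text, ".")
```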
R code chunks can be used to render R output into documents or to display code for illustration.
The Anatomy of a code chunk:
To insert an R code chunk, you can type it manually by typing ```{r} followed by ``` on the next line. You can also press the “Insert a new code chunk” button or use the shortcut key (Ctrl + Alt + I on Windows and Linux, Cmd + Option + I on macOS). This will produce the following code chunk:
```{r}
n <- 10
seq(n)
```
Name the code chunk something meaningful as to what it is doing. Below I have named the code chunk `sequence-1-to-10`, since it returns the integers 1 to 10:
```{r sequence-1-to-10}
n <- 10
seq(n)
```
The code chunk input and output are then displayed as follows:
n <- 10
seq(n)
##  [1]  1  2  3  4  5  6  7  8  9 10
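The `##` in front of the output comes from knitr's default `comment` chunk option, which marks printed results so they are not mistaken for code. A rough base-R imitation of that step (not knitr's actual code):

```r
n <- 10
# Capture what R would print in the console...
printed <- capture.output(print(seq(n)))
# ...and prefix each line with "##", as knitr does by default
knitted_output <- paste("##", printed)
cat(knitted_output, sep = "\n")
```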
Knitr is an R package that works with RMarkdown documents. It runs code as if it were being run in the R console, and it mainly works with code chunks.
A code chunk looks like:
```{r}
x <- rnorm(100)
y <- 2*x + rnorm(100)
```
Best practices regarding code chunks:
- Chunk labels are unique IDs within a document and are good for navigating through `.Rmd` documents.
- When naming a code chunk, use `-` or `_` between words instead of spaces. This will help you and other users of your document navigate through it.
- Chunk labels must be unique throughout the document, otherwise there will be an error!
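knitr stops with an error when it meets a duplicate label, but it can be handy to check a long `.Rmd` file before knitting. Below is a hypothetical base-R helper, `find_duplicate_labels()`, assuming labels are written in the usual `{r label, options}` form; labels set through the `label=` option are not detected.

```r
# Hypothetical helper: return chunk labels that appear more than once
find_duplicate_labels <- function(rmd_lines) {
  # Keep only chunk header lines such as ```{r my-label, echo=FALSE}
  headers <- grep("^```\\{r", rmd_lines, value = TRUE)
  # Strip the opening ```{r (plus any space or comma) and the closing brace
  inner <- sub("^```\\{r[ ,]*", "", headers)
  inner <- sub("\\}[[:space:]]*$", "", inner)
  # The label is the first comma-separated token, unless it is an option (contains "=")
  first <- trimws(sub(",.*$", "", inner))
  labels <- first[nzchar(first) & !grepl("=", first, fixed = TRUE)]
  unique(labels[duplicated(labels)])
}
```

For a real document, `find_duplicate_labels(readLines("Report.Rmd"))` returns `character(0)` when every label is unique.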
Get the CSO logo
CSO_img_url <- "https://github.com/MervynOLuing/TanzaniaShapefiles/raw/master/logo.png"
z <- tempfile()
download.file(CSO_img_url,z,mode="wb")
img <- readPNG(z)
file.remove(z)
Get the Census data from the Statbank
EP002<-read.px("https://www.cso.ie/px/pxeirestat/Database/eirestat/Preliminary%20Results%20(July%202016)/EP002.px")
EP002df<-as.data.frame(EP002)
Optional: save the data for future use, using a relative path so the project stays portable
# saveRDS() keeps factor columns such as CensusYear, which the code below relies on
#saveRDS(EP002df, "EP002df.rds")
#EP002df <- readRDS("EP002df.rds")
Select the subset of the data for the statistic we are interested in for this report, “Population (Number)”:
pop<-subset(EP002df, EP002df$Statistic == "Population (Number)" )
Next, take a subset of this data at the State level only:
state<-subset( pop, pop$Province.County.or.City == "State" )
Take the population total for the State. Note how we use `params$year`:
population<-subset( state, state$CensusYear == params$year)$value
You will see what we use population for when we start to compile the report.
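Because `subset()` evaluates its condition inside the data frame, the `EP002df$` style prefixes are optional, and the filtering steps above can be combined with `&`. Below is a sketch on a tiny hypothetical data frame that only mimics the column names of `EP002df`; the numbers are made up, and `toy_params` stands in for `params`.

```r
# Toy stand-in for EP002df: same column names, made-up values
toy <- data.frame(
  Statistic = c("Population (Number)", "Population (Number)",
                "Population (Number)", "Actual change since previous census (Number)"),
  Province.County.or.City = c("State", "State", "Leinster", "State"),
  CensusYear = factor(c("2011", "2016", "2016", "2016")),
  value = c(100, 110, 60, 10)
)
toy_params <- list(year = 2016)  # plays the role of params in this sketch

# Inside subset(), column names are looked up in `toy`, so no `toy$` prefixes are needed
toy_population <- subset(toy, Statistic == "Population (Number)" &
                              Province.County.or.City == "State" &
                              CensusYear == toy_params$year)$value
toy_population   # 110
```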
Now we need information on the population change so we proceed as above, except we take the “Actual change since previous census (Number)” statistic:
change<-subset(EP002df,EP002df$Statistic=="Actual change since previous census (Number)")
This gives a subset with the actual changes since the previous census for the State and Provinces, but we only need the State for this report.
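change_state<-subset( change, change$Province.County.or.City == "State" )
# Actual change for the report year; used as year_change in the report text
year_change<-subset( change_state, change_state$CensusYear == params$year)$value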
We already established what the Census year of the report should be, so now we need to get the previous Census year (relative to `params$year`):
previous_year<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-1])
Next we get a subset with the percentage change from the previous Census.
per_change_pop<-subset(EP002df, EP002df$Statistic == "Percentage change since previous census (%)")
As before, we just need the information for the State.
per_change_state<-subset(per_change_pop, per_change_pop$Province.County.or.City == "State" )
Next we get the percentage value.
Percentage<-subset(per_change_state, per_change_state$CensusYear == params$year)$value
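Now we need the year of the antepenultimate Census (two Censuses prior to the current one) and the percentage change over the previous intercensal period, i.e. from that Census to the previous one. That change is stored against the later Census year, `previous_year`.
second_previous_year<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-2])
second_percentage<-subset(per_change_state, per_change_state$CensusYear == previous_year)$value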
Next we get the information for 20 years ago, which is normally four Censuses ago.
twenty_years_before<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-4])
twenty_years_before_pop<-subset( state, state$CensusYear == twenty_years_before)$value
Now we need to get information for 60 years ago (again relative to the Census year in question). Normally this corresponds to 12 Censuses.
sixty_years_before<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-12])
sixty_years_before_pop<-subset( state, state$CensusYear == sixty_years_before)$value
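The lookups for `previous_year`, `second_previous_year`, `twenty_years_before` and `sixty_years_before` all follow one pattern: find the position of `params$year` among the Census years and step back k Censuses. A hypothetical helper, `census_before()`, makes that explicit, sorts the years rather than relying on the order of the factor levels, and fails loudly when the data do not go back far enough. The years are typed in by hand here (Irish Censuses from 1951 onward; 1976 was cancelled); in the report they would come from `levels(state$CensusYear)`.

```r
# Irish Census years from 1951 onward, typed in for illustration
census_years <- c(1951, 1956, 1961, 1966, 1971, 1979, 1981, 1986,
                  1991, 1996, 2002, 2006, 2011, 2016)

# Hypothetical helper: the Census held `k` Censuses before `year`
census_before <- function(year, k, years = census_years) {
  years <- sort(as.numeric(years))
  pos <- match(year, years)
  if (is.na(pos) || pos - k < 1) {
    stop("The data do not go back ", k, " Censuses before ", year)
  }
  years[pos - k]
}

census_before(2016, 1)    # 2011
census_before(2016, 12)   # 1956
```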
Now we create a table of the population and the change since the previous Census, going back 60 years.
table_prep_pop_state<-state[sort(which(state$CensusYear==params$year):(which(state$CensusYear==params$year)-12)),]
table_prep_pop_change<-change_state[sort(which(change_state$CensusYear==params$year):(which(change_state$CensusYear==params$year)-12)),]
years<-as.numeric(as.character(state[sort(which(state$CensusYear==params$year):(which(state$CensusYear==params$year)-12)),]$CensusYear))
change<-as.numeric(as.character(per_change_state[sort(which(per_change_state$CensusYear==params$year):(which(per_change_state$CensusYear==params$year)-12)),]$value))
table<- cbind(years,prettyNum((table_prep_pop_state$value),big.mark=",", preserve.width="none"),prettyNum((table_prep_pop_change$value),big.mark=",", preserve.width="none"),change )
colnames(table)<-c("Census Year", "Population", "Change", "%")
table<-as.data.frame(table)
First thing to do is put the CSO logo at the top of the report.
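The chunk that draws the logo is not shown in this section. One possible sketch uses `grid.raster()` from the grid package, which accepts the array returned by `readPNG()`. To keep the example runnable offline, a plain grey array (`demo_img`, a hypothetical stand-in) replaces the downloaded logo; in the report you would pass `img` instead, in a chunk with `echo=FALSE`.

```r
library(grid)

# Stand-in for the CSO logo: a 20 x 60 pixel semi-transparent grey RGBA array, values in [0, 1]
demo_img <- array(0.5, dim = c(20, 60, 4))

grid.newpage()
# Place the image near the top of the page, 30% of the page width wide
grid.raster(demo_img, x = unit(0.5, "npc"), y = unit(0.9, "npc"),
            width = unit(0.3, "npc"))
```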
Now the text:
```
`r params$year`

The `r ifelse(params$year==2016, "preliminary", "")` population count in `r params$year` was `r prettyNum(population, big.mark=",", preserve.width="none")`.

Census `r params$year` results show that Ireland's population `r ifelse(year_change > 0, "increased", "decreased")` by `r prettyNum(abs(year_change), big.mark=",", preserve.width="none")` persons since Census `r previous_year` to `r prettyNum(population, big.mark=",", preserve.width="none")` persons.

This represents an increase of `r Percentage` per cent over the `r params$year - previous_year` year intercensal period, an annual average increase of `r Percentage/(params$year - previous_year)` per cent.

The previous annual average increase between Census `r second_previous_year` and Census `r previous_year` was `r second_percentage/(previous_year-second_previous_year)` per cent.

Looking back over `r params$year - twenty_years_before` years to `r twenty_years_before`, Ireland's population has `r ifelse((population-twenty_years_before_pop) > 0, "increased", "decreased")` by `r prettyNum(abs(population-twenty_years_before_pop), big.mark=",", preserve.width="none")` persons, or `r percent(abs(population-twenty_years_before_pop)/twenty_years_before_pop, suffix="")` per cent.

Over the past `r params$year - sixty_years_before` years from `r sixty_years_before`, the population has `r ifelse((population-sixty_years_before_pop) > 0, "increased", "decreased")` by `r prettyNum(abs(population-sixty_years_before_pop), big.mark=",", preserve.width="none")` persons, or `r percent(abs(population-sixty_years_before_pop)/sixty_years_before_pop, suffix="")` per cent, which is illustrated in the chart below.

`r sixty_years_before`-`r params$year`
```

```{r, echo=FALSE, message=FALSE, warning=FALSE}
# Minimal gridExtra table theme: blue cell backgrounds, italic cells, bold-italic navy headers
tt3 <- ttheme_minimal(
  core = list(bg_params = list(fill = blues9[1:4], col = NA),
              fg_params = list(fontface = 3)),
  colhead = list(fg_params = list(col = "navyblue", fontface = 4L)),
  rowhead = list(fg_params = list(col = "orange", fontface = 3L)))
grid.arrange(tableGrob(table, theme = tt3))
```
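The report text repeats the same “increased/decreased by … persons, or … per cent” pattern several times. One way to keep the inline code short is a small helper defined in an earlier chunk; `describe_change()` below is a hypothetical name, not part of the original report. It uses `format(..., scientific = FALSE)` because `prettyNum()` on a round number such as 500000 can fall back to scientific notation ("5e+05").

```r
# Hypothetical helper: describe the change from `old` to `new` in words
describe_change <- function(new, old) {
  delta <- new - old
  direction <- if (delta >= 0) "increased" else "decreased"
  # scientific = FALSE keeps round numbers such as 500000 out of "5e+05" notation
  persons <- format(abs(delta), big.mark = ",", scientific = FALSE, trim = TRUE)
  pct <- round(100 * abs(delta) / old, 1)
  paste0(direction, " by ", persons, " persons, or ", pct, " per cent")
}

describe_change(1200, 1000)   # toy numbers, not Census figures
```

In the report, a sentence could then read: Ireland's population has `r describe_change(population, twenty_years_before_pop)`.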
The graph below charts the inter-censal population change going back to `r sixty_years_before`.
Ireland’s population was at its lowest level in 1961 at 2.8 million, having fallen by 142,252 in the preceding decade. Thereafter it has grown in each decade through a combination of natural increase and declining net outward migration.
The very high increase for 1979 reflects a period of high net inward migration, increasing births and the longer period covered (Census 1976 was cancelled for budgetary reasons).
This report is based on the one on page 9 of This is Ireland - Highlights from Census 2011, Part 1, which is available here: https://www.cso.ie/en/media/csoie/census/documents/census2011pdr/Census_2011_Highlights_Part_1_web_72dpi.pdf.
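```{r, echo=FALSE}
library(ggplot2)
# Bar chart of the actual change since the previous Census, for each Census year
p <- ggplot(table, aes(x = as.factor(`Census Year`), y = table_prep_pop_change$value, group = 1)) +
  # geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_bar(stat = "identity") +
  # geom_line(color = "#69b3a2") +
  xlab("Census Year") +
  ylab("Thousands") +
  ggtitle(paste0("Figure 1 Inter-censal change ", sixty_years_before, "-", params$year)) +
  theme(panel.background = element_rect(fill = "white")) +
  # Values are persons, so divide the tick labels by 1,000 to match the "Thousands" label
  scale_y_continuous(breaks = seq(-100000, 450000, by = 50000),
                     labels = function(x) x / 1000)
p
```

To render the report for a different Census year, run the following from the R console:
library(rmarkdown)
render("Report.Rmd", params = list(year = 2011))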
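Because the year is a parameter, the same `Report.Rmd` can produce one report per Census year. This sketch assumes `Report.Rmd` is in the working directory, and the output file names follow a hypothetical naming scheme. Passing `envir = new.env()` gives each render a fresh environment, so objects created for one year's report cannot leak into the next.

```r
report_years <- c(2011, 2016)
# Hypothetical naming scheme: one Word file per Census year
out_files <- paste0("Census_Report_", report_years, ".docx")

if (file.exists("Report.Rmd")) {
  for (i in seq_along(report_years)) {
    rmarkdown::render("Report.Rmd",
                      params = list(year = report_years[i]),
                      output_file = out_files[i],
                      envir = new.env())  # fresh environment for every report
  }
}
```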
Helpful Hints:
- End a line with two spaces to start a new line; leave a blank line to start a new paragraph.
- Words formatted like code should be surrounded by back ticks on both sides: `code`.
- To make something superscript, surround it with `^` on each side. Super^script^ was created by typing `Super^script^`.
- Equations can be written inline using `$` and centered as a block equation within the document using `$$`. For example, $E = mc^2$ is inline, while the following is a block equation: $$E = mc^2$$
  - **Note:** Within `$` and `$$`, `^` superscripts only the next character, so wrap longer superscripts in braces, e.g. `$x^{10}$`.
- Other fun math stuff:
  - Square root: `$\sqrt{b}$` will create $\sqrt{b}$
  - Fractions: `$\frac{1}{2}$` = $\frac{1}{2}$
  - Fractional equations: `$f(x)=\frac{P(x)}{Q(x)}$` = $f(x)=\frac{P(x)}{Q(x)}$
  - Binomial coefficients: `$\binom{k}{n}$` = $\binom{k}{n}$
  - Integrals: `$$\int_{a}^{b} x^2 dx$$` = $$\int_{a}^{b} x^2 dx$$
- ShareLaTeX (now part of Overleaf) is an awesome source for LaTeX code.