Hello!

This Data Bite session shows you how to make reproducible reports in RStudio.


Installation Instructions

Before working through the workshop materials, please do the following in preparation:
1. Open up RStudio.
2. Install and load the relevant R packages by running the following in the R console.

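# Uncomment and run once to install the packages (grid ships with R, so it is not listed)
#install.packages(c("googleVis","pxR","scales","ggplot2","gridExtra","png","plotly"))
library("googleVis")
library("png")
library("pxR")
library("scales")
library("ggplot2")  # needed for the chart later in the tutorial
library("gridExtra")
library("grid")
#library(plotly)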
3. If your packages loaded without any errors, then you are ready for the tutorial!

References:

The following resources were of use in making this tutorial:
1. The RMarkdown website hosted by RStudio.
2. Getting Started with RMarkdown - Garrett Grolemund - https://www.youtube.com/watch?v=MIlzQpXlJNk
3. Creating Dynamic Documents with RMarkdown and Knitr - http://rpubs.com/marschmi/RMarkdown
4. Cheatsheets released by RStudio.

Dynamic Documents

Literate programming is the basic idea behind dynamic documents and was proposed by Donald Knuth in 1984. Originally, it was for mixing the source code and documentation of software development together. Today, we will create dynamic documents in which program or analysis code is run to produce output (e.g. tables, plots, models, etc.), which is then explained through narrative writing.

The 3 steps of literate programming:
1. Parse the source document and separate code from narratives.
2. Execute source code and return results.
3. Mix results from the source code with the original narratives.
The software handles these three steps, which leaves 2 steps for us. We need to write:
1. Analysis code
2. A narrative to explain the results from the analysis code.
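
The three steps above can be sketched in a few lines of base R. This is only a toy illustration and not how knitr is implemented: the `> ` marker for code lines and the names `source_doc`, `run_code_line` and `rendered_doc` are assumptions made up for this example.

```r
# Toy source document: narrative lines, with code lines marked by a leading "> ".
# The "> " marker is an assumption made for this sketch, not RMarkdown syntax.
source_doc <- c(
  "The mean of the numbers 1 to 10 is:",
  "> mean(1:10)",
  "and their sum is:",
  "> sum(1:10)"
)

# Step 1: parse the document and separate code from narrative.
is_code <- startsWith(source_doc, "> ")

# Step 2: execute each code line and keep its result as text.
run_code_line <- function(line) {
  expr <- parse(text = sub("^> ", "", line))
  paste(format(eval(expr)), collapse = " ")
}
results <- vapply(source_doc[is_code], run_code_line, character(1), USE.NAMES = FALSE)

# Step 3: mix the results back in with the original narrative.
rendered_doc <- source_doc
rendered_doc[is_code] <- results
cat(rendered_doc, sep = "\n")
```

In a real .Rmd file, knitr and pandoc do this work for us, with far richer handling of plots, tables and errors.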


Reproducible Research

Reproducible research is one possible product of dynamic documents; however, it is not guaranteed! Good practices for reproducible research include:
1. Encapsulate the full project into one directory that is supported with version control.
2. Release your code and data.
3. Document everything and use code as documentation!
4. Make figures, tables, and statistics the results of scripts and inline code.
5. Write code that uses relative paths.
6. Always set your seed.
7. Always include session information in the code file. For example, you can use devtools::session_info().

To read more about reproducibility and data management, check out Vince Buffalo’s book [@Buffalo2015].
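
As a rough sketch, practices 5 to 7 might look like this in a script. The folder and file names are hypothetical, and base R's `sessionInfo()` is used here in place of `devtools::session_info()` so that the example runs without extra packages.

```r
# Practice 5: build paths relative to the project directory.
# "data" and "census.csv" are hypothetical names used only for illustration.
data_path <- file.path("data", "census.csv")

# Practice 6: set the seed so that random draws are repeatable.
set.seed(2019)
first_draw <- rnorm(3)
set.seed(2019)
second_draw <- rnorm(3)
identical(first_draw, second_draw)  # the two draws match

# Practice 7: record the session so that others can see the R and package versions.
session <- sessionInfo()
print(session$R.version$version.string)
```

Printing the full `session` object at the end of a report is a common way to keep this information with the output.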

Markdown

To fully understand RMarkdown, we first need to cover Markdown, which is a system for writing simple, readable text that is easily converted to HTML. Markdown essentially is two things:
1. A plain text formatting syntax
2. A software tool written in Perl.
   - Converts the plain text formatting into HTML.

Main goal of Markdown:
Make the syntax of the raw (pre-HTML) document as readable as possible.

Would you rather read this code in HTML?

<body>
  <section>
    <h1>Rock Climbing Packing List</h1>
    <ul>
      <li>Climbing Shoes</li>
      <li>Harness</li>
      <li>Backpack</li>
      <li>Rope</li>
      <li>Belayer</li>
    </ul>
  </section>
</body>

Or this code in Markdown?

# Rock Climbing Packing List
* Climbing Shoes
* Harness
* Backpack  
* Rope
* Belayer

If you are human, the Markdown code is definitely easier to read!


RMarkdown

RMarkdown is a variant of Markdown that makes it easy to create dynamic documents, presentations and reports within RStudio. It embeds R code chunks that knitr executes, which makes it easy to create reproducible (web-based) reports: they can be automatically regenerated whenever the underlying code is modified.
- RMarkdown lets you combine Markdown with images, links, tables, LaTeX, and actual code.
- RStudio makes creating documents from RMarkdown easy.
- RStudio (like R) is free and runs on any operating system.

RMarkdown renders many different types of files including:
- HTML
- PDF
- Markdown
- Microsoft Word
- Presentations:
  - Fancy HTML5 presentations:
    - ioslides
    - Slidy
    - Slidify
  - PDF presentations:
    - Beamer
- Handouts:
  - Tufte Handouts
- HTML R Package Vignettes
- Even entire websites!

While there are a lot of different types of rendered documents in RMarkdown, today we will focus primarily on HTML output files, as I have found these files to be the most useful and flexible for my research.

Why R Markdown?

A convenient tool for reproducible and dynamic reports!
- While it was created for R, it now accepts many programming languages. For simplicity, we will only work with R today.
- Execute code in a few ways:
  1. Inline code: brief code that is evaluated within the written part of the document.
  2. Code chunks: parts of the document that include several lines of program or analysis code. A chunk may render a plot or table, calculate summary statistics, load packages, etc.
- It is easy to:
  - Embed images.
  - Learn Markdown syntax.
  - Include LaTeX equations.
  - Include interactive tables.
  - Use version control with Git.
    - Even easier to share and collaborate on analyses, projects and publications!
  - Add external links. RMarkdown even understands some HTML code!
  - Make beautifully formatted documents.
    - No need to worry about page breaks or figure placement.
  - Consolidate your code and write-up into a single file:
    - Slideshows, PDFs, HTML documents, Word files

Simple Workflow

Briefly, to make a report:
1. Open a .Rmd file.
   - Create a YAML header (more on this in a minute!)
2. Write the content with RMarkdown syntax.
3. Embed the R code in code chunks or inline code.
4. Render the document output.

Overview of the steps RMarkdown takes to get to the rendered document:
1. Create a .Rmd report that includes R code chunks and markdown narratives (as indicated in the steps above).
2. Give the .Rmd file to knitr to execute the R code chunks and create a new .md file.
   - Knitr is a package within R that allows the integration of R code into rendered RMarkdown documents such as HTML, LaTeX, PDF, Word, among other document types.
3. Give the .md file to pandoc, which will create the final rendered document (e.g. HTML, Microsoft Word, PDF, etc.).
   - Pandoc is a universal document converter and enables the conversion of one document type (in this case: .md) to another (in this case: HTML).

While this may seem complicated, we can simply hit the “Knit” button at the top of the page, which runs all of these steps for us.
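
For the curious, the two stages can also be run by hand. The sketch below assumes the knitr package is installed; the demo file content is made up, and the pandoc step runs only if the rmarkdown package and pandoc are available.

```r
library(knitr)

# Write a tiny .Rmd file to a temporary location (hypothetical demo content).
fence <- strrep("`", 3)  # three back ticks, built here to keep the example readable
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c("# Demo report", "", paste0(fence, "{r}"), "1 + 1", fence), rmd_file)

# Stage 1: knitr executes the chunks and writes a plain Markdown (.md) file.
md_file <- knit(rmd_file, output = sub("\\.Rmd$", ".md", rmd_file), quiet = TRUE)
cat(readLines(md_file), sep = "\n")

# Stage 2: pandoc converts the .md file to the final format (skipped if pandoc is missing).
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::pandoc_convert(md_file, to = "html",
                            output = sub("\\.md$", ".html", md_file))
}
```

In day-to-day use, `rmarkdown::render()` or the Knit button wraps both stages, so you rarely need to call them separately.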

Creating a .Rmd File

Let’s start working with RMarkdown!
1. In the menu bar, click File -> New File -> RMarkdown
   - Or simply click on the green plus sign in the top left corner of RStudio.
2. A dialog window will pop up.
   - Inside of this window, choose the type of output by selecting the radio buttons. Note: this output can be easily changed later!
3. Click OK

YAML Headers

YAML stands for “YAML Ain’t Markup Language” and is basically a nested list structure that includes the metadata of the document. It is enclosed between two lines of three dashes --- and as we saw above is automatically written by RStudio. A simple example:

Note the use of params:

---
title:  "My Reproducible Census Report"  
date: "October 18th, 2019"  
params:
  year: 2016
output:   
  word_document
---

The above example will create a Word document. The following output options are also available:
- html_document
- pdf_document
- beamer_presentation (pdf slideshow)
- ioslides_presentation (HTML slideshow)
- and more…

Today, we will focus on Word files. However, in your own time please feel free to play around with creating HTML and PDF documents. Presentation slides use a slightly different syntax (e.g. to specify when one slide ends and the next one starts), so there is a bit of markdown syntax specific to presentations that is beyond the focus of this tutorial.
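
The `params` field in the header above becomes a list called `params` that the R code in the report can read. The sketch below creates a stand-in list by hand so that the code can be tried in the console outside of a render; the variable `report_title` is a made-up name for illustration.

```r
# Stand-in for the list that rmarkdown builds from the `params:` field of the header.
# Only needed when running code outside of a render; leave it out of the real report.
if (!exists("params")) {
  params <- list(year = 2016)
}

# Inside chunks and inline code, parameters are read with `$`.
report_title <- paste("Total population report for", params$year)
print(report_title)
```

Because the year lives in one place, changing it in the header (or at render time) updates every piece of code that uses `params$year`.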

Markdown Basics

Check out the RMarkdown Reference Guide

Markdown Quick Reference Guide

See help menu

Inline R Code

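Inline code is created by typing a back tick and the letter r, followed by the R expression and another back tick.
- For example: typing 2^11 is `r 2^11` renders as “2^11 is 2048”.
Imagine that you’re reporting a p-value and you do not want to go back and add it every time the statistical test is re-run. With inline code, the value is recomputed each time the document is rendered.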
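
A minimal sketch of the p-value case follows. The data are simulated and the object names (`group_a`, `group_b`, `p_value`) are invented for this example; the comment shows the inline code you would type in the narrative.

```r
# Simulated data, used only for illustration.
set.seed(123)
group_a <- rnorm(30, mean = 10)
group_b <- rnorm(30, mean = 11)

# Store the p-value in an object so that the narrative can refer to it.
p_value <- t.test(group_a, group_b)$p.value

# In the narrative of the .Rmd file you would then write, for example:
#   The difference between the groups had a p-value of `r signif(p_value, 2)`.
# The line below builds a similar sentence by hand to show roughly what would appear.
sentence <- paste0("The difference between the groups had a p-value of ",
                   signif(p_value, 2), ".")
print(sentence)
```

If the data or the test change, the sentence in the rendered report changes with them, with no manual editing.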

R Code Chunks

R code chunks can be used to render R output into documents or to display code for illustration.

The Anatomy of a code chunk:

To insert an R code chunk, you can type it manually by typing ```{r} followed by ``` on the next line. You can also press the Insert a new code chunk button or use the shortcut key. This will produce the following code chunk:

```{r}
n <- 10
seq(n)
```

Give the code chunk a meaningful name that describes what it does. Below I have named the code chunk 10-random-numbers:

```{r 10-random-numbers}
n <- 10
seq(n)
```

The code chunk input and output is then displayed as follows:

n <- 10
seq(n)
##  [1]  1  2  3  4  5  6  7  8  9 10

Knitr

Knitr is an R package that works in three steps:

  1. Identifies code, including chunks and inline code.
  2. Evaluates all the code and returns the results.
  3. Renders the formatted results and combines them with the original file.

Knitr runs code as if it were being run in the R console.

Knitr mainly works with code chunks.

A code chunk looks like:

```{r}
x <- rnorm(100)
y <- 2*x + rnorm(100)
```

Best practices regarding code chunks:

  1. Always name/label your code chunks!
  2. Instead of specifying the chunk options in every chunk, set the global chunk options at the beginning of the document. More on this in a minute!
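
Global chunk options are usually set in the first chunk of the document with `knitr::opts_chunk$set()`. The option values below are illustrative choices for a report that hides its code, not requirements.

```r
library(knitr)

# Typically placed in the first chunk of the .Rmd file, e.g. one labelled "setup".
# The option values below are illustrative choices, not requirements.
opts_chunk$set(
  echo = FALSE,     # hide the code in the rendered report
  message = FALSE,  # hide package start-up messages
  warning = FALSE   # hide warnings
)

# Individual chunks can still override a global option in their header,
# for example {r show-code, echo=TRUE}.
opts_chunk$get("echo")
```

Setting the options once keeps the chunk headers short and makes the whole report behave consistently.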

Chunk Labels

Chunk labels must be unique IDs in a document and are good for:

  • Generating external files such as images and cached documents.
  • Locating errors: chunk labels are often reported when errors arise (more often than the line of code).
  • Navigating throughout long .Rmd documents.

When naming a code chunk, use - or _ between words instead of spaces. This will help you and other users of your document to navigate through it.

Chunk labels must be unique throughout the document - otherwise there will be an error!

Set up the packages, data and code that we will use to prepare the report

Get the CSO logo

CSO_img_url <- "https://github.com/MervynOLuing/TanzaniaShapefiles/raw/master/logo.png"
z <- tempfile()
download.file(CSO_img_url,z,mode="wb")
img <- readPNG(z)
file.remove(z)

Get the Census data from the Statbank

EP002<-read.px("https://www.cso.ie/px/pxeirestat/Database/eirestat/Preliminary%20Results%20(July%202016)/EP002.px")
EP002df<-as.data.frame(EP002)

Optional: save file for future use

#write.csv(EP002df,"~/EP002df.csv")
#setwd('C:/Users/mlol1/Documents/census/')
#EP002 <- read.csv("~/EP002df.csv")
#EP002df<-as.data.frame(EP002)

We now need to start manipulating the data

Select the subset of the data for the statistic that we are interested in for this report, “Population (Number)”:

pop<-subset(EP002df,  EP002df$Statistic == "Population  (Number)" )

Next step: take a subset of this file at the State level only

state<-subset( pop,  pop$Province.County.or.City == "State" )

Take the population total for the State. Note how we use params$year

population<-subset( state,  state$CensusYear == params$year)$value

You will see what we use population for when we start to compile the report.

Now we need information on the population change so we proceed as above, except we take the “Actual change since previous census (Number)” statistic:

change<-subset(EP002df,EP002df$Statistic=="Actual change since previous census (Number)")

This gives a subset with the actual changes since the previous census for the State and Provinces, but we only need the State for this report.

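change_state<-subset( change,  change$Province.County.or.City == "State" )

The report text refers to the actual change for the Census year as year_change, so we extract it in the same way as population:

year_change<-subset( change_state,  change_state$CensusYear == params$year)$value

We already established what the Census year of the report should be, so now we need to get the previous census year (relative to params$year).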

previous_year<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-1])

Next we get a subset with the percentage change from the previous Census.

per_change_pop<-subset(EP002df,  EP002df$Statistic == "Percentage change since previous census (%)")

As before, we just need the information for the State.

per_change_state<-subset(per_change_pop, per_change_pop$Province.County.or.City == "State" )

Next we get the percentage value.

Percentage<-subset(per_change_state,  per_change_state$CensusYear == params$year)$value

Now we need to get the information for the antepenultimate Census (two Censuses prior to the current).

second_previous_year<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-2])
second_percentage<-subset(per_change_state,  per_change_state$CensusYear == second_previous_year)$value

Next we get the information for 20 years ago (which is normally 4 Censuses ago).

twenty_years_before<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-4])
twenty_years_before_pop<-subset( state,  state$CensusYear == twenty_years_before)$value

Now we need to get information for 60 years ago (again relative to the Census year in question). Normally this corresponds to 12 Censuses.

sixty_years_before<-as.numeric(levels(state$CensusYear)[which(state$CensusYear==params$year)-12])
sixty_years_before_pop<-subset( state,  state$CensusYear == sixty_years_before)$value
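
The lookups above all follow the same pattern: find the position of the report year, step back a number of Censuses, and read off the year at that position. The sketch below shows the pattern on a small made-up data frame. `state_demo` and the helper `census_year_back` are hypothetical names, the figures are invented, and the check for stepping back too far is an addition that the code above does not have.

```r
# Made-up stand-in for `state`: one row per census, in chronological order.
state_demo <- data.frame(
  CensusYear = factor(c(1996, 2002, 2006, 2011, 2016)),
  value = c(100, 110, 120, 130, 140)  # invented figures, not CSO data
)

# Hypothetical helper: the census year `n_back` censuses before `year`.
census_year_back <- function(df, year, n_back) {
  position <- which(df$CensusYear == year)
  if (length(position) != 1 || position - n_back < 1) {
    stop("No census found ", n_back, " censuses before ", year)
  }
  as.numeric(levels(df$CensusYear)[position - n_back])
}

census_year_back(state_demo, 2016, 1)  # 2011
census_year_back(state_demo, 2016, 2)  # 2006
```

Note that the pattern relies on the rows being in the same order as the factor levels, which is why the word "normally" appears in the text above: a cancelled or delayed Census changes how many steps correspond to 20 or 60 years.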

Now we create a table for the population change from the Previous Census, over 60 years.

table_prep_pop_state<-state[sort(which(state$CensusYear==params$year):(which(state$CensusYear==params$year)-12)),]
table_prep_pop_change<-change_state[sort(which(change_state$CensusYear==params$year):(which(change_state$CensusYear==params$year)-12)),]
years<-as.numeric(as.character(state[sort(which(state$CensusYear==params$year):(which(state$CensusYear==params$year)-12)),]$CensusYear))
change<-as.numeric(as.character(per_change_state[sort(which(per_change_state$CensusYear==params$year):(which(per_change_state$CensusYear==params$year)-12)),]$value))
table<- cbind(years,prettyNum((table_prep_pop_state$value),big.mark=",", preserve.width="none"),prettyNum((table_prep_pop_change$value),big.mark=",", preserve.width="none"),change )
colnames(table)<-c("Census Year", "Population", "Change", "%")

table<-as.data.frame(table)
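
To see what `prettyNum()` and `cbind()` are doing in the table code, here is a small sketch with invented figures. The names `population_demo`, `formatted` and `demo_table` are made up for this example.

```r
# Invented figures, used only to show the formatting.
years <- c(2011, 2016)
population_demo <- c(1234567, 2345678)

# prettyNum() inserts the thousands separator and returns character strings.
formatted <- prettyNum(population_demo, big.mark = ",", preserve.width = "none")
print(formatted)  # "1,234,567" "2,345,678"

# cbind() on mixed input gives a character matrix, so every column becomes text.
demo_table <- as.data.frame(cbind(years, formatted))
colnames(demo_table) <- c("Census Year", "Population")
str(demo_table)
```

Since the formatted columns are text, the table is suitable for display but not for further arithmetic; keep the numeric objects for any calculations.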

OK, now we are ready to ‘populate’ our report

First thing to do is put the CSO logo at the top of the report.
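
One way to draw the logo is with the grid package, sketched below. This is an assumption about how the image might be placed, not necessarily the code used in the original report. A tiny made-up array stands in for `img` so that the example runs on its own.

```r
library(grid)

# Stand-in image: a 2 x 2 pixel RGB array with values between 0 and 1.
# In the report, the `img` object returned by readPNG() would take its place.
img_demo <- array(c(1, 0, 0, 1,  0, 1, 0, 1,  0, 0, 1, 1), dim = c(2, 2, 3))

# Build a raster grob that is 30% of the page width (an arbitrary choice) and draw it.
logo <- rasterGrob(img_demo, width = unit(0.3, "npc"), interpolate = FALSE)
grid.newpage()
grid.draw(logo)
```

In the report this would sit in a chunk with echo=FALSE so that only the image appears.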

Now the text:

Total population report for `r params$year`

  • The `r ifelse(params$year==2016, "preliminary", "")` population count in `r params$year` was `r prettyNum(population,big.mark=",", preserve.width="none")`.

  • Census `r params$year` results show that Ireland’s population `r ifelse(year_change > 0, "increased", "decreased")` by `r prettyNum(abs(year_change),big.mark=",", preserve.width="none")` persons since Census `r previous_year` to `r prettyNum(population ,big.mark=",", preserve.width="none")` persons.

  • This represents an increase of `r Percentage` per cent over the `r params$year - previous_year` year intercensal period, an annual average increase of `r Percentage/(params$year - previous_year)` per cent.

  • The previous annual average increase between Census `r second_previous_year` and Census `r previous_year` was `r second_percentage/(previous_year-second_previous_year)` per cent.

  • Looking back over `r params$year - twenty_years_before` years to `r twenty_years_before` Ireland’s population has `r ifelse((population-twenty_years_before_pop) > 0, "increased", "decreased")` by `r prettyNum((population-twenty_years_before_pop),big.mark=",", preserve.width="none")` persons, or `r prettyNum(percent((population-twenty_years_before_pop)/twenty_years_before_pop),big.mark=",", preserve.width="none")` per cent.

  • Over the past `r params$year - sixty_years_before` years from `r sixty_years_before`, the population has `r ifelse((population-sixty_years_before_pop) > 0, "increased", "decreased")` by `r prettyNum((population-sixty_years_before_pop),big.mark=",", preserve.width="none")` persons or `r prettyNum(percent((population-sixty_years_before_pop)/sixty_years_before_pop),big.mark=",", preserve.width="none")` per cent which is illustrated in the chart below.

Table A Population from previous Census `r sixty_years_before`-`r params$year`

```{r, echo=FALSE, message=FALSE, warning=FALSE}
tt3 <- ttheme_minimal(
  core = list(bg_params = list(fill = blues9[1:4], col = NA),
              fg_params = list(fontface = 3)),
  colhead = list(fg_params = list(col = "navyblue", fontface = 4L)),
  rowhead = list(fg_params = list(col = "orange", fontface = 3L)))
grid.arrange(tableGrob(table, theme = tt3))
```

History lessons

  • The graph below charts the inter-censal population change going back to `r sixty_years_before`.

  • Ireland’s population was at its lowest level in 1961 at 2.8 million, having fallen by 142,252 in the preceding decade. Thereafter it has grown in each decade through a combination of natural increase and declining net outward migration.

  • The very high increase for 1979 reflects both a period of high net inward migration, increasing births and the longer period covered (Census 1976 was cancelled for budgetary reasons).

  • This report is based on the one found on page 9 of This is Ireland - Highlights from Census 2011, Part 1, which is available here: https://www.cso.ie/en/media/csoie/census/documents/census2011pdr/Census_2011_Highlights_Part_1_web_72dpi.pdf

```{r, echo=FALSE}
# Bar chart of the inter-censal change for each Census year in the table
p <- ggplot(table, aes(as.factor(table$`Census Year`), table_prep_pop_change$value, group = 1)) +
  # geom_area(fill="#69b3a2", alpha=0.5) +
  geom_bar(stat = "identity") +
  # geom_line(color="#69b3a2") +
  xlab("Census Year") +
  ylab("Thousands") +
  ggtitle(paste0("Figure 1 Inter-censal change ", sixty_years_before, "-", params$year)) +
  theme(panel.background = element_rect(fill = "white")) +
  scale_y_continuous(breaks = seq(-100000, 450000, by = 50000))

p
```


Create a report for a different year.

library(rmarkdown)
render("Report.Rmd",params = list("year"=2011))
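
The same call can be placed in a loop to produce one report per Census year. This is a sketch under assumptions: the years and the output file names are made up for illustration, and the rendering only runs if Report.Rmd is present in the working directory.

```r
# Census years to report on; each must exist in the data.
report_years <- c(2006L, 2011L)

# Hypothetical output names, one Word file per year.
output_files <- sprintf("Report_%d.docx", report_years)

# Render only if the report source is present in the working directory.
if (file.exists("Report.Rmd")) {
  for (i in seq_along(report_years)) {
    rmarkdown::render("Report.Rmd",
                      params = list(year = report_years[i]),
                      output_file = output_files[i])
  }
}
```

Giving each report its own `output_file` stops later renders from overwriting earlier ones.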

Helpful Hints:
- End a line with two spaces to force a line break; leave a blank line to start a new paragraph.
- Words formatted like code should be surrounded by back ticks on both sides: `code`
- To make something superscript, surround it with `^` on each side. Super^script^ was created by typing `Super^script^`.
- Equations can be inline using `$` and centered as a blocked equation within the document with `$$`. For example, $E = mc^2$ is inline while the following is a blocked equation: $$E = mc^2$$
- **Note:** To make it superscript with `$` and `$$`, a `^` is needed before each alphanumeric that is superscript.
- Other fun math stuff:
  - Square root: `$\sqrt{b}$` will create $\sqrt{b}$
  - Fractions: `$\frac{1}{2}$` = $\frac{1}{2}$
  - Fractional Equations: `$f(x)=\frac{P(x)}{Q(x)}$` = $f(x)=\frac{P(x)}{Q(x)}$
  - Binomial Coefficients: `$\binom{k}{n}$` = $\binom{k}{n}$
  - Integrals: `$$\int_{a}^{b} x^2 dx$$` = $$\int_{a}^{b} x^2 dx$$
- ShareLaTeX is an awesome source for LaTeX code.