Dynamic Documents For Your Research Workflow

Fernando Hoces de la Guardia
BITSS
Slides at https://goo.gl/ZFQvba

BITSS Annual Meeting, December 2017

Dynamic Documents For Computational Reproducibility

First: can you take this quick survey please?


Currently, code and narrative components live in separate universes.

Dynamic Documents: integrate the two universes!

Dynamic Documents: A Recipe

One Type of Dynamic Document: R Markdown

For our exercise: R Markdown

R Markdown

Basic Structure

Basic Structure: Header

---
title: "Sample Paper"
author: "Fernando Hoces de la Guardia"
output: html_document
---

Basic Structure: Body of Text

---
header
---

This is where you write your paper; nothing much to add. You can check the Markdown syntax here, and you can also type equations using LaTeX syntax!

Basic Structure: Code Chunks and Inline Code

---
header
---

Body of text.

To begin a piece of code (a “code chunk”), enclose it in the following expression (keyboard shortcut: Ctrl + Alt + I, or Cmd + Option + I on a Mac):

```{r, eval=TRUE}
# here goes the code
```

To write inline code, open with a single backtick followed by an “r” and close with a single backtick: for example, `r 1+1` is replaced by 2 in the output.

Practical Exercise

Hands-on exercise: the birthday problem!

As an illustration, let's write a report that uses the participants in this workshop to illustrate the famous birthday problem.

What is the probability that at least two people in this room share the same birthday?

Is it something like \(\frac{1}{365} \times N =\) 0.11 (for \(N = 40\) people)?
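The naive guess treats each person as adding a \(1/365\) chance of a match. A minimal R sketch of that arithmetic, assuming \(N = 40\) people (the room size that reproduces the 0.11 figure):

```r
# Naive (and wrong) guess: each person adds a 1/365 chance of a match
n.people = 40                # assumed room size; 40 / 365 rounds to 0.11
naive.guess = n.people / 365
round(naive.guess, 2)        # [1] 0.11
```

This guess grows linearly in \(N\) and would exceed 1 for \(N = 366\), which already hints that it cannot be a probability.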

Create a New R Markdown File

1 - In RStudio: File -> New File -> R Markdown...
2 - Name it and save it.
3 - Review/edit the header, and delete all of the default body text except for one code chunk.
4 - Define a seed (set.seed(1234)) and the number of people in the room (n.pers = ?).
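Step 4 could look like the following inside the remaining code chunk. The value 40 is only a placeholder (it matches the \(n = 40\) used later in these slides); replace it with the actual headcount.

```r
# Fix the random number generator so the simulation is reproducible
set.seed(1234)
# Number of people in the room (placeholder: use the real headcount)
n.pers = 40
```

Because the seed is set once at the top of the report, every re-knit draws the same random birthdays, so the numbers in the text do not change between runs.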

The birthday problem: the math

Actually, the math says otherwise. Let \(\bar p(n)\) be the probability that all \(n\) people have different birthdays, so that \(p(n) = 1 - \bar p(n)\) is the probability that at least two share one:
\[\begin{aligned} \bar p(n) &= 1 \times \left(1-\frac{1}{365}\right) \times \left(1-\frac{2}{365}\right) \times \cdots \times \left(1-\frac{n-1}{365}\right) \\ &= \frac{ 365 \times 364 \times \cdots \times (365-n+1) }{ 365^n } \\ &= \frac{ 365! }{ 365^n (365-n)!} = \frac{n!\cdot\binom{365}{n}}{365^n} \\ p(n = 40) &= 1 - \bar p(40) = 0.891 \end{aligned}\]
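The first line of the derivation translates almost word for word into R. A minimal sketch, assuming 365 equally likely birthdays (no leap years); the function name prob.shared is hypothetical. It multiplies the terms \(1 - k/365\) for \(k = 0, \dots, n-1\) and cross-checks the result against pbirthday() from base R's stats package:

```r
# Probability that at least two of n people (n >= 1) share a birthday,
# assuming 365 equally likely birthdays
prob.shared = function(n) {
  # P(all distinct) = product over k = 0..n-1 of (1 - k/365)
  p.all.distinct = prod(1 - (0:(n - 1)) / 365)
  # Complement: at least one shared birthday
  1 - p.all.distinct
}
round(prob.shared(40), 3)        # [1] 0.891
round(stats::pbirthday(40), 3)   # same quantity, built into base R
```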

Code for the math (https://goo.gl/ZFQvba)

Don’t look at this: just copy and paste it into your report.

\begin{align} 
 \bar p(n) &= 1 \times \left(1-\frac{1}{365}\right) 
 \times \left(1-\frac{2}{365}\right) \times \cdots \times 
 \left(1-\frac{n-1}{365}\right) \nonumber  \\  
 &= \frac{ 365 \times 364 \times \cdots \times 
   (365-n+1) }{ 365^n } \nonumber \\ 
 &= \frac{ 365! }{ 365^n (365-n)!} = 
   \frac{n!\cdot\binom{365}{n}}{365^n}\\
p(n= `r n.pers`) = 1 - \bar p(`r n.pers`) &= `r  
 round(1 - factorial(n.pers) * 
         choose(365,n.pers)/ 365^n.pers, 3)`\nonumber
\end{align}
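The inline expression above uses factorial(), choose() and 365^n directly. In double precision, 365^n overflows to Inf once n exceeds about 120, so for very large rooms the expression can return NaN. A hedged alternative, assuming you want the same formula to work for any room size, evaluates it on the log scale with lfactorial() and lchoose(); the function name prob.shared.log is hypothetical.

```r
# Same formula, P(all distinct) = n! * choose(365, n) / 365^n,
# evaluated on the log scale so intermediate terms never overflow
prob.shared.log = function(n, days = 365) {
  if (n > days) return(1)  # pigeonhole: a shared birthday is certain
  log.p.distinct = lfactorial(n) + lchoose(days, n) - n * log(days)
  1 - exp(log.p.distinct)
}
round(prob.shared.log(40), 3)   # [1] 0.891
prob.shared.log(200)            # stays finite where the direct formula gives NaN
```

For a workshop-sized room the direct formula is perfectly fine; the log-scale version only matters if the same report is reused with very large values of n.pers.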

Don’t like math? Let’s run a simple simulation!

1 - Simulate 10,000 rooms with \(n = 40\) random birthdays each, and store the results in a matrix where each row represents a room.
2 - For each room (row), compute the number of unique birthdays.
3 - Compute the share of rooms with 40 unique birthdays across the 10,000 simulations, and report its complement (the share of rooms where at least two people share a birthday).

Code for the simulation (https://goo.gl/ZFQvba)

birthday.prob = function(n.pers, n.sims) {
  # simulate birthdays: one row per room, one column per person,
  # with every day from 1 to 365 equally likely
  birthdays = matrix(sample(1:365, n.pers * n.sims, replace = TRUE),
                     nrow = n.sims, ncol = n.pers)
  # for each room (row) count the number of unique birthdays
  n.unique = apply(birthdays, 1, function(room) length(unique(room)))
  # indicator: TRUE if everyone in the room has a different birthday
  all.different = (n.unique == n.pers)
  # share of rooms with at least one shared birthday
  result = 1 - mean(all.different)
  return(result)
}
n.pers.param = 40
n.sims.param = 1e4
birthday.prob(n.pers.param, n.sims.param)  # close to the exact 0.891
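A more compact variant of the same simulation, offered as a sketch rather than the workshop's reference code: it draws each room with sample(), flags rooms with at least one repeated birthday via anyDuplicated(), repeats this with replicate(), and compares the estimate with the exact value from stats::pbirthday(). The function name sim.shared is hypothetical.

```r
# Estimate P(at least one shared birthday) by simulating n.sims rooms
sim.shared = function(n.pers, n.sims, days = 365) {
  # TRUE for each room where some birthday appears more than once
  shared = replicate(n.sims,
                     anyDuplicated(sample(days, n.pers, replace = TRUE)) > 0)
  mean(shared)
}
set.seed(1234)
est = sim.shared(40, 1e4)
exact = stats::pbirthday(40)
c(simulated = est, exact = exact)
```

With 10,000 simulated rooms the Monte Carlo standard error is about \(\sqrt{0.89 \times 0.11 / 10000} \approx 0.003\), so the simulated share should typically land within a few thousandths of the exact 0.891.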

Results

Final Remarks & More Resources
