RNotebooks - The Powerful Polyglot

Derek Slone-Zhen
Wednesday, 12th April, 2017




The Origin And Ecosystem of RNotebooks

RNotebooks are part of the rich tapestry of Dynamic Documents that are available in the R ecosystem.

[Figure: Graphviz diagram of the RMarkdown ecosystem; its dot source is listed in the "dot (Graphviz)" section.]

Markdown

Markdown was first released in 2004 by John Gruber and Aaron Swartz.

It was designed firstly to be human-readable, and secondly to be machine-readable. Like YAML in the world of configuration files, Markdown was a response to the failed aspirations of XML to be a human-readable language.

It has become a de facto standard in many environments, including GitHub, and is available either out of the box or as a plug-in for many wikis and issue-tracking systems.

Quick Markdown Example

RMarkdown

The ability to insert chunks of R code into vanilla Markdown.

Converted by the knitr package, created by Yihui Xie.

http://rmarkdown.rstudio.com/

RMarkdown and knitr have been working in this world for over four years now.

Example

Now, let me show you a plot:

plot(cars)

[Figure: scatter plot produced by plot(cars)]

RMarkdown Engines

- the Polyglot Appears

Supported Languages

In addition to R code, the knitr package supports additional language engines, and allows you to add your own too. (We'll show that later.)

Most of the languages you know and love are available (see the list below).

  • python
  • ruby
  • bash
  • dot
  • awk
  • sql
  • Scala

Execution Contexts & Persistence

However, unlike R chunks, chunks in other languages run in individual execution contexts. For R chunks, a single R session runs in the background and all R chunks are executed within that session, allowing variables to be shared and an environment to be built up. For other languages, an interpreter (or other execution context) is launched for each chunk. This means that variables have to be persisted by hand, typically through the file system.

Here's a quick example in Perl:

use strict;
my $a = 5;
while($a > 0) {
    print "Hello-#$a\n";
    $a--;
}
Hello-#5
Hello-#4
Hello-#3
Hello-#2
Hello-#1
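
Persisting "by hand" usually just means writing a file in one chunk and reading it back in another. The following is a minimal sketch of that pattern, not something taken from the original notebook: the file name cars_summary.csv and the variable names are made up for illustration, and both halves are written in R only so that the example runs on its own. In a real notebook the second half would live in a python, bash or Perl chunk, and you would use a fixed, project-relative path rather than tempdir().

```r
# "Chunk 1": compute something in R and persist it to disk.
# The file name is hypothetical; any path both chunks agree on will do.
shared_file <- file.path(tempdir(), "cars_summary.csv")

cars_summary <- data.frame(
  stat  = c("mean_speed", "mean_dist"),
  value = c(mean(cars$speed), mean(cars$dist))
)
write.csv(cars_summary, shared_file, row.names = FALSE)

# "Chunk 2": a fresh execution context knows nothing about cars_summary,
# so it has to rebuild its state from the file.
restored <- read.csv(shared_file, stringsAsFactors = FALSE)
print(restored)
```

CSV is used here because practically every engine can read it; the cost is that types and precision have to be re-established on the reading side.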

RMarkdown Engine Languages

Python

Simply works.

I've had success with Anaconda and babun, and cygwin should work too.

But as for the much talked-about feather package, which promises efficient file interop between R and Python: I totally failed to get the Python module to install on Windows 10 Home. It requires a recent C++ compiler to build.

Perl

Works, as seen above, but I used Strawberry Perl because the babun version was getting linker errors.

(The truth is, I probably have too many cygwin-based tools all on my path, and the poor things are getting confused and intertwined in an unpleasant way!)

bash

I use the bash from a babun distribution and it works well. It's even tolerant of Windows-style file names being passed to it. However, it's not tolerant of the full stop that the RNotebook environment adds to the end of its file names ☹

So, getting bash to work on Windows takes a little bit of trickery, especially since the style of invocation differs between the RNotebook environment and the knitr processor.

bash for RNotebooks - bash.bat

The following batch file proved to be the key for me:

@rem bash.bat - a Windows batch file for invoking bash from RNotebooks
@echo off
SET BASH_PATH=C:\Users\Derek Slone-Zhen\.babun\cygwin\bin
PATH=%BASH_PATH%;%PATH%
"%BASH_PATH%\bash.exe" < "%~1"

bash for knitr - engine.path

The batch file is paired with an explicit engine.path for bash blocks, which knitr uses.
(The RNotebook processing engine appears to ignore these.)

knitr::opts_chunk$set(engine.path = list(
  bash = 'C:/Users/Derek Slone-Zhen/.babun/cygwin/bin/bash.exe'
))

cmd

A code engine for Windows users!

# Based on knitr::knit_engines$get('bash')
win_cmd <- function (options) 
{
  cmd = 'C:/WINDOWS/system32/cmd.exe'
  out = if (options$eval) {
    message("running: ", cmd, " ", options$code)
    tmp_file <- tempfile(fileext = ".bat")
    writeLines(options$code, tmp_file)
    opts <- paste(options$engine.opts, "/c", tmp_file) 
    tryCatch(system2(cmd, opts, stdout = TRUE, stderr = TRUE, env = options$engine.env),
             error = function(e) { if (!options$error) stop(e)
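               # 'code' was undefined here; report the chunk source from options$code
               paste("Error in running command", cmd, paste(options$code, collapse = "\n"))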
             },
             finally = { file.remove(tmp_file) })
  } else ""
  if (!options$error && !is.null(attr(out, "status"))) 
    stop(paste(out, collapse = "\n"))
  knitr::engine_output(options, options$code, out)
}
knitr::knit_engines$set(cmd=win_cmd)
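
To see the shape of an engine without the Windows plumbing, here is a deliberately tiny sketch. The engine name "upper" and the function upper_engine are hypothetical; the only knitr call used is knitr::knit_engines$set(), the same one as above. An engine is just a function that receives the chunk options as a list (with the chunk's lines in options$code) and returns the text to place in the output.

```r
# A toy engine: it runs nothing, it only upper-cases the chunk body.
upper_engine <- function(options) {
  # options$code is a character vector, one element per line of the chunk.
  code <- paste(options$code, collapse = "\n")
  # Honour eval = FALSE by returning the text untouched.
  if (isTRUE(options$eval)) toupper(code) else code
}

# Register it only when knitr is installed; a chunk with the header {upper}
# would then be routed to upper_engine().
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit_engines$set(upper = upper_engine)
}

# An engine is an ordinary function, so it can be tried outside knitr.
result <- upper_engine(list(code = c("hello", "polyglot"), eval = TRUE))
print(result)
```

Unlike win_cmd, this sketch returns a bare string instead of calling knitr::engine_output(), so it skips the usual formatting of source and output; that keeps it short but is probably not what you want for a real engine.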

dot (Graphviz)

Graphviz is an awesome tool for auto-magically laying out graphs.

You've seen two already in this presentation.

The box below shows the “source code” for the first graph.

digraph {

    { rank="same"
      ordering= "out"
      "RMarkdown\n'engines'" [shape="invhouse", fontsize="12"]
      "RMarkdown" [shape="note"]
      "RMarkdown\n'engines'" -> RMarkdown [style="dotted", minlen="2" ]
    }

    knitr [shape="invhouse"]
    pandoc [shape="invhouse"]
    RMarkdown [shape="note"]
    Markdown [shape="note"]
    RMarkdown -> knitr -> Markdown -> pandoc
    "RMarkdown\n'engines'" -> knitr

    RMarkdown -> {
      "Book-\ndown" [shape="folder"]
      "Blog-\ndown" [shape="folder"]
      "RNotebooks" [shape="note"]
      "RPres"  [shape="note"]
    } [dir="back", arrowtail="odiamond"]

  pandoc ->  {
    node [shape="note"]
      "PDF"
      "Word"
      "HTML"    
  }  
}

Beyond RMarkdown

R Presentations

Multi-part Documents

Bookdown

A system for authoring complete books, too large for a single document

Blogdown

Go on, take a guess!

After much searching around, I found blogdown by Yihui Xie, the man who created Knitr. Blogdown uses Hugo, a blogging framework for static pages built using the Go language. One word of warning, blogdown is still under development, but I’ve tested it out and there have been minor issues, but most of it is because there is no documentation right now if you’re stuck.

[http://kevinfw.com/post/blogging-with-r-markdown/]

RNotebooks

Why I'm a raving fan

  • Interactive workbench for rapid prototyping
  • Reproducible research
  • Amenable to source control (git is integrated into RStudio)
  • Language “neutrality”
    • pick the right / best / easiest / one-you-know language for the job
  • Allows for embedded SQL too
  • Can also be knitted to HTML, PDF or Word
    • (PDF requires a LaTeX installation, typically MiKTeX on Windows.)

Demo

If time permits...

brew

brew is a templating engine for R, much like PHP or T4 templates.

Let's just dive in and take a look …
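
As a first taste, here is a small sketch that assumes the brew package is installed (it is skipped otherwise). The template text and the variables name and lang_list are invented for illustration. In brew, <%= expr %> is replaced by the value of an R expression, while <% code %> runs R code without printing anything.

```r
# Template with two placeholders, each filled from an R variable.
template <- "Hello, <%= name %>! Engines: <%= lang_list %>."

# Values the template refers to live in their own environment.
env <- new.env()
assign("name", "reader", envir = env)
assign("lang_list", paste(c("python", "bash", "dot"), collapse = ", "), envir = env)

rendered <- NULL
if (requireNamespace("brew", quietly = TRUE)) {
  # Render to a temporary file, then read the result back in.
  out_file <- tempfile(fileext = ".txt")
  brew::brew(text = template, output = out_file, envir = env)
  rendered <- readLines(out_file, warn = FALSE)
  print(rendered)
}
```

Passing the template through text = keeps the sketch self-contained; for anything longer you would normally keep the template in its own file and pass that file's path to brew() instead.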
