Multi-language Machinations in [R] Markdown

Premise

A super-nice feature of R Markdown files is the ability to run code chunks and use their output. BUT you are not limited to R in the code chunks. knitr ships engines for a wide variety of existing languages (bash, awk, perl, python and more), and you can register your own pretty easily via `knitr::knit_engines$set()`.

I posited that one could use a single knitr Rmd document to incorporate all the steps for a reproducible research product across multiple languages (say, if you need/want to use perl or awk for data munging and need to use something that’s in scikit-learn that’s not in R).

This document is a small example of how to incorporate a multi-language/engine workflow into a single Rmd document.

The workflow

I don’t have a really ugly file to work with at the moment (at least not one I can share), so here’s a totally made-up example: grab a file from the web (found at random), do some munging, read it into R for some work and then use (ugh) gnuplot to visualize the results.

Note that “state” is only maintained across R code chunks in an Rmd. Chunks in other languages each run on their own, so everything else relies on files (or other techniques) to pass along what’s needed between processing steps.
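A minimal bash sketch of why files end up being the hand-off mechanism. It assumes each non-R chunk behaves like its own `bash -c` process, which is an approximation of what knitr does rather than its exact invocation; the variable `ratings_file` and the file `handoff.txt` are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: approximate two separate non-R chunks with two separate processes.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# "Chunk 1" sets a shell variable and also writes a file.
bash -c "ratings_file=goodreads_ratings.dat; echo 'hello from chunk 1' > '$workdir/handoff.txt'"

# "Chunk 2" is a fresh process: the variable from chunk 1 is gone...
bash -c 'echo "ratings_file is: ${ratings_file:-<unset>}"'

# ...but the file chunk 1 wrote is still on disk.
bash -c "cat '$workdir/handoff.txt'"
```

This should print `ratings_file is: <unset>` followed by `hello from chunk 1`.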

We’ll need some setup code since I’m using gnuplot:

```{r}
library(knitrengines) # github.com/hrbrmstr/knitrengines
```

```
## Loading required package: knitr
## Registering new knitr chunk processing engines [go, elixir, pygments, gnuplot]...
```

Let’s fetch the data file we’ll be working on (using bash).

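```{r "get data file", engine="bash"}
# download the raw Goodreads export
curl --silent --output goodreads.csv "https://www.gwern.net/docs/personal/goodreads.csv"
# create (or empty) the output file, since the awk chunk appends to it with ">>"
> goodreads_cleaned.csv
```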
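The bare `> goodreads_cleaned.csv` line is there because the awk chunk appends with `>>`: without it, re-knitting the document would stack a second copy of the cleaned rows onto the old file. A small bash sketch of that behavior, using a throwaway temp directory and made-up rows:

```shell
#!/usr/bin/env bash
# Sketch: why a bare redirection precedes an append-only writer (made-up rows).
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
out="$workdir/goodreads_cleaned.csv"

# Pretend an earlier knit already left output behind.
echo "row from an earlier run" > "$out"

# A redirection with no command creates the file or truncates it to zero bytes.
> "$out"
if [ -s "$out" ]; then echo "still has data"; else echo "empty after truncation"; fi

# Appends now start from scratch, so re-running does not duplicate rows.
echo "row 1" >> "$out"
echo "row 2" >> "$out"
cat "$out"
```

zsh treats a bare redirection differently (it runs `$NULLCMD`), so `: > goodreads_cleaned.csv` is the more portable spelling of the same step.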

That file has a field with HTML tags in it, so let’s get rid of them. Note that we pass the input filename to awk in the engine.opts field.

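```{r "fix data file", engine="awk", engine.opts="goodreads.csv"}
{
  # strip anything that looks like an HTML tag from the current line
  gsub(/<[^>]*>/,"")
  # append the cleaned line (the bash chunk already emptied this file)
  print >> "goodreads_cleaned.csv"
}
```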
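As far as I can tell, knitr runs this chunk roughly as `awk '<chunk body>' goodreads.csv`, with the engine.opts value appended as an argument. The bash sketch below mimics that on a made-up two-line CSV. It also swaps the chunk's `>>` for awk's `>`, which truncates the output file only when awk first opens it and keeps appending for the rest of the run, so this variant would not need the separate truncation step:

```shell
#!/usr/bin/env bash
# Sketch: a command-line stand-in for the awk chunk; the sample rows are invented.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'Title,Review\n"Dune","<b>Great</b> read<br/>"\n' > goodreads.csv

# The program text comes first, then the input file (the engine.opts value).
awk '{
  gsub(/<[^>]*>/, "")              # drop anything that looks like an HTML tag
  print > "goodreads_cleaned.csv"  # truncated on first open, appended afterwards
}' goodreads.csv

cat goodreads_cleaned.csv
```

The cleaned file should read `Title,Review` followed by `"Dune","Great read"`.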

Now use R to do some hardcore stats work:

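```{r "process file"}
goodreads <- read.csv("goodreads_cleaned.csv", stringsAsFactors=FALSE)

# count how many books received each rating
ratings_count <- table(goodreads$My.Rating)

# write.table() keeps row names by default, so each line is: <row> <rating> <count>
write.table(as.data.frame(ratings_count), "goodreads_ratings.dat", col.names=FALSE, sep=" ", quote=FALSE)
```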

And, then use gnuplot to generate some cutting edge visualizations:

```{r "plot me", engine="gnuplot"}
set terminal png
set output "goodreads.png"
set style fill solid
set boxwidth 0.5
# x = column 1 (row number), height = column 3 (count), tic label = column 2 (rating)
plot "goodreads_ratings.dat" using 1:3:xtic(2) with boxes
```
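The `using 1:3:xtic(2)` spec leans on the layout the R chunk writes: `write.table()` keeps row names unless told otherwise, so each line of goodreads_ratings.dat reads `<row name> <rating> <count>`. This shell sketch uses awk on an invented three-line sample in that layout to show which column gnuplot takes for the x position, the bar height and the tic label:

```shell
#!/usr/bin/env bash
# Sketch: the column mapping behind `using 1:3:xtic(2)`; the counts are invented.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/ratings_sample.dat"

# Same layout as goodreads_ratings.dat: <row name> <rating> <count>
printf '1 0 4\n2 3 7\n3 5 2\n' > "$sample"

# Column 1 -> x position, column 3 -> bar height, column 2 -> x-axis tic label.
awk '{ printf "x=%s height=%s label=%s\n", $1, $3, $2 }' "$sample"
```

The second output line, `x=2 height=7 label=3`, shows why the bars sit at row positions while the axis is labelled with the ratings themselves.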

Note that we had to use `![](goodreads.png)` to get that image into the Rmd, since the gnuplot chunk writes its PNG to disk rather than handing a plot back to knitr.

Bob Rudis (@hrbrmstr)

October 22, 2015
