The purpose of this document is to summarize the main points from the book “Dynamic Documents with R and knitr”.

Background

When you use Sweave for the first time, you really appreciate its utility. You are given the tools to think, code, reflect, tweak the code, and write down your observations about the code and its output, all in one document. No distractions, no switching between programs. I have thoroughly enjoyed doing literate programming over the last few years. However, you reach a point where you wish Sweave had a few more convenient features. I was never happy with the code chunk for including graphics. If I used the ggplot2 package, I had to call print() explicitly to include the figure in the report, whereas a base graphics plot could be included without any additional statement. Also, to control the size of a figure, I had to fall back on LaTeX. Shouldn't there be options in the code chunk that handle both and make things easier for the user? That's when I stumbled onto knitr, and I have been thrilled with it. In this document, I will try to summarize the main points from the book on knitr, written by the package author, Yihui Xie.

Preface

There are dangers in doing the analysis in one place and then manually tabulating all the results into a report.

Introduction

The basic idea behind dynamic documents stems from literate programming, a programming paradigm conceived by Donald Knuth (Knuth, 1984). The original idea was mainly for writing software: mix the source code and documentation together; we can either extract the source code out (called tangle) or execute the code to get the compiled results (called weave). A dynamic document is not entirely different from a computer program: for a dynamic document, we need to run software packages to compile our ideas (often implemented as source code) into numeric or graphical output, and insert the output into our literal writings (like documentation).

The traditional approach to doing the second task is to write comments for the code, but comments are often limited in terms of expressing the full thoughts of the authors. Normally we write our ideas in a paper or a report instead of hundreds of lines of code comments. Technically, literate programming involves three steps: parsing the source document to separate the code from the narrative, executing the code, and merging the results back into the narrative.

These steps can be implemented in software packages, so the authors do not need to take care of these technical details. Instead, we only control what the output should look like.

Reproducible Research

Dynamic report generation is a step towards reproducible research (RR), but RR encompasses many more activities. If you picked a particular seed for a simulation, you can generate a dynamic report and be done with it. However, maybe the seed turned out to be a lucky one and the inferences are actually incorrect. Issues of this kind cannot be ruled out merely by creating a dynamic report. Hence one must carefully understand the limitations of dynamic reports, that is, what RR can do and what it cannot.

A few good practices for RR:

I just got to know about RPubs. I hope to use it in the days to come.

A First Look

The knitr package is a general-purpose literate programming engine. It supports document formats including LaTeX, HTML, and Markdown, and programming languages such as R, Python, awk, C++, and shell scripts.

Two basic examples are shown in this chapter: one converts a .Rnw file into a .pdf, and the second converts a .Rmd file into .html. Just to understand how publishing works, I have posted it on RPubs. If you have an R script, you can quickly convert it into a report using the stitch() command. Also, given a .Rnw file, you can extract the code using the purl() command.
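As a minimal sketch of the purl() step, the snippet below writes a tiny, hypothetical Rnw document to a temporary file and extracts its R code. The file contents and variable names are made up for illustration; stitch() is only mentioned in a comment because, with its default Rnw template, it tries to compile a PDF and so needs a LaTeX installation.

```r
library(knitr)

# A tiny, hypothetical Rnw document written to a temporary file.
rnw_file <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<demo>>=",
  "x <- 1 + 1",
  "@",
  "The value of x is \\Sexpr{x}.",
  "\\end{document}"
), rnw_file)

# purl() extracts ("tangles") the R code from the document into a script
# and returns the path of that script.
r_file <- purl(rnw_file, output = tempfile(fileext = ".R"), quiet = TRUE)
extracted <- readLines(r_file)
print(extracted)

# stitch() goes the other way: it wraps a plain R script in a report template.
# Not run here, because the default template is compiled to PDF via LaTeX.
# stitch("analysis.R")   # "analysis.R" is a hypothetical script name
```

The extracted script contains the chunk code along with a comment line carrying the chunk label, so it can be run or sourced on its own.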

Editors

This chapter gives information about various editors that can be configured so that working with knitr becomes a pleasant experience. It mentions RStudio, LyX, ESS, Tinn-R, Texmaker, Eclipse, TextMate, TeXShop, and Vim. I am happy to know that I can configure StatET to work with knitr. I will set it up in StatET sometime soon.

Document Formats

The three components of knitr are the parser, the evaluator, and the renderer.

The parser parses the source document and identifies computer code chunks as well as inline code from the document; the evaluator executes the code and returns results; the renderer formats the results from computing in an appropriate format, which will finally be combined with the original documentation.

The pattern for the beginning of a chunk in Rnw is a regular expression:

## [1] "^\\s*<<(.*)>>=.*$"
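To see what this pattern does, here is a minimal sketch in base R that applies the regular expression shown above to a few made-up lines. The sample lines and variable names are hypothetical; only the pattern itself comes from the output above.

```r
# The Rnw chunk-begin pattern, copied from the output above.
pat <- "^\\s*<<(.*)>>=.*$"

# A few hypothetical lines from an Rnw document.
lines <- c("<<setup, echo=FALSE>>=", "x <- 1", "@", "  <<>>=")

# Which lines open a code chunk?
is_header <- grepl(pat, lines)
print(is_header)   # TRUE FALSE FALSE TRUE

# The captured group holds the label and the chunk options.
opts <- sub(pat, "\\1", lines[is_header])
print(opts)        # "setup, echo=FALSE" ""
```

The captured text between << and >>= is what the parser goes on to interpret as the chunk label and options.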

A chunk option can be any piece of R code that evaluates to the kind of value the option expects. For example, you can write an expression that results in TRUE or FALSE and assign it to the option eval. This is very flexible compared to Sweave, where you can't use code to set option values.

Chunk labels are supposed to be unique IDs in a document, and they are mainly used to generate external files such as images and cache files. If two non-empty chunks have the same label, knitr will stop and emit an error message, because there is a potential danger that the files generated from one chunk may override those of the other chunk. If we leave a chunk label empty, knitr will automatically generate a label of the form unnamed-chunk-i, where i is an incremental chunk number. One can also set chunk options globally; for example, opts_chunk$set(echo = FALSE) suppresses the echoing of code in every chunk. You can still override this setting for an individual chunk. <<>>= denotes the opening of a code chunk, and @ denotes the close of a code chunk and the opening of a documentation chunk.
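The following is a small sketch of these ideas, assuming knitr is installed. It knits a hypothetical Rnw fragment passed as text: a setup chunk sets echo = FALSE globally, one chunk uses an R variable (run_slow, a made-up name) as the value of eval, and another overrides the global echo setting.

```r
library(knitr)

# A hypothetical Rnw fragment; labels and the run_slow flag are made up.
src <- c(
  "<<setup, include=FALSE>>=",
  "opts_chunk$set(echo = FALSE)  # global default: hide code",
  "run_slow <- FALSE             # flag used as an option value below",
  "@",
  "<<slow-part, eval=run_slow>>=",
  "stop('this chunk is skipped')",
  "@",
  "<<shown, echo=TRUE>>=",
  "1 + 1",
  "@"
)

# With text = ..., knit() returns the woven LaTeX as a character string.
tex <- knit(text = src, quiet = TRUE)
cat(tex)
```

The slow-part chunk is neither run nor echoed, because eval evaluates to FALSE and the global echo is FALSE, while the last chunk shows both its code and its result.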

In a Sweave document, the start and end of a code chunk are marked by <<*>>= and @. Inline code syntax is \Sexpr{}. In a Markdown document, the start and end of a code chunk are marked by ```{r *} and ```. Inline code syntax is `r x`. The chapter talks about the Markdown language and praises its simplicity: anyone who has ever written an email can create a Markdown document. It also lists a host of derivative Markdown packages that have appeared. RStudio uses the markdown package for its implementation. The chapter also mentions Pandoc, which is usually called the Swiss Army knife of document conversion, as it supports the conversion of Markdown into a host of formats. knitr parses the code, be it a chunk or an inline fragment, and then renders it to the appropriate output using various output hooks.

    

## [1] "render_asciidoc" "render_html"     "render_jekyll"   "render_latex"   
## [5] "render_listings" "render_markdown" "render_rst"      "render_sweave"

render_sweave() uses the default Sweave output whereas render_listings() decorates the output.
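A listing like the one above can presumably be reproduced with something along these lines; this is a sketch, and the exact set of names depends on the installed knitr version.

```r
library(knitr)

# List the exported functions whose names start with "render_".
renderers <- grep("^render_", ls("package:knitr"), value = TRUE)
print(renderers)

# Calling one of them, e.g. render_listings(), switches the output hooks
# to that style. It is not called here because it changes global settings.
```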

So, based on the type of output you want, you can configure the setting. This chapter explains the set of functions for various hooks such as plot, chunk, inline, and document, so that the hooks produce appropriate content for the chosen output format. There is also a spin() function that takes plain R code annotated with roxygen-style comments and converts it into a LaTeX or Markdown document.
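As a rough sketch of spin(), the snippet below converts a small, made-up R script into R Markdown source without knitting it. The script contents and the chunk label are hypothetical; lines starting with #' are treated as narrative and lines starting with #+ as chunk headers.

```r
library(knitr)

# A hypothetical R script with roxygen-style comments.
script <- c(
  "#' # A small report",
  "#'",
  "#' Narrative lines start with a roxygen comment.",
  "#+ summary-chunk, echo=TRUE",
  "summary(cars$speed)"
)

# knit = FALSE stops after the conversion and returns the document source.
rmd <- spin(text = script, knit = FALSE, format = "Rmd")
cat(rmd, sep = "\n")
```

Passing format = "Rnw" instead should give a LaTeX-flavoured document with the same structure.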

Text Output

knitr's default rounding is 4 digits, and numbers past the $10^{-5}$ threshold will be shown in scientific notation. Here are some of the main points mentioned in the context of text output.

Graphics

Yippee! I finally reached the section I was looking forward to reading. First, there is no need for print(p) to include a ggplot2 visual. The default device for Rnw documents is PDF, and for Rmd/Rhtml/Rrst documents it is PNG, because PDF normally does not work in HTML output. The chapter goes on to describe the chunk options that control graphics output.
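As a sketch of how figure size can be controlled without touching LaTeX, the snippet below sets a few of knitr's figure-related chunk options globally. The particular values are arbitrary, and these are not necessarily the options the book highlights; the chunk shown in the comments is a hypothetical example.

```r
library(knitr)

# Global defaults applied to every chunk that produces a figure.
opts_chunk$set(
  fig.width  = 5,                  # width of the graphics device, in inches
  fig.height = 4,                  # height of the graphics device, in inches
  fig.align  = "center",           # alignment of the figure in the output
  out.width  = "0.8\\linewidth"    # width of the figure in the LaTeX document
)

# An individual Rnw chunk can override these in its header, for example:
#   <<scatter, fig.width=6, fig.cap="Weight versus mileage">>=
#   library(ggplot2)
#   ggplot(mtcars, aes(wt, mpg)) + geom_point()   # no print() needed
#   @
```

The ggplot2 object is shown without print() because knitr prints visible top-level values the same way the R console does.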

Cache

This is another section of the book that I was eager to get to. Most of the time my Sweave documents take a long time to run, mainly because some of the code chunks are slow. I knew there was a way to use caching in Sweave, but somehow I never found the time to learn about it. The other day I was working on a document and realized that there was no option but to cache various chunks. I used a funny workaround, in fact an ugly one: placing all the code chunks in separate child Sweave files. Needless to say, organizing the whole thing was a nightmare. knitr promised a convenient way, and indeed it is very easy to incorporate caching.

The basic idea of caching is that a chunk will not be re-executed as long as it has not been modified since the last run; the old results are loaded directly instead.
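Here is a small sketch of this behaviour, assuming knitr is installed. It knits the same hypothetical chunk twice with cache=TRUE and compares the elapsed times; the chunk label, the cache location, and the artificial one-second delay are all made up for illustration.

```r
library(knitr)

# Keep the cache files out of the working directory (path is an assumption).
opts_chunk$set(cache.path = file.path(tempdir(), "knitr-cache-demo/"))

# A hypothetical Rnw chunk whose body is artificially slow.
src <- c(
  "<<slow-step, cache=TRUE>>=",
  "Sys.sleep(1)   # stand-in for an expensive computation",
  "x <- 42",
  "@"
)

# The first run executes the chunk and writes the cache.
first <- system.time(knit(text = src, quiet = TRUE))[["elapsed"]]

# The second run sees an unchanged chunk and loads the cached result.
second <- system.time(knit(text = src, quiet = TRUE))[["elapsed"]]

cat("first run: ", first, "s\n")
cat("second run:", second, "s\n")
```

Editing the chunk body or its options changes the chunk's signature, so the next run would execute it again and refresh the cache.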

Cross Reference

This chapter begins with the idea of chunk reuse. One can reuse whole chunks via the ref.label option. The section on organizing child documents was very useful to me. Child Rnw documents can be included with

<<D, child="chapt1.Rnw">>=
@

I also learnt a way to apply the parent document's preamble to child documents:

<<parent, include=FALSE>>=
set_parent("master.Rnw")
@

All these aspects were very messy via Sweave. Indeed knitr makes things in RR very appealing.

Hooks

A hook is a user-defined R function that fulfills tasks beyond the default capability of knitr. This chapter deals with chunk hooks. A chunk hook is a function stored in knit_hooks and triggered by a custom chunk option.

Of all the examples mentioned in this chapter, I think the most useful one for my work is cropping a figure via a chunk hook.
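As a minimal sketch of how a chunk hook is registered, the snippet below defines a hypothetical hook named small_mar that tightens the plot margins before a chunk runs, and registers knitr's built-in hook_pdfcrop() under the name crop. The option names small_mar and crop are arbitrary choices, and hook_pdfcrop() relies on the external pdfcrop tool being installed when a document is actually knitted.

```r
library(knitr)

# A chunk hook receives three arguments: whether it is called before or
# after the chunk, the chunk options, and the evaluation environment.
knit_hooks$set(small_mar = function(before, options, envir) {
  if (before) par(mar = c(4, 4, 0.1, 0.1))  # smaller margins for the plot
})

# Register the built-in cropping hook under the option name "crop".
knit_hooks$set(crop = hook_pdfcrop)

# A chunk then triggers the hooks through the matching chunk options:
#   <<myplot, small_mar=TRUE, crop=TRUE>>=
#   plot(cars)
#   @
```

A hook only runs for chunks where its option is set to a non-NULL value, so other chunks are left untouched.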

Language Engines

This chapter talks about using knitr with other languages such as Python, Ruby, Haskell, awk/gawk, sed, shell scripts, Perl, SAS, TikZ, Graphviz, and C++. This is not relevant for my future work, so I skipped the contents.

Tricks and Solutions

The author says in his blog:

I do not have much to say about this book: almost everything in the book can be found in the online documentation, questions & answers and the source code. The point of buying this book is perhaps you do not have time to read through all the two thousand questions and answers online, and I did that for you.

This chapter is mainly a collection of important tricks that you can use for RR. Here are some of the tricks I will definitely use in my RR activity (the actual list in the book is long, and the usage depends on the purpose of the RR one is creating):

Publishing Reports

When your RR is ready for publication, it is always better to set the message and warning chunk options to FALSE. Producing PDF or HTML from RStudio takes a single click of a button. One can also use Pandoc to convert a Markdown document to LaTeX, HTML, RTF, EPUB, Word, OpenDocument Text, etc. Pandoc is called the Swiss Army knife of document conversion, so it is always worth spending some time learning its basics. The easiest way to create an HTML5 presentation is to write a Markdown document in RStudio and then run Pandoc on it. This is useful for those who want to embed code and text in a presentation.

These days it has become quite common for people to blog R code, and the chapter presents two ways to do it: one via Jekyll and the other via the usual WordPress route. The infrastructure needed to push documents to either of these places is provided in knitr. The thing with WordPress is that modifying a post is a pain once you have published it. Jekyll is said to be far easier to maintain. But tell me, who has time to edit a blog post? Publishing itself takes time, and I would rather do it on WordPress and be done with it. Well, maybe if you want users to check in and correct your code, Jekyll is a good way to go. I have never hosted code on Jekyll. I will try it sometime soon.

Applications

This chapter mentions four applications. The first is doing homework. Indeed, with minimal configuration, one can turn in a high-quality document with all the code and narrative in one place. The second application is websites and blogs built on knitr. The third application is very interesting: creating vignettes. As things stand, these are created using Sweave and there are no HTML vignettes on CRAN. Hopefully, as things move, CRAN will allow HTML vignettes, which are so much more convenient for bookmarking, sharing, etc. The fourth application is writing books. I think this is similar to what is happening in the IPython world, where people are writing books in IPython notebooks and sharing them via sites like nbviewer. The chapter mentions a few sites where knitr has been used.

Other Tools

This chapter talks about other tools such as Sweave, Dexy, IPython, and Org-mode. Since I have been using Sweave for some time, I found this chapter extremely useful, as it lists almost all the differences between Sweave and knitr. Having this kind of information in one place is so useful; otherwise I might have had to go through a ton of posts on Stack Overflow or other places to get it. Let me list some of the main differences here.

Takeaway

Imagine that you were using a clunky and painful email service and suddenly one day you are shown Gmail. Wouldn't you be thrilled? It's elegant, quick, and has a ton of intuitive features. I had the same feeling with knitr after having painfully used Sweave for a long time. I am certain that this package will stand out as the go-to package for literate programming for a very long time to come, because it is elegant, quick, and has features that you were always trying to patch in via other packages. Should you read this book? Well, if you have the patience and time to go over the manual and a thousand posts from Stack Overflow and other places to learn the various features of the package, you don't need this book. However, if you are like me, short of time, someone who values organized content and prefers to know the key hacks from the package, this book is definitely worth it.