Using devtools for Lazy People

Barry Rowlingson

Developing packages for R (they're not libraries) has in the past been tiresome, requiring a loop of edit/build/start R/install/test until sick. The devtools package not only makes building packages easy but is also useful for managing all your R code in a neat and tidy way.

TL;DR - or, for very lazy people

Make a folder called MyCode with a DESCRIPTION file. Make a subfolder called R. Put R code in .R files in there. Edit, load_all("MyCode"), use the functions.

Introduction

Very few people are taught good practice with R. They are shown the command line, run a few functions, then start editing files and sourcing them in. Some people write complete 'scripts' containing all their code in one big chunk, or they divide them up into separate functions and use source to load them in. Or they are constantly cutting and pasting chunks from files in editors, or worse still, MS Word.

Eventually the user has several source code files with assorted dependencies, and needs to source them in a particular order whenever something changes. They end up with one master file that just sources everything else:

# master file
source("somefunctions.R")
source("anotherThing.R")
source("datafunctions.R")
data = readMyData("file.dat")
analysis = analyse(data)

Ideally all your code would be in a package, and you could edit and use the code in that package easily. Some ad-hoc methods exist for managing code in a folder, and loading it into a position on the search path so that it doesn't appear in ls() listings. These methods are a bit hacky. Luckily devtools can help us, and in a way that is a stepping stone to building real packages. It also works with compiled code. There is very little extra overhead required.
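For comparison, here is roughly what one of those ad-hoc methods looks like in base R. It is only a sketch: the environment name "local:MyFunctions", the temporary file and the function double_it are all made up for illustration, and this is not what devtools does internally.

```r
# Write a small function file to a temporary location (illustrative only).
src <- file.path(tempdir(), "somefunctions.R")
cat("double_it = function(x){x*2}\n", file = src)

# Attach an empty environment at position 2 on the search path ...
env <- attach(NULL, name = "local:MyFunctions")
# ... and evaluate the file inside it rather than in the workspace.
sys.source(src, envir = env)

result <- double_it(21)                              # found via the search path
in_workspace <- "double_it" %in% ls(globalenv())     # not in ls() listings
on_path <- "local:MyFunctions" %in% search()

# Tidy up: remove the environment from the search path again.
detach("local:MyFunctions")
```

It works, but you have to remember the attach/sys.source/detach dance yourself, which is the hacky part.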

Devtools

First create a folder for your code, below your working directory, and create a folder in there called R. The following snippet should work in any operating system, or just use your operating system's GUI or command line to make it:

dir.create("MyCode")
dir.create(file.path("MyCode", "R"))

Now we can use devtools to attach this in a way similar to a package:

library(devtools)
load_all("MyCode")
## Error: No DESCRIPTION file found in
## /home/nobackup/rowlings/Downloads/Devtools/MyCode

This fails because we don't have a DESCRIPTION file. Let's create one:

write.dcf(list(Package = "MyCode", Title = "My Code for this project",
    Description = "To tackle this problem", Version = "0.0",
    License = "For my eyes only",
    Author = "Barry Rowlingson <bsr@example.com>",
    Maintainer = "Barry Rowlingson <bsr@example.com>"),
    file = file.path("MyCode", "DESCRIPTION"))

load_all("MyCode")
## Loading MyCode

All those fields are required. You can also create this file in an editor.
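If you do write the file by hand, it is easy to leave a field out. Here is a small sketch of a check using base R's read.dcf. The helper missing_fields is a made-up name, and the list of fields is simply the one used in the DESCRIPTION above, not an official list.

```r
# Fields used in the DESCRIPTION example above.
wanted <- c("Package", "Title", "Description", "Version", "License",
            "Author", "Maintainer")

# Hypothetical helper: report which of those fields a DESCRIPTION file lacks.
missing_fields <- function(desc_file, fields = wanted) {
  present <- colnames(read.dcf(desc_file))
  setdiff(fields, present)
}

# Try it on a deliberately incomplete file in a temporary folder.
tmp <- file.path(tempdir(), "DESCRIPTION")
write.dcf(list(Package = "MyCode", Version = "0.0"), file = tmp)
missing_fields(tmp)
```

On your real folder you would call missing_fields(file.path("MyCode", "DESCRIPTION")) and hope for an empty result.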

So what have we done now? Let's see what's on the search path:

search()
##  [1] ".GlobalEnv"        "package:MyCode"    "package:devtools" 
##  [4] "package:knitr"     "package:stats"     "package:graphics" 
##  [7] "package:grDevices" "package:utils"     "package:datasets" 
## [10] "package:methods"   "Autoloads"         "package:base"

You should see that your MyCode now appears as a package on the search path. But we haven't written any code yet, so there's nothing in it.

All your R code goes into the R folder. I'll create two little functions in two files. Use your favourite editor to create them if you want.

cat("foo=function(x){x*2}", file = file.path("MyCode", "R", "foo.R"))
cat("bar=function(x){x/2}", file = file.path("MyCode", "R", "bar.R"))
load_all("MyCode")
## Loading MyCode
foo(99)
## [1] 198
bar(123)
## [1] 61.5

Now the great thing here is that the load_all call noticed the two new files, and loaded them into the attached package. If I change one of them, then load_all will only load that one.

cat("foo=function(x){x*3}", file = file.path("MyCode", "R", "foo.R"))
load_all("MyCode")
## Loading MyCode
foo(99)
## [1] 297
bar(123)
## [1] 61.5

If you don't believe me you can put print() or message() calls in the R files, outside of the function definitions, to see what is going on.
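As a small illustration of that trick, the sketch below writes a copy of foo.R with a message() call at the top level. To keep it self-contained it uses a temporary file and plain source() as a stand-in for load_all; in your package folder the message would appear whenever load_all evaluates the file.

```r
# A copy of foo.R with a top-level message (temporary file, illustration only).
demo_file <- file.path(tempdir(), "foo.R")
cat('message("evaluating foo.R")\nfoo=function(x){x*3}\n', file = demo_file)

# Every evaluation of the file emits the message, so you can see when it runs.
source(demo_file)
foo(99)
```

Remember to take the message out again once you are convinced.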

So your development routine is simply edit/load_all/test. There's no build step.

Compiled code

devtools also works with compiled C and Fortran code. These things live in a src folder. Here I'll create a simple C file with one function that adds two numbers, returning the result in a third. Typically you'd create this in your favourite editor, but I'll just make it at the R command line:

dir.create(file.path("MyCode", "src"))
cat("void cfoo(double *a, double *b, double *c){*c=*a+*b;}\n", file = file.path("MyCode", 
    "src", "cfoo.c"))
load_all("MyCode")
## Loading MyCode
## /usr/lib/R/bin/R --vanilla CMD SHLIB -o MyCode.so cfoo.c
## 

You can see that the code has been compiled. Let's write a little function to call it:

cat("cfoo=function(a,b){.C('cfoo',as.double(a),as.double(b),c=as.double(0))$c}", 
    file = file.path("MyCode", "R", "cfoo.R"))
load_all("MyCode")
## Loading MyCode
## /usr/lib/R/bin/R --vanilla CMD SHLIB -o MyCode.so cfoo.c
## 
cfoo(10, 23)
## Error: C symbol name "cfoo" not in load table

This fails. We need one more bit of boilerplate, in this case a NAMESPACE file with one directive in it:

cat("useDynLib(MyCode)\n", file = file.path("MyCode", "NAMESPACE"))
load_all("MyCode")
## Loading MyCode
## /usr/lib/R/bin/R --vanilla CMD SHLIB -o MyCode.so cfoo.c
## 
cfoo(10, 23)
## [1] 33

The beauty now is that load_all("MyCode") will load any modified R files, and recompile and reload any modified C or Fortran files.
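One thing to keep in mind: the C function above dereferences each pointer once, so it only ever looks at the first element of a and b. A slightly more defensive wrapper might check the lengths before calling into C. This is only a sketch, and cfoo_checked is a made-up name that is not part of the package above.

```r
# Hypothetical wrapper: refuse anything that is not a single number,
# because the C routine only reads the first element of each argument.
cfoo_checked = function(a, b) {
  stopifnot(length(a) == 1, length(b) == 1)
  .C("cfoo", as.double(a), as.double(b), c = as.double(0))$c
}

# The length check fires before any compiled code is touched.
res <- tryCatch(cfoo_checked(1:2, 3), error = function(e) "stopped")
res
```

With the package loaded, cfoo_checked(10, 23) should behave just like cfoo(10, 23).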

Summary

I can't see any reason why you shouldn't use devtools this way for your day-to-day code management. There's no massive overhead, editing and reloading is simple and bombproof, and it works with R and compiled code.

The layout is simple: you need two folders, R and src in your package folder, and two files, DESCRIPTION and NAMESPACE. You don't even need half those things if you aren't using compiled code.

You can even work with multiple directories simultaneously. Perhaps you are working on a new method for fitting a statistical model to some new data. Typically you'd be writing code to mess with the new data, and model code that is generally applicable. Create two folders, MyNewModel and DataMunging, say, and separate the concerns. That way, when someone else wants to use your model code on their data you just send them one package. You just have to remember to load_all("MyNewModel") or load_all("DataMunging") depending on what you've edited.
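If remembering which folder you edited is too much effort, you could reload them all in one go. The sketch below assumes the two folder names from the paragraph above. The helper reload_packages is a made-up name; it quietly skips anything that isn't there and returns which folders were loaded.

```r
# Hypothetical helper: load_all every folder that looks like a package.
reload_packages <- function(pkgs) {
  have_devtools <- requireNamespace("devtools", quietly = TRUE)
  vapply(pkgs, function(pkg) {
    ok <- have_devtools && file.exists(file.path(pkg, "DESCRIPTION"))
    if (ok) devtools::load_all(pkg)
    ok
  }, logical(1))
}

# Folder names follow the example in the text; adjust to your own layout.
loaded <- reload_packages(c("MyNewModel", "DataMunging"))
loaded
```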

Further

Now you've created an R package! If you want to send all your R, C, and Fortran code to someone you can build it into a source tarball very easily:

build("MyCode")
## /usr/lib/R/bin/R --vanilla CMD build \
## '/home/nobackup/rowlings/Downloads/Devtools/MyCode' --no-manual \
## --no-resave-data
## 
## [1] "/home/nobackup/rowlings/Downloads/Devtools/MyCode_0.0.tar.gz"
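Whoever receives that tarball can install it from source with base R alone. Here is a sketch, assuming the file sits in their working directory and, because of the C code, that they have a compiler set up:

```r
# File name as produced by build() above; assumed to be in the working directory.
tarball <- "MyCode_0.0.tar.gz"

if (file.exists(tarball)) {
  # repos = NULL tells install.packages to install from a local file.
  install.packages(tarball, repos = NULL, type = "source")
} else {
  message("No ", tarball, " found in ", getwd())
}
```

One caveat, as far as I can tell: load_all makes every function visible, but an installed package only exposes what its NAMESPACE exports. With a NAMESPACE holding just the useDynLib line, the recipient would need MyCode:::foo, or you would add an export directive such as exportPattern(".").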

Give this to someone and they can install it and be using your code. That probably means you should document your functions. That's a job for roxygen - which lets you put the documentation in the R file next to the functions it belongs to. There's a document function in devtools that uses roxygen to build all the help files.
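To give a flavour of it, here is what foo.R might look like with roxygen comments added. The wording of the documentation is my own invention for this sketch; the special comment lines start with #' and sit directly above the function.

```r
#' Multiply a number by three
#'
#' @param x A numeric vector.
#' @return The value of \code{x} multiplied by three.
#' @examples
#' foo(99)
foo = function(x){x*3}
```

The comments are ordinary R comments, so the file still loads exactly as before. Running document("MyCode") should then turn them into a help file in a man folder.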

Even better than building tarballs and mailing them to people, you should look into using a source code management system and enabling sharing of effort. For public displays of code, try GitHub. The devtools package integrates with GitHub. For example, to get the latest development version of devtools you do:

install_github("r-lib/devtools")

Current versions of devtools expect the repository in "user/repo" form. Note that older versions of devtools did not check and install dependencies when installing a package this way, so check the install_github help page for the behaviour of your version.

Last Word

Don't be afraid. In the past working with packages has been fraught with fear of having to run R CMD check all the time, having to document everything, and having to restart R when you've made even the tiniest change to a function. Use devtools and it'll all work out.

Credits

The devtools package is a Hadley Wickham production.