Introduction to ‘packrat’ package

Packrat is a dependency management system for R

R package dependencies can be frustrating. Have you ever had to use trial-and-error to figure out what R packages you need to install to make someone else’s code work, and then been left with those packages globally installed forever, because now you’re not sure whether you need them? Have you ever updated a package to get code in one of your projects to work, only to find that the updated package makes code in another project stop working?

Motivation: Reproducible Research

“The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.” (CRAN Task View on Reproducible Research)

Configuration Rot

Freezing is the answer, but what to freeze?

Freezing CRAN solves only a subset of the problem, and introduces its own problems. No changes to CRAN are required to provide a highly robust system of R package dependency management. The only complete answer to this problem is freezing projects.

Individual projects should be able to freeze arbitrary combinations of R packages with a guarantee of being able to use them in the future.

Packrat as a Possible Solution

Packrat is an R package that implements a dependency management system for R.

Basic concepts

rstudio/packrat

If you’re like the vast majority of R users, when you start working on a new R project you create a new directory for all of your R scripts and data files.

Packrat enhances your project directory by storing your package dependencies inside it, rather than relying on your personal R library that is shared across all of your other R sessions. We call this directory your private package library (or just private library). When you start an R session in a packrat project directory, R will only look for packages in your private library; and anytime you install or remove a package, those changes will be made to your private library.
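
To see which library an R session is actually using, base R's .libPaths() lists the library directories in search order. The sketch below assumes the private library lives under packrat/lib inside the project, as the install paths later in this walkthrough suggest. in_private_library() is a hypothetical helper written for illustration, not part of packrat.

```r
# List the library directories R searches, in order.
lib_paths <- .libPaths()
print(lib_paths)

# Hypothetical helper: TRUE when the first library path sits under
# <project>/packrat/lib (an assumption about packrat's directory layout).
in_private_library <- function(paths = .libPaths()) {
  first <- normalizePath(paths[1], winslash = "/", mustWork = FALSE)
  grepl("/packrat/lib", first, fixed = TRUE)
}

in_private_library()
```

Outside a packrat project the helper should report FALSE, and inside one it should report TRUE, provided the layout assumption holds.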

Unfortunately, private libraries don’t travel well; like all R libraries, their contents are compiled for your specific machine architecture, operating system, and R version. Packrat lets you snapshot the state of your private library, which saves to your project directory whatever information packrat needs to be able to recreate that same private library on another machine. The process of installing packages to a private library from a snapshot is called restoring.

Installing packrat

Packrat is now available on CRAN, so you can install it with:

install.packages("packrat")

packrat walkthrough guide

This tutorial will walk you through some of the most common tasks you’ll want to do with packrat, and explain the fundamental concepts behind the package on the way.

First things first

You’re getting ready to start a new project, so you create a new directory that will eventually contain all the .R scripts, CSV data, and other files that are needed for this particular project.

You know you’re going to need to make use of several R packages over the course of this project. So before you write your first line of code, set up the project directory to use Packrat with packrat::init:

packrat::init("testproject") # "testproject" is a path relative to the current working directory
## Initializing packrat project in directory:
## - "J:/Infy _Analytics projects/packrat - Copy/rfiles/testproject"
## 
## Adding these packages to packrat:
##             _        
##     packrat   0.4.8-1
## Fetching sources for packrat (0.4.8-1) ...
## OK (CRAN current)
## Snapshot written to "J:/Infy _Analytics projects/packrat - Copy/rfiles/testproject/packrat/packrat.lock"
## Installing packrat (0.4.8-1) ...
##  OK (downloaded binary)
## Initialization complete!

After initializing the project, you will be placed into packrat mode in the project directory. You’re ready to go!

You’re no longer in an ordinary R project; you’re in a Packrat project. The main difference is that a packrat project has its own private package library. Any packages you install from inside a packrat project are only available to that project; and packages you install outside of the project are not available to the project.

This is what we mean by “isolation” and it’s a Very Good Thing, as it means that upgrading a package for one project won’t break a totally different project that just happens to reside on the same machine, even if that package contained incompatible changes.

A packrat project contains a few extra files and directories. The init() function creates these files for you, if they don’t already exist.

*see the screenshot of the files created in my project directory
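
If the screenshot is not at hand, the project directory can also be inspected from R. This is a minimal sketch that checks only the three entries named in this walkthrough (packrat.lock, the lib directory, and the src directory); init() creates other files as well, and packrat_files() is a hypothetical helper, not a packrat function.

```r
# Report which of the packrat entries named in this walkthrough exist
# inside a project directory.
packrat_files <- function(project = ".") {
  expected <- c(
    "packrat/packrat.lock",  # package metadata written by snapshot()
    "packrat/lib",           # the private package library
    "packrat/src"            # source packages saved by snapshot()
  )
  present <- file.exists(file.path(project, expected))
  setNames(present, expected)
}

packrat_files("testproject")
```

Note that packrat/src only appears once a snapshot has fetched package sources.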

Adding, removing, and updating packages

Adding a package in a Packrat project is easy. The first step is to start R inside your Packrat project, and install the package however you normally do; usually that means either the install.packages() function or the “Install Packages” button in your favorite R IDE. Let’s do this now, with the reshape2 package.

install.packages("reshape2")
## Installing package into 'J:/Infy _Analytics projects/packrat - Copy/rfiles/testproject/packrat/lib/x86_64-w64-mingw32/3.3.1'
## (as 'lib' is unspecified)
## also installing the dependencies 'stringi', 'magrittr', 'plyr', 'stringr', 'Rcpp'
## package 'stringi' successfully unpacked and MD5 sums checked
## package 'magrittr' successfully unpacked and MD5 sums checked
## package 'plyr' successfully unpacked and MD5 sums checked
## package 'stringr' successfully unpacked and MD5 sums checked
## package 'Rcpp' successfully unpacked and MD5 sums checked
## package 'reshape2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\ADMIN\AppData\Local\Temp\RtmpofwLmD\downloaded_packages

If you completed the previous steps correctly, you just installed the reshape2 package from CRAN into your project’s private package library. Let’s take a snapshot to save the changes in Packrat:

packrat::snapshot()
## 
## Adding these packages to packrat:
##              _       
##     Rcpp       0.12.7
##     magrittr   1.5   
##     plyr       1.8.4 
##     reshape2   1.4.2 
##     stringi    1.1.2 
##     stringr    1.1.0
## Fetching sources for Rcpp (0.12.7) ...
## OK (CRAN current)
## Fetching sources for magrittr (1.5) ...
## OK (CRAN current)
## Fetching sources for plyr (1.8.4) ...
## OK (CRAN current)
## Fetching sources for reshape2 (1.4.2) ...
## OK (CRAN current)
## Fetching sources for stringi (1.1.2) ...
## OK (CRAN current)
## Fetching sources for stringr (1.1.0) ...
## OK (CRAN current)
## Snapshot written to "J:/Infy _Analytics projects/packrat - Copy/rfiles/testproject/packrat/packrat.lock"

If you have automatic snapshots turned on, Packrat will record package upgrades and additions in the background, so you don’t even need to remember to call packrat::snapshot() manually unless you’re performing a less common action.

When packrat takes a snapshot, it looks in the project’s private package library for packages that have been added, modified, or removed since the last time snapshot was called. For packages that were added or modified, packrat attempts to find the uncompiled source package from CRAN, Bioconductor, or GitHub (caveat: only for packages that were installed using devtools version 1.4 or later), and saves it in the packrat/src project subdirectory. It also records metadata about each package in the packrat.lock file.

Because we save source packages for all of your dependencies, packrat makes your project more reproducible. When someone else wants to run your project (even if that someone else is you, years in the future, dusting off some old backups), they won’t need to try to figure out what versions of what packages you were running, and where you got them.
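
The metadata in packrat.lock can be read back from R. The sketch below assumes the lockfile is a DCF-style text file with one blank-line-separated record per package and Package and Version fields. The sample file is a hypothetical, abbreviated stand-in written to a temporary location, and read_lockfile_packages() is an illustrative helper rather than part of packrat.

```r
# Read package names and versions from a DCF-style lockfile
# (the format is an assumption; check it against your own packrat.lock).
read_lockfile_packages <- function(path) {
  records <- read.dcf(path)
  # The header record has no Package field, so it is dropped here.
  is_pkg <- !is.na(records[, "Package"])
  pkgs <- records[is_pkg, c("Package", "Version"), drop = FALSE]
  as.data.frame(pkgs, stringsAsFactors = FALSE)
}

# Hypothetical, abbreviated sample lockfile for demonstration.
sample_lock <- tempfile(fileext = ".lock")
writeLines(c(
  "PackratFormat: 1.4",
  "RVersion: 3.3.1",
  "",
  "Package: plyr",
  "Source: CRAN",
  "Version: 1.8.4",
  "",
  "Package: reshape2",
  "Source: CRAN",
  "Version: 1.4.2"
), sample_lock)

read_lockfile_packages(sample_lock)
```

In a real project the path would be packrat/packrat.lock, and the lockfile should be treated as read-only: it is meant to be changed through packrat::snapshot().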

Installing local source packages

You may be working on a project with an R package that is not available on any external repository. Don’t fret; packrat can still handle this! Packrat expects such source packages to live in a local repository. A local repository is just a directory containing package sources. This can be set within a packrat project with:

packrat::set_opts(local.repos = "<path_to_repo>")
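
Once the option is set, a package from that directory can be installed into the private library. The following is a sketch under the assumption that packrat::install_local() is available in your packrat version and looks the package up in the configured local repositories. use_local_repo(), repo_dir, and pkg are hypothetical names introduced for illustration.

```r
# Sketch: point a packrat project at a local repository and install a
# package from it. Must be called from inside a packrat project.
use_local_repo <- function(repo_dir, pkg) {
  # Fail early if the repository directory does not exist.
  stopifnot(dir.exists(repo_dir))
  packrat::set_opts(local.repos = repo_dir)
  # Assumption: install_local() resolves `pkg` within local.repos.
  packrat::install_local(pkg)
}

# Example call (not run here; both arguments are placeholders):
# use_local_repo("<path_to_repo>", "<package_name>")
```

After installing, a call to packrat::snapshot() would record the local package alongside the others.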

Restoring snapshots

Once your project has a snapshot, you can easily install the packages from that snapshot into your private library at any time.

You’ll need to do this, for example, when copying the project to a new computer, especially to one with a different operating system. Let’s simulate this by exiting R and then deleting the library subdirectory in your project. Then launch R from your project directory again.

Packrat automates the whole process for you: upon restarting R in this directory, it reinstalls every package recorded in the snapshot into your project’s newly created private package library.

packrat::status()
## Up to date.

Another reason to restore from the packrat snapshot is if you remove a package that you later realize you still needed, or if one of your collaborators makes their own changes to the snapshot. In these cases, you can call packrat::restore().

Let’s remove the plyr package, and use packrat::restore() to bring it back.

remove.packages("plyr")
## Removing package from 'J:/Infy _Analytics projects/packrat - Copy/rfiles/testproject/packrat/lib/x86_64-w64-mingw32/3.3.1'
## (as 'lib' is unspecified)
packrat::status()
## 
## The following packages are tracked by packrat, but are no longer available in the local library nor present in your code:
##          _      
##     plyr   1.8.4
## 
## You can call packrat::snapshot() to remove these packages from the lockfile, or if you intend to use these packages, use packrat::restore() to restore them to your private library.
packrat::restore()
## Installing plyr (1.8.4) ...
##  OK (downloaded binary)

Cleaning up

Package libraries can grow over time to include many packages that were needed at one time but are no longer used. Packrat can analyze your code and try to determine which packages you’re using so you can keep your library tidy. Let’s take a look at packrat::status() again:

packrat::status()
## The following packages are installed but not needed:
##     plyr       1.8.1
##     Rcpp       0.11.2
##     reshape2   1.4
##     stringr    0.6.2
## 
## Use packrat::clean() to remove them. Or, if they are actually needed by your project, add library(packagename) calls to a .R file somewhere in your project.

Notice these packages are “installed but not needed”. Packrat attempts to ascertain which packages your project needs by analyzing the *.R script files in your project and looking for calls like library() and require().
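
To make the idea concrete, here is a deliberately simplified sketch of such a scan. It is not packrat's actual implementation: it only matches the first library() or require() call on each line with a regular expression, and find_library_calls() is a hypothetical helper.

```r
# Simplified illustration: collect package names passed to library()
# or require() in the .R files of a project directory.
find_library_calls <- function(project = ".") {
  files <- list.files(project, pattern = "\\.[Rr]$",
                      recursive = TRUE, full.names = TRUE)
  # Group 1: function name; group 2: package name, optionally quoted.
  pattern <- "(library|require)\\([[:space:]]*[\"']?([A-Za-z][A-Za-z0-9.]*)"
  pkgs <- character()
  for (f in files) {
    lines <- readLines(f, warn = FALSE)
    hits <- regmatches(lines, regexec(pattern, lines))
    hits <- Filter(length, hits)  # drop lines without a match
    pkgs <- c(pkgs, vapply(hits, `[`, character(1), 3))
  }
  sort(unique(pkgs))
}

# Usage on a throwaway project directory.
demo_dir <- file.path(tempdir(), "scan_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("library(reshape2)", "x <- 1", "require(\"plyr\")"),
           file.path(demo_dir, "analysis.R"))
find_library_calls(demo_dir)
```

A text scan like this misses packages used only through pkg::fun() calls or loaded indirectly, which is one reason a package can be reported as unused even though the project relies on it.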

Using Packrat with RStudio

The latest version of RStudio includes built-in support for Packrat. To use it, you’ll need RStudio 0.98.945 or newer, and the very latest version of Packrat from GitHub.

Packrat projects

Like Packrat projects, RStudio projects are designed around a workflow in which your code and its resources are housed in a directory on disk.

In a Packrat project, the Packrat “application directory” (i.e. the host for /packrat/) and RStudio “project directory” (i.e. the folder that contains the .Rproj file) are the same folder, so Packrat and RStudio both have the same understanding of a project’s contents and can work together to manage its library.

If you’re bringing an existing Packrat project into RStudio, you don’t need to do anything special to make it work. And if you’re just starting a new project, you can start with Packrat right away:

The new project will come with its own private library, just as though you’d created a new directory and called packrat::init() on it.

If you have an existing RStudio project you’d like to bring under Packrat control, you can add it using the new Packrat section under Tools | Project Options.

The Packages pane

While you’ve got a Packrat project open, the Packages pane will show you the status of your project’s private Packrat library, including the stored Packrat version of each package.

You can check the Packages pane any time to see what packages are available in your private library or check for differences between your private library and what Packrat knows about. RStudio will raise the Packages pane if there’s any action you need to take.

Making changes

When you start a new project, one of the first things you’ll want to do is add a few packages to the Packrat library. You’re free to use any tools you normally use to install packages. Let’s try packrat::install_github("rstudio/rmarkdown"). Once the command finishes you’ll see the Packages pane show the new packages:

You’ll notice that when you first install a package, the Packrat column is blank: this is because the package isn’t stored in Packrat yet. RStudio works behind the scenes to fetch the package’s sources and save them in Packrat. This feature is called auto-snapshotting. You can disable it in Packrat options if you’d rather always save library changes manually.

When it’s done, the pane will show that Packrat has been updated with your new packages:

Package downgrades and removal aren’t done automatically for safety reasons. Try installing the cluster package; once Packrat knows about it, remove it. You’ll be prompted to save the destructive change to Packrat manually if you really intended to make it:

Collaborating

If you’re collaborating using a version control system, Packrat will help keep your private libraries in sync. RStudio watches for changes to your Packrat lockfile. When a change from a version control system updates your Packrat lockfile, RStudio will prompt you to apply that change to your private library.

For instance, let’s say your collaborator adds a package called argparser as part of a commit. When the packrat.lock file is updated by the version control system, RStudio will prompt you to bring your library in sync.

Resolving conflicts

When your library differs from Packrat, RStudio will try to guess the appropriate action to bring them back in sync. Notice that in the two cases we just covered, the states were identical: a package is present in Packrat, but not the library. RStudio tries to infer whether the appropriate action is a snapshot (i.e. update Packrat to match the library) or restore (i.e. update the library to match Packrat).

In some cases, however, it won’t be possible for RStudio to guess. Let’s imagine that you ignored your colleague’s changes and added a package of your own, ber. This will generate a conflict.

Click the Resolve... button:

It’s not possible to cherry-pick changes: you can pick your library state or the Packrat state, but not some of each. It is presumed that your library state and Packrat state each represent a consistent state, and cherry-picking changes from each could leave your library in an inconsistent state.
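
The conflict can be pictured as a comparison of two package-to-version maps, one for the Packrat state and one for the library. The sketch below is purely illustrative: compare_states() is a hypothetical helper, and the version numbers for argparser and ber are made up for the example.

```r
# Compare the Packrat (lockfile) state with the library state.
# Both arguments are named character vectors: names are packages,
# values are versions.
compare_states <- function(lockfile, installed) {
  pkgs <- sort(union(names(lockfile), names(installed)))
  packrat_v <- unname(lockfile[pkgs])   # NA when absent from Packrat
  library_v <- unname(installed[pkgs])  # NA when absent from the library
  status <- ifelse(is.na(packrat_v), "library only",
            ifelse(is.na(library_v), "packrat only",
            ifelse(packrat_v == library_v, "in sync", "version differs")))
  data.frame(package = pkgs, packrat = packrat_v, library = library_v,
             status = status, stringsAsFactors = FALSE)
}

# Hypothetical states: a collaborator added argparser to Packrat,
# while ber was installed locally without a snapshot.
lockfile <- c(reshape2 = "1.4.2", argparser = "0.4")
library_state <- c(reshape2 = "1.4.2", ber = "4.0")
compare_states(lockfile, library_state)
```

When both "packrat only" and "library only" rows are present, neither a snapshot nor a restore is obviously right, which is the situation where a choice has to be made by hand.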

Cleaning unused packages

Package libraries can grow over time to include many packages that were needed at one time but are no longer used. You can clean these up with the Clean Unused Packages command.

This brings up a dialog that provides a convenient interface over packrat::clean().

Remember, package removals aren’t auto-snapshotted, so save your changes to Packrat once you’ve verified that your project’s state is consistent after cleanup.

Packrat’s understanding of which packages you’re using is based on some relatively simple heuristics, so it might not always be able to figure out that you’re using a package. If you want to keep a package from appearing in the unused list, just add a library(packagename) call to any .R file in your project’s directory.

install.packages("devtools")
## Installing package into 'J:/Infy _Analytics projects/packrat - Copy/rfiles/testproject/packrat/lib/x86_64-w64-mingw32/3.3.1'
## (as 'lib' is unspecified)
## also installing the dependencies 'mime', 'curl', 'openssl', 'R6', 'httr', 'memoise', 'whisker', 'digest', 'rstudioapi', 'jsonlite', 'git2r', 'withr'
## package 'mime' successfully unpacked and MD5 sums checked
## package 'curl' successfully unpacked and MD5 sums checked
## package 'openssl' successfully unpacked and MD5 sums checked
## package 'R6' successfully unpacked and MD5 sums checked
## package 'httr' successfully unpacked and MD5 sums checked
## package 'memoise' successfully unpacked and MD5 sums checked
## package 'whisker' successfully unpacked and MD5 sums checked
## package 'digest' successfully unpacked and MD5 sums checked
## package 'rstudioapi' successfully unpacked and MD5 sums checked
## package 'jsonlite' successfully unpacked and MD5 sums checked
## package 'git2r' successfully unpacked and MD5 sums checked
## package 'withr' successfully unpacked and MD5 sums checked
## package 'devtools' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\ADMIN\AppData\Local\Temp\RtmpofwLmD\downloaded_packages
devtools::session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.1 (2016-06-21)
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_India.1252          
##  tz       Asia/Calcutta               
##  date     2016-10-26
## Packages ------------------------------------------------------------------
##  package       * version date       source        
##  assertthat      0.1     2013-12-06 CRAN (R 3.3.0)
##  BiocInstaller   1.22.2  2016-05-12 Bioconductor  
##  devtools        1.12.0  2016-06-24 CRAN (R 3.3.1)
##  digest          0.6.10  2016-08-02 CRAN (R 3.3.1)
##  evaluate        0.9     2016-04-29 CRAN (R 3.2.5)
##  formatR         1.4     2016-05-09 CRAN (R 3.3.0)
##  htmltools       0.3.5   2016-03-21 CRAN (R 3.2.4)
##  knitr           1.14    2016-08-13 CRAN (R 3.3.1)
##  magrittr        1.5     2014-11-22 CRAN (R 3.2.1)
##  memoise         1.0.0   2016-01-29 CRAN (R 3.3.1)
##  packrat         0.4.8-1 2016-09-07 CRAN (R 3.3.1)
##  Rcpp            0.12.7  2016-09-05 CRAN (R 3.3.1)
##  rmarkdown       1.1     2016-10-16 CRAN (R 3.3.1)
##  stringi         1.1.2   2016-10-01 CRAN (R 3.3.1)
##  stringr         1.1.0   2016-08-19 CRAN (R 3.3.1)
##  tibble          1.2     2016-08-26 CRAN (R 3.3.1)
##  withr           1.0.2   2016-06-20 CRAN (R 3.3.1)
##  yaml            2.1.13  2014-06-12 CRAN (R 3.2.1)