Why should you document your research with an R package?

A research project is typically composed of three types of items: Writing, Data, and Analysis scripts (WDA). Most people keep multiple versions of each of these items, possibly in many locations on their computer. This can make the links between your WDA fragile, if not outright broken (e.g., have you ever tried replicating a specific test statistic from a previous research project?). When the links between your WDA are weak (or broken), it is almost impossible to share your data with fellow scientists, or for your future self to understand what the heck you did when you worked on that project in the past.

An R package is one way to keep a strong link between your WDA. Storing your writing, data, and analysis in an R package lets you keep everything in one file. If you put the file online, then you (or anyone else) can access your WDA from RStudio with one line of code.

In addition to storing your WDA, R packages allow you to write vignettes (aka tutorials) that provide guidelines for others (and your future self!) to understand and use your data accurately.

Install / Update R and RStudio

Before you do anything else, make sure you’ve got the latest versions of R and RStudio installed!

Look at a package!

Let’s start by looking at a research R package in action. I created an R package called phillips2017cognition that contains the WDA of a fictional future study.

To install the package, run the following code. When you do, you should see some red text, ending with the line DONE.

install.packages(pkgs = "https://dl.dropboxusercontent.com/u/7618380/phillips2017cognition_0.1.0.tar.gz", 
                 repos = NULL, # Tells R not to try to get the package from CRAN
                 type = "source"  # Type of package is source
                 )

Exploring the phillips2017cognition package

Let’s look at the contents of the package. As with any package, start by loading it:

library("phillips2017cognition")

Now you can access the package. You can view the main help page as follows:

help(package = "phillips2017cognition")

On this page, you can see two main links: one to the DESCRIPTION file, and another to User guides and package vignettes. Feel free to click around on these to learn what they tell you.

Overview and Getting Started

The purpose of this document is to demonstrate the basics of creating an R package for the purpose of documenting scientific research. In this document, I will take you through the following six basic steps of creating a package.

6 Steps to creating an R package

  1. Start a new Project / Package in RStudio
  2. Update the DESCRIPTION file
  3. Save and document data
  4. Write Vignettes
  5. Write and document functions
  6. Document and build your package!

Install these packages!

Once you’ve installed R and RStudio, make sure you have the latest versions of the following packages by running:

install.packages("knitr")
install.packages("rmarkdown")
install.packages("devtools")
install.packages("rmdformats")

Step 1: Start a new package / project

Now you’re ready to get started on your package. You’ll start by opening a new project in RStudio – this project will essentially be your new package. To create your new project (aka package), do the following steps:

  1. Create a new project in RStudio (File – New Project – New Directory – R Package)
    • Give your project a name (I’ll call mine phillips2017cognition) and associate it with a directory on your computer.
    • Open the project.
  2. RStudio creates a new folder on your computer with the project name. Navigate to this folder and add the following folders (RStudio may already have created some of them):
    • /data (This is where you will store all of your .RData files)
    • /R (This folder contains only .R files. This is where you will store all of your documentation files, function files, and miscellaneous R code)
    • /inst (This folder contains any miscellaneous files you want to include in your package, e.g., PDFs and images)
    • /man (This folder contains documentation files generated by roxygen, e.g., when you run devtools::document(). You should never edit files here manually)
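If you prefer to create these folders from the R console instead of your file browser, a base-R sketch like the one below works. It assumes you run it from your package folder (RStudio sets the working directory there when you open the project); the helper name add_package_folders is hypothetical, not part of any package.

```r
# Hypothetical helper: create the standard package subfolders if they are missing.
# `pkg_dir` defaults to the current working directory (the open RStudio project).
add_package_folders <- function(pkg_dir = ".") {
  folders <- c("data", "R", "inst", "man")
  for (folder in folders) {
    path <- file.path(pkg_dir, folder)
    # dir.create() warns if a folder already exists, so only create missing ones
    if (!dir.exists(path)) {
      dir.create(path)
    }
  }
  # Return the folder paths invisibly
  invisible(file.path(pkg_dir, folders))
}

# Usage (run from inside your package project):
# add_package_folders()
```

Running it twice is harmless: folders that already exist are simply skipped.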

An R package is simply a directory with specific subfolders, a DESCRIPTION file, and a NAMESPACE file. Here is how your package folder should look (my package is called phillips2017cognition).

Step 2: The DESCRIPTION file

Click here for a longer guide to DESCRIPTION files

Next you’ll update the DESCRIPTION file for your package. Every R package must have a DESCRIPTION file that contains basic information about your package (e.g., title, author, description). You should update these by hand. Here are some of the main fields:

  • Package: The name of your package. Don’t change this.
  • Title: A short (one sentence) description of your package.
  • Description: A longer (1 paragraph) description of your package and what it does.
  • Imports: The names of any other packages (separated by commas) that your package requires to work. For example, if your package uses functions from the BayesFactor package, list it here. If your package is installed from CRAN or GitHub, R will automatically install these packages on the user’s computer when they install your package.

There are additional fields you can add, such as URL (to link to a website).

Here is a simple DESCRIPTION file:
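A minimal DESCRIPTION for this package might contain fields like the ones below. The values (title, author, license, version) are placeholders, not necessarily what phillips2017cognition uses. In practice you would edit the DESCRIPTION file directly in RStudio; this sketch instead writes the lines to a temporary file and parses them with base R’s read.dcf(), which reads the same format DESCRIPTION files use. The LazyData: true field makes data objects available as soon as the package is loaded, without calling data().

```r
# Placeholder DESCRIPTION contents (all values are illustrative assumptions)
description_lines <- c(
  "Package: phillips2017cognition",
  "Title: Data and Analyses for a Fictional Priming Study",
  "Version: 0.1.0",
  "Author: Your Name",
  "Maintainer: Your Name <you@example.com>",
  "Description: Contains the data, analysis code, and vignettes for a",
  "    fictional priming study.",  # indented lines continue the previous field
  "License: GPL-3",
  "Imports: BayesFactor, dplyr",
  "LazyData: true"
)

# Write the lines to a temporary DESCRIPTION file
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(description_lines, desc_path)

# Parse it back: read.dcf() returns a one-row matrix with one column per field
desc <- read.dcf(desc_path)
desc[1, "Imports"]
```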

Step 3: Save and document data

Longer guide to saving and documenting data

Ok! Now it’s time to include data in your package. You’re probably used to storing data as Excel, SPSS, or .txt files. However, in creating packages, we’ll store all of our data as R objects (e.g., data frames, matrices, lists, vectors) in .RData files. .RData files are compressed, so they can store many R objects in minimal space.

Saving data as .RData files

To save data as .RData files, you first need to load the data into your working environment (usually from a .txt file), then save the data (using the save() function) as an .RData file in the data folder in your package.

  1. Import all the data files you want, assigning each one to a new object in your environment with a standard data-reading function like read.table(). Here, I’ll read data from a text file called priming-5.txt into a new object called priming.s1.
# Read in a text file with data from Study 1 of a priming study
priming.s1 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/04/priming-5.txt", 
    sep = "\t", header = TRUE)

You can load as many data objects as you want.

  2. Save the data object as an .RData file in the data/ folder of your package using save(OBJECT, file = "data/OBJECT.RData")
# Save the data object as an .RData file in the data folder
save(priming.s1, file = "data/priming_s1.RData")

After you do this, you should see a new file called priming_s1.RData in the data folder in your package.
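Because compact storage is the point of .RData files here, you may want to control compression. As a sketch (using a small made-up data frame and a temporary folder instead of your real data/ folder), save() accepts a compress argument, and tools::checkRdaFiles() reports how an existing file was saved. R CMD check may flag data files whose compression could be improved; tools::resaveRdaFiles() can recompress them in place.

```r
# A small made-up data frame standing in for priming.s1
example.data <- data.frame(
  age = c(21, 25, 30),
  prime = c("elderly", "neutral", "elderly")
)

# Save it with xz compression (often the smallest files, though slower to write)
rdata_path <- file.path(tempdir(), "example_data.RData")
save(example.data, file = rdata_path, compress = "xz")

# Inspect the file: size in bytes, compression type, and serialization version
tools::checkRdaFiles(rdata_path)
```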

Putting many objects into one .RData file

You don’t need to restrict yourself to one object per .RData file. You can store as many objects in an .RData file as you’d like. In the following code chunk, I’ll create four data objects (two datasets as data frames and two simulations as vectors), then save all four objects to a single .RData file called priming_all.RData

# Store multiple objects in one .RData file

# Study 1 priming data
priming.s1 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/04/priming-5.txt", 
    sep = "\t", header = TRUE)

# Study 2 priming data
priming.s2 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/04/priming-4.txt", 
    sep = "\t", header = TRUE)

# Fix the random number seed so the simulations can be reproduced exactly
# (the seed value itself is arbitrary)
set.seed(100)

# simulation.A is 1000 p-values from 1000 experiments where H0 is TRUE (i.e.
# mu = 0)
simulation.A <- sapply(1:1000, FUN = function(x) {
    t.test(x = rnorm(100, mean = 0, sd = 1))$p.value
})

# simulation.B is 1000 p-values from 1000 experiments where H0 is FALSE
# (i.e. mu = 1)
simulation.B <- sapply(1:1000, FUN = function(x) {
    t.test(x = rnorm(100, mean = 1, sd = 1))$p.value
})

# Save Study 1 data, Study 2 data, and simulations A and B in
# priming_all.RData

save(priming.s1, priming.s2, simulation.A, simulation.B, file = "data/priming_all.RData")
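To confirm what ended up in a multi-object .RData file, you can load it into a fresh environment: load() returns the names of the objects it restored. The sketch below uses two small placeholder vectors (not the real simulations) and a temporary file so it runs anywhere; with your package, you would point it at data/priming_all.RData instead.

```r
# Placeholder objects standing in for the real simulations
simulation.A <- runif(5)
simulation.B <- runif(5)

bundle_path <- file.path(tempdir(), "bundle.RData")
save(simulation.A, simulation.B, file = bundle_path)

# Load into a new, empty environment so nothing in your workspace is overwritten
check.env <- new.env()
loaded.names <- load(bundle_path, envir = check.env)

loaded.names      # names of all objects stored in the file
ls(check.env)     # the same objects, now living in check.env
```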

Documenting data

Next, we will create documentation files for each of the data objects (usually data frames) you’ve saved in one (or more) .RData files. Documentation files generate the help pages for objects, which tell the user what the data objects mean and how to use them.

For example, to see the help page for the ChickWeight dataset (a built-in R dataset about chicken weights), run the following:

help(ChickWeight)

You should create a documentation file for each important data object (usually dataframes) stored in your .RData files.

Data documentation files follow a strict format in R. They consist of a title, a description, and a few special tags (like @format). Rather than describing them in detail, just look at the example below:

  1. Create a new blank .R file called OBJECT_doc.R (or download the template from the Template link), where OBJECT is the name of your data object, e.g., priming.s1. Save the OBJECT_doc.R file to the R folder in your package. Then edit your file to follow the format below:
#' Study 1 of Phillips et al. (2017)
#'
#' This dataset contains fictional data from study 1 of a priming experiment.
#'
#'
#' @format A data frame containing 1000 rows and 6 columns.
#' \describe{
#'   \item{sex}{(string) - The participant's sex}
#'   \item{age}{(numeric) - The participant's age}
#'   \item{prime}{(string) - The participant's priming condition (elderly or neutral)}
#'   \item{time}{(numeric) - Amount of time (in seconds) it took for a participant to walk down the hall.}
#'  }
#' @details Each row corresponds to a participant. Participants were randomly assigned....
#' @source Experiment conducted at the Economic Psychology lab from February 2016 to April 2016
#' @examples
#'
#' # Histogram of participant ages
#' hist(priming.s1$age)
#'
#'  # t.test comparing the walking time of participants in each condition.
#'  #   should give a p-value of .03
#'
#'  t.test(formula = time ~ prime,
#'           data = priming.s1)

"priming.s1"

Here are some notes on creating this documentation file:

  • You’ll notice that these comments start with #' instead of the standard #. This type of comment (a roxygen comment) is specific to package documentation. If you want to include a ‘real’ comment, such as a comment inside an example, use the standard # AFTER the #' (e.g., #' # Histogram of participant ages).

  • Include the name of the object at the end of the file in quotation marks (without the #’ commenting)

  • The examples at the bottom should all be executable R code. However, you don’t need to include examples if you don’t want to.

  2. Repeat step 1 for every important data object in your package that you want to fully document. That is, every important data object should have its own _doc.R file in your R folder.
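If you have many data objects, writing each _doc.R file from scratch gets repetitive. One option is a small helper like the hypothetical write_doc_skeleton() below, which writes an OBJECT_doc.R file with the roxygen structure shown above, ready for you to fill in. It uses only base R; the placeholder text and the auto-generated @format line (built from nrow(), ncol(), and each column’s class) are assumptions about what a useful starting skeleton contains.

```r
# Hypothetical helper: write a roxygen skeleton for a data frame to <r.dir>/<name>_doc.R
write_doc_skeleton <- function(object.name, data, r.dir = "R") {
  # One \item{}{} line per column, using the column's class as a starting point
  column.classes <- vapply(data, function(col) class(col)[1], character(1))
  items <- sprintf("#'   \\item{%s}{(%s) - DESCRIBE ME}", names(data), column.classes)
  lines <- c(
    sprintf("#' TITLE for %s", object.name),
    "#'",
    "#' DESCRIPTION of the dataset.",
    "#'",
    sprintf("#' @format A data frame containing %d rows and %d columns.",
            nrow(data), ncol(data)),
    "#' \\describe{",
    items,
    "#' }",
    "#' @source SOURCE of the data",
    "",
    sprintf('"%s"', object.name)   # object name in quotes, without #'
  )
  path <- file.path(r.dir, paste0(object.name, "_doc.R"))
  writeLines(lines, path)
  invisible(path)
}

# Usage (from your package folder, with priming.s1 in your workspace):
# write_doc_skeleton("priming.s1", priming.s1)
```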

Step 4: Vignettes

Longer guide to writing vignettes

Now that you’ve documented your dataset, it’s time to create longer-form vignettes. A vignette is a long-form document containing a mixture of text, R code, and R output. You can create vignettes that replicate analyses, generate plots…anything you want.

Vignettes are written in R Markdown. R Markdown is a bare-bones markup language for writing text. For a thorough overview of how to use R Markdown, check out the full R Markdown Guide. For an easy-to-use reference guide on R Markdown, check out R Markdown Quick Reference.

Here are the two steps to creating a vignette:

  1. Create a new BLANK.Rmd file using the usethis::use_vignette("BLANK") function (where BLANK is the name of your vignette). Older versions of devtools provided the same function as devtools::use_vignette(). Here, I’ll create a new vignette called study1 which replicates the main analyses in study 1 of my paper.
# Create a new vignette template (requires the usethis package)
usethis::use_vignette("study1")

Running this will create a new folder called /vignettes (if it does not already exist) containing a new file called study1.Rmd. It also adds knitr and rmarkdown to your DESCRIPTION file so the vignette can be built.

  2. Open your newly created vignette file (e.g., study1.Rmd) and edit it in R Markdown. To see an example file, check out Example vignette.

Vignette tips

  1. Be careful when you use functions from another package in your vignette. In most R code, you’d first load the package with library() (e.g., library("BayesFactor")). However, when writing code in a package, you should avoid loading packages with library(). Instead, use the PACKAGE::FUN() notation, where PACKAGE is the name of the package and FUN is the name of the function, and list PACKAGE in your DESCRIPTION file. For example, to implement a Bayesian t-test using the ttestBF() function from the BayesFactor package in my vignette, I could write:
# Some random data
x <- c(-0.64, 0.02, 1.27, -0.01, -0.6, -0.55, -1.5, -1.88, -0.58, -0.21, 0.07, 
    1.23, -0.02, -0.28, -0.76, 0.39, 0.96, 1.54, 0.67, 1.66)

# Conduct a Bayesian 1-sample t-test on x
x.ttestBF <- BayesFactor::ttestBF(x)
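If a package used in a vignette might not be installed on the reader’s machine (for example, if you list it under Suggests rather than Imports in DESCRIPTION), a common pattern is to check for it with requireNamespace() before calling it with ::. The sketch below guards the Bayesian t-test that way and always runs a standard frequentist t-test with stats::t.test(); the fallback message is just one assumed way to handle a missing package.

```r
# The same example data as above
x <- c(-0.64, 0.02, 1.27, -0.01, -0.6, -0.55, -1.5, -1.88, -0.58, -0.21, 0.07,
       1.23, -0.02, -0.28, -0.76, 0.39, 0.96, 1.54, 0.67, 1.66)

# Only run the Bayesian test if BayesFactor is installed. requireNamespace()
# returns TRUE/FALSE without attaching the package (unlike library()).
if (requireNamespace("BayesFactor", quietly = TRUE)) {
  x.ttestBF <- BayesFactor::ttestBF(x)
  print(x.ttestBF)
} else {
  message("BayesFactor is not installed; skipping the Bayesian t-test.")
}

# stats ships with R, so stats::t.test() is always available
x.ttest <- stats::t.test(x)
x.ttest$p.value
```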

Step 5: Write and document functions

Longer guide to defining and documenting functions

Most packages will include functions. If the purpose of your package is to document a study, you can include functions that will display vignettes, replicate specific analyses, create plots…pretty much anything you want!

Functions are stored in separate NAME_function.R files in your /R folder. Here’s how to create them:

  1. Create a new .R document called NAME_function.R

  2. Define and document your function using documentation notation. This documentation notation specifies the parameters and examples, which the user can then view using the help menu (e.g., ?functionname).

Here is an example of a function called myhist:

#' A customized histogram function
#'
#' This function takes a few arguments and returns a histogram
#'
#' @param x a vector of data
#' @param include.ci A logical value indicating whether or not to include a confidence interval
#' @param ... Other arguments passed on to hist
#' @export
#' @examples
#'
#'  myhist(priming.s1$time)
#'
#'

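myhist <- function(x, include.ci = TRUE, ...) {
    
    # Draw a minimal histogram: no y-axis, gray bars with white borders
    graphics::hist(x, yaxt = "n", bty = "n", border = "white", col = "gray", ylab = "", 
        ...)
    
    if (include.ci) {
        
        # 95% confidence interval for the mean, from a one-sample t-test
        ci <- stats::t.test(x)$conf.int
        
        # Draw the interval as a thick line along the x-axis
        graphics::segments(ci[1], 0, ci[2], 0, lwd = 5)
        
    }
    
}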

Here is a function called study1, which simply opens the vignette for study 1:

#' study1 function
#'
#' This function opens the vignette for study 1. Just run study1() to see the results.
#'
#' @export
#' @examples
#'
#' study1()
#' 

study1 <- function() {
    
    # The package argument must match your package's name
    utils::vignette("study1", package = "phillips2017cognition")
    
}

Tips

  1. If you use functions from external packages (e.g., BayesFactor, dplyr) in your R code, you need to make sure those packages are installed on the user’s computer so all your code will work. You can do this in two ways:

    • If you submit your package to CRAN (How to submit a package to CRAN), or host it on GitHub (How to put a package on GitHub), you simply need to list the additional packages in the Imports field of your DESCRIPTION file (Click me for more tips on DESCRIPTION files). When you do this, R will automatically install all necessary packages on the user’s machine when they install your package using install.packages("yourpackage") (from CRAN), or devtools::install_github("yourname/yourpackage") (from GitHub).

    • If you do not host your package online, you can write a simple function (e.g., get.packages()) that makes sure all of the necessary packages are installed on the user’s machine. Make sure to save the documentation file in the format getpackages_.R in your /R folder. Here is an example:

#' Get packages necessary for some analyses
#'
#' This function checks if you have all the packages necessary for the analyses in this paper. If you don't, they will be installed.
#'
#' @export
#' @examples
#'
#'  get.packages()
#'

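get.packages <- function() {
    
    # Packages required by the analyses in this paper
    needed.packages <- c("BayesFactor", "dplyr")
    
    # Names of all packages already installed on this machine
    existing.packages <- rownames(utils::installed.packages())
    
    # Install only the packages that are missing
    missing.packages <- setdiff(needed.packages, existing.packages)
    if (length(missing.packages) > 0) {
        utils::install.packages(missing.packages)
    }
    
    message("Ok, you've got all the packages you need for phillips2017cognition! You never need to run this again (on this computer).")
}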

Step 6: Document and build

Now you’re ready to put all the elements of your package together! We’ll do this in two steps. First, we’ll run the devtools::document() function, which converts the roxygen (#') comments in your /R files into the .Rd help files that R uses (stored in /man) and updates your NAMESPACE file. Second, we’ll use the devtools::build() function to compile all of your package code into a single file.

Create documentation files with document()

  1. Run the document() function (from the devtools package) to create an OBJECT.Rd file in your /man folder from each OBJECT_doc.R file.
# If devtools is not installed, install it first with
# install.packages('devtools')

# Create an OBJECT.Rd file
devtools::document()

Build your package with build()

  1. Build your package into a single package file (.tar.gz) with the devtools::build() function:
# Build my package as a .tar.gz file
devtools::build()

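Once you run this code, you should see a new .tar.gz file (e.g., phillips2017cognition_0.1.0.tar.gz) in the folder that contains your project folder: by default, devtools::build() writes it to the parent directory of the package. This file contains your final package!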
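Because devtools::build() returns the path of the file it creates, you can chain it with install.packages() to document, build, and install in one go. The wrapper below is a hypothetical convenience function, not part of devtools; it assumes devtools is installed and that you run it from your package folder. (devtools::install() does something similar directly.)

```r
# Hypothetical helper: document, build, and install the package in `pkg`
build_and_install <- function(pkg = ".") {
  # Regenerate the .Rd files in /man and the NAMESPACE from roxygen comments
  devtools::document(pkg)
  # build() returns the full path to the new .tar.gz (in the parent folder by default)
  tarball <- devtools::build(pkg)
  # Install from the local file rather than from CRAN
  install.packages(tarball, repos = NULL, type = "source")
  invisible(tarball)
}

# Usage, from inside your package project:
# build_and_install()
```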

Step A: Write your paper in Sweave!

If you want an unbreakable WDA link, you should write your paper in Sweave. Sweave combines R and LaTeX. By using Sweave, you can write your entire paper (in APA or any other format) and include all of your analyses in one Sweave (.Rnw) document.

It is beyond the scope of this tutorial to explain LaTeX and Sweave from scratch. Rather than giving you a full tutorial, we’ll go over a few examples of Sweave documents in APA style.

First, make sure you’ve installed the latest versions of the following packages. The first two packages (knitr and xtable) are general-purpose packages that help you create Sweave documents. The second two packages (dplyr and BayesFactor) are specific to some of the R functions I’ll use in an example.

# These packages help with Sweave
install.packages("knitr")
install.packages("xtable")

# These packages will help with specific examples
install.packages("dplyr")
install.packages("BayesFactor")

Next, you’ll need to change some of the RStudio Sweave preferences. Go to RStudio – Preferences – Sweave and make the following two changes:

  • Set “Weave Rnw files using” to knitr
  • Set “Preview PDF after compile using” to “System Viewer”

The apasweave document

Ok, we’re ready to check out a Sweave document! To start a new Sweave file in RStudio, go to File – New File – R Sweave. When you do this, a new document will open with a few LaTeX commands (like \begin{document}). Save the Sweave (.Rnw) document under a new name (e.g., apasweave.Rnw).

Now we need to replace this with Sweave code that will create an APA style document. It’s best to start with an example. I’ve created an example Sweave document in the following link (Simple APA Sweave File). Open the link and copy and paste all the text into your Sweave document. Then create a pdf from the document by clicking the “Compile PDF” button above the document.
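What makes the WDA link unbreakable is that numbers in the text are computed, not typed. In an .Rnw file, a code chunk sits between <<>>= and @, and \Sexpr{} inserts the value of an R expression into a sentence. The sketch below writes a tiny .Rnw file and processes it with base R’s utils::Sweave() (swapped in for knitr so it runs without extra packages; knitr understands the same chunk and \Sexpr{} syntax). It only produces the .tex file, so no LaTeX installation is needed. The file name demo.Rnw is a placeholder.

```r
# Lines of a tiny .Rnw document (backslashes are doubled inside R strings)
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<analysis, echo=FALSE>>=",
  "x <- c(-0.64, 0.02, 1.27)",
  "m <- round(mean(x), 2)",
  "@",
  "The mean of x is \\Sexpr{m}.",
  "\\end{document}"
)

# Work in a temporary folder so no project files are touched
old_wd <- setwd(tempdir())
writeLines(rnw_lines, "demo.Rnw")
utils::Sweave("demo.Rnw", quiet = TRUE)  # writes demo.tex next to demo.Rnw
tex <- readLines("demo.tex")
setwd(old_wd)

# The sentence now contains the computed value instead of a typed-in number
grep("The mean of x", tex, value = TRUE)
```

If the data in the chunk change, recompiling updates the sentence automatically.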

Installing an existing package

Great! You’ve successfully created a new R package! However, to use it, you first need to install it. To do this, just use the install.packages() command with the file location as the main argument. You’ll also include two extra arguments that tell R that you are not installing the package from CRAN.

install.packages(pkgs = "/Users/nphillips/Dropbox/Temp/example_0.1.0.tar.gz", 
                 repos = NULL, # Tells R not to try to get the package from CRAN
                 type = "source"  # Type of package is source
                 )

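To install a package from the web, just include the web link as the main argument (pkgs). For example, here is a public link to my package on Dropbox (ending a Dropbox link with ?dl=1 rather than ?dl=0 makes it download the file directly instead of opening a preview page):

install.packages(pkgs = "https://www.dropbox.com/s/mhb7fd1r0ohx5cs/example_0.1.0.tar.gz?dl=1", 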
                 repos = NULL, # Tells R not to try to get the package from CRAN
                 type = "source"  # Type of package is source
                 )