A research project is typically composed of three types of items: Writing, Data, and Analysis scripts (WDA). Most people keep multiple versions of all of these item types, possibly in many locations on their computer. This can leave the links between your WDA fragile, if not broken outright (e.g., have you ever tried replicating a specific test statistic from a previous research project?). When the links between your WDA are weak (or broken), it is almost impossible to share your data with fellow scientists, or for your future self to understand what the heck you did when you worked on that project in the past.
An R package is one way to keep a strong link between your WDA. Storing your writing, data and analysis in an R package allows you to store everything in one file. If you put the file online, then you (or anyone else) can access your WDA from RStudio with one line of code.
In addition to storing your WDA, R packages allow you to write vignettes (aka tutorials) that provide guidelines for others (and your future self!) to understand and use your data accurately.
Before you do anything else, make sure you’ve got the latest versions of R and RStudio installed!
Let’s start by looking at a research R package in action. I created an R package called phillips2017cognition that contains the WDA of a fictional future study.
To install the package, run the following code. When you do, you should see some red text, ending with the line DONE.
install.packages(pkgs = "https://dl.dropboxusercontent.com/u/7618380/phillips2017cognition_0.1.0.tar.gz",
repos = NULL, # Tells R not to try to get the package from CRAN
type = "source" # Type of package is source
)
Let’s look at the contents of the package. As with any package, start by loading it:
library("phillips2017cognition")
Now you can access the package. You can view the main help page as follows:
help(package = "phillips2017cognition")
On this page, you can see two main links: one to the DESCRIPTION file, and another to User guides and package vignettes. Feel free to click around on these to learn what they tell you.
The purpose of this document is to demonstrate the basics of creating an R package for documenting scientific research. In this document, I will take you through the following 6 basic steps of creating a package.
Once you’ve installed R and RStudio, make sure you have the latest versions of the following packages installed by running the following:
install.packages("knitr")
install.packages("rmarkdown")
install.packages("devtools")
install.packages("rmdformats")
Now you’re ready to get started on your package. You’ll start by opening a new project in RStudio – this project will essentially be your new package. To create your new project (aka package), do the following steps:
An R package is simply a directory with specific subfolders and a DESCRIPTION file. Here is how your package folder should look (my package is called phillips2017cognition).
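As a rough illustration, the following sketch builds an empty skeleton in a temporary directory and lists its contents. The folder names (R, data, vignettes) are an assumption based on the usual package layout, and mypackage is a placeholder name; in practice RStudio creates most of this for you when you start a new package project.

```r
# Build a throwaway package skeleton in a temporary directory.
# "mypackage" is a placeholder name, not a real package.
pkg.dir <- file.path(tempdir(), "mypackage")
dir.create(pkg.dir, showWarnings = FALSE)

# Typical subfolders: R code, data objects, and vignettes
for (folder in c("R", "data", "vignettes")) {
  dir.create(file.path(pkg.dir, folder), showWarnings = FALSE)
}

# Every package needs a DESCRIPTION file at the top level
file.create(file.path(pkg.dir, "DESCRIPTION"))

# List the top-level contents of the skeleton (files and folders)
pkg.contents <- list.files(pkg.dir)
print(pkg.contents)
```

A real package will usually contain a few more entries (for example, generated help files), so treat this listing as the bare minimum rather than a complete picture.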
Click here for a longer Guide to DESCRIPTION files
Next you’ll update the DESCRIPTION file for your package. Every R package must have a DESCRIPTION file that contains basic information about your package (e.g., title, author, description). You should update these by hand. Here are some of the main fields:
You can also add further fields, such as URL (for linking to websites).
Here is a simple DESCRIPTION file:
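As a minimal sketch, the fields of a DESCRIPTION file can be assembled in R and written out with the base function write.dcf(), which produces the "Field: value" layout that DESCRIPTION files use. Every value below (title, author name, email address, license) is a placeholder chosen for illustration; only the package name and version number are taken from this tutorial.

```r
# Hypothetical DESCRIPTION fields; all values are placeholders
desc.fields <- data.frame(
  Package = "phillips2017cognition",
  Type = "Package",
  Title = "Data and Analyses for a Fictional Priming Study",
  Version = "0.1.0",
  Author = "First Last",
  Maintainer = "First Last <first.last@example.com>",
  Description = "Writing, data, and analysis scripts for a fictional study.",
  License = "GPL-3",
  stringsAsFactors = FALSE
)

# Write the fields in "Field: value" format to a temporary file
desc.file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc.fields, file = desc.file)

# Show the resulting file, then read it back as a character matrix
cat(readLines(desc.file), sep = "\n")
desc.back <- read.dcf(desc.file)
```

In a real package you would simply edit the DESCRIPTION file by hand in RStudio; writing it from code is only a convenient way to show the format.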
Longer guide to saving and documenting data
Ok! Now it’s time to include data in your package. You’re probably used to storing data as Excel, SPSS, or .txt files. However, in creating packages, we’ll store all of our data as R objects (e.g., dataframes, matrices, lists, vectors) in .RData files. .RData files efficiently store lots of R objects with minimal space.
To save data as .RData files, you first need to load the data into your working environment (usually from a .txt file), then save the data (using the save() function) as an .RData file in the data folder in your package.
# Read in a text file with data from Study 1 of a priming study
priming.s1 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/04/priming-5.txt",
                         sep = "\t", header = TRUE)
You can load as many data objects as you want.
# Save the data object as an .RData file in the data folder
save(priming.s1, file = "data/priming_s1.RData")
After you do this, you should see a new file called priming_s1.RData in the data folder in your package.
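To see that nothing is lost on the way, here is a small self-contained sketch of the save-and-restore round trip. It uses a made-up data frame (toy.data) and a temporary file rather than the real study data and the data folder, so both names are stand-ins.

```r
# A small made-up data frame standing in for real study data
toy.data <- data.frame(id = 1:3, time = c(10.2, 11.5, 9.8))

# Save it to a temporary .RData file (a package would use its data folder)
toy.file <- file.path(tempdir(), "toy_data.RData")
save(toy.data, file = toy.file)

# Keep a copy for comparison, remove the object, then restore it from the file
original <- toy.data
rm(toy.data)
loaded.names <- load(toy.file)  # load() returns the names of the restored objects

print(loaded.names)
print(identical(toy.data, original))
```

Note that load() restores objects under the names they were saved with, which is why it matters to give your data objects clear names before saving them.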
You don’t need to restrict yourself to one object for each .RData file. You can store as many objects in an .RData file as you’d like. In the following code chunk, I’ll create 4 data objects (2 datasets as dataframes and 2 simulations as vectors), then save all four objects to a single .RData file called priming_all.RData:
# Store multiple objects in one .RData file
# Study 1 priming data
priming.s1 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/04/priming-5.txt",
                         sep = "\t", header = TRUE)
# Study 2 priming data
priming.s2 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/04/priming-4.txt",
                         sep = "\t", header = TRUE)
# simulation.A is 1000 p-values from 1000 experiments where H0 is TRUE (i.e.
# mu = 0)
simulation.A <- sapply(1:1000, FUN = function(x) {
  t.test(x = rnorm(100, mean = 0, sd = 1))$p.value
})
# simulation.B is 1000 p-values from 1000 experiments where H0 is FALSE
# (i.e. mu = 1)
simulation.B <- sapply(1:1000, FUN = function(x) {
  t.test(x = rnorm(100, mean = 1, sd = 1))$p.value
})
# Save Study 1 data, Study 2 data, and simulations A and B in
# priming_all.RData
save(priming.s1, priming.s2, simulation.A, simulation.B, file = "data/priming_all.RData")
Next we will create documentation files for each of the data objects (usually dataframes) you’ve saved in one (or more) .RData files. Documentation files are used to generate help menus for objects, which tell the user what the data objects mean and how to use them.
For example, to see the help menu for the ChickWeight dataset (a dataset about chicken weights that ships with R), run the following:
help(ChickWeight)
You should create a documentation file for each important data object (usually dataframes) stored in your .RData files.
Data documentation files follow a strict format in R. They consist of a title, a description, and a few special tags. Rather than describing them in detail, just look at the example below:
#' Study 1 of Phillips et al. (2017)
#'
#' This dataset contains fictional data from study 1 of a priming experiment.
#'
#'
#' @format A data frame containing 1000 rows and 6 columns.
#' \describe{
#' \item{sex}{(string) - The participant's sex}
#' \item{age}{(numeric) - The participant's age}
#' \item{prime}{(string) - The participant's priming condition (elderly or neutral)}
#' \item{time}{(numeric) - Amount of time (in seconds) it took for a participant to walk down the hall.}
#' }
#' @details Each row corresponds to a participant. Participants were randomly assigned....
#' @source Experiment conducted at the Economic Psychology lab from February 2016 to April 2016
#' @examples
#'
#' # Histogram of participant ages
#' hist(priming.s1$age)
#'
#' # t.test comparing the walking time of participants in each condition.
#' # should give a p-value of .03
#'
#' t.test(formula = time ~ prime,
#'        data = priming.s1)
"priming.s1"
Here are some notes on creating this documentation file:
You’ll notice that the comments begin with #' instead of the standard #. This type of comment is specific to package documentation. If you want to include a ‘real’ comment, do so with the standard # AFTER the #'.
Include the name of the object at the end of the file in quotation marks (without the #' prefix).
The examples at the bottom should all be executable R code. However, you don’t need to include examples if you don’t want to.
Longer guide to writing vignettes
Now that you’ve documented your dataset, it’s time to create longer-form vignettes. A vignette is a long-form document containing a mixture of text, R code, and R output. You can create vignettes that replicate analyses, generate plots…anything you want.
Vignettes are written in R Markdown. R Markdown is a bare-bones markup language for writing text. For a thorough overview of how to use R Markdown, check out the full R Markdown Guide. For an easy-to-use reference guide on R Markdown, check out the R Markdown Quick Reference.
Here are the two steps to creating a vignette:
# Create a new vignette template
# (in recent devtools releases this helper has moved to usethis::use_vignette())
devtools::use_vignette("study1")
Running this will create a new folder called vignettes (if it does not already exist), and create a new file called study1.Rmd inside it.
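To give a feel for what such a file contains, the sketch below writes a bare vignette skeleton to a temporary file and prints it. The header follows the commonly used R Markdown vignette layout, but treat it as an assumption: the template that the helper function generates may differ in detail depending on the package version.

```r
# Hypothetical vignette skeleton; the title "study1" matches the example above
vignette.lines <- c(
  "---",
  "title: \"study1\"",
  "output: rmarkdown::html_vignette",
  "vignette: >",
  "  %\\VignetteIndexEntry{study1}",
  "  %\\VignetteEngine{knitr::rmarkdown}",
  "  %\\VignetteEncoding{UTF-8}",
  "---",
  "",
  "Text, R code chunks, and their output go below the header."
)

# Write the skeleton to a temporary file and display it
vignette.file <- file.path(tempdir(), "study1.Rmd")
writeLines(vignette.lines, vignette.file)
vignette.back <- readLines(vignette.file)
cat(vignette.back, sep = "\n")
```

Everything between the two --- lines is the header; the body of the vignette, including code chunks like the one that follows, goes underneath it.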
# Some random data
x <- c(-0.64, 0.02, 1.27, -0.01, -0.6, -0.55, -1.5, -1.88, -0.58, -0.21, 0.07,
1.23, -0.02, -0.28, -0.76, 0.39, 0.96, 1.54, 0.67, 1.66)
# Conduct a Bayesian 1-sample t-test on x
x.ttestBF <- BayesFactor::ttestBF(x)
Longer guide to defining and documenting functions
Most packages will include functions. If the purpose of your package is to document a study, you can include functions that will display vignettes, replicate specific analyses, create plots…pretty much anything you want!
Functions are stored in separate NAME_function.R files in your /R folder. Here’s how to create them:
Create a new .R document called NAME_function.R
Define and document your function using documentation notation. This documentation notation specifies the parameters and examples, which the user can then view using the help menu (e.g., ?functionname).
Here is an example of a function called myhist:
#' A customized histogram function
#'
#' This function takes a few arguments and returns a histogram
#'
#' @param x a vector of data
#' @param include.ci A logical value indicating whether or not to include a confidence interval
#' @param ... Other arguments passed on to hist
#' @export
#' @examples
#'
#' myhist(priming.s1$time)
#'
#'
myhist <- function(x, include.ci = TRUE, ...) {
  hist(x, yaxt = "n", bty = "n", border = "white", col = "gray", ylab = "",
       ...)
  if (include.ci) {
    # Draw the 95% confidence interval of the mean as a thick line at y = 0
    ci <- t.test(x)$conf.int
    segments(ci[1], 0, ci[2], 0, lwd = 5)
  }
}
Here is a function called study1, which simply opens up a vignette for study 1:
#' study1 function
#'
#' This function opens the vignette for study 1. Just run study1() to see the results.
#'
#' @export
#' @examples
#'
#' study1()
#'
study1 <- function() {
  vignette("study1", package = "example")
}
If you use functions from external packages (e.g., BayesFactor, dplyr) in your R code, you need to make sure those packages are installed on the user’s computer so all your code will work. You can do this in two ways:
If you submit your package to CRAN (How to submit a package to CRAN), or host it on GitHub (How to put a package on GitHub), you simply need to specify the additional packages under the Imports field of your DESCRIPTION file (see the guide to DESCRIPTION files for more tips). When you do this, R will automatically install all necessary packages on the user’s machine when they install your package using install.packages("yourpackage") (from CRAN), or install_github("yourname/yourpackage") (from GitHub).
If you do not host your package online, you can write a simple function (e.g., get.packages()) that makes sure all of the necessary packages are installed on the user’s machine. Here is an example. Make sure to save the documentation file in the format getpackages_.R in your /R folder.
#' Get packages necessary for some analyses
#'
#' This function checks if you have all the packages necessary for the analyses in this paper. If you don't, they will be installed.
#'
#' @export
#' @examples
#'
#' get.packages()
#'
get.packages <- function() {
  # Names of all packages currently installed on this machine
  existing.packages <- installed.packages()[, 1]
  if (!("BayesFactor" %in% existing.packages)) {
    install.packages("BayesFactor")
  }
  if (!("dplyr" %in% existing.packages)) {
    install.packages("dplyr")
  }
  print("Ok you've got all the packages you need for phillips2017cognition! You never need to run this again (on this computer).")
}
Now you’re ready to put all the elements of your package together! We’ll do this in two steps. First, we’ll run the devtools::document() function. This function will convert all of your documentation files into more technical formats that R understands better. Second, we’ll use the devtools::build() function to compile all of your package code into a single file.
# If devtools is not installed, install it first with
# install.packages('devtools')
# Create an OBJECT.Rd file for each documented object
devtools::document()
# Build my package as a .tar.gz file
devtools::build()
Once you run this code, you should see a new .tar.gz file (by default, devtools::build() writes it to the folder that contains your project). This file contains your final package!
If you want to have an unbreakable WDA link, you should write your paper in Sweave. Sweave is a combination of R and LaTeX. By using Sweave, you can write your entire paper (in APA or any other format) and include all of your analyses into one Sweave (.Rnw) document.
It is beyond the scope of this tutorial to explain LaTeX and Sweave from scratch. Rather than giving you a full tutorial, we’ll go over a few examples of Sweave documents in APA style.
First, make sure you’ve installed the latest version of the following packages. The first two packages (knitr and xtable) are general-purpose packages that help you create Sweave documents. The other two packages (dplyr and BayesFactor) are specific to some of the R functions I’ll use in an example:
# These packages help with Sweave
install.packages("knitr")
install.packages("xtable")
# These packages will help with specific examples
install.packages("dplyr")
install.packages("BayesFactor")
Next, you’ll need to change some of the RStudio Sweave preferences. Go to RStudio – Preferences – Sweave and make the following two changes:
Ok, we’re ready to check out a Sweave document! To start a new Sweave file in RStudio, go to File – New File – R Sweave. When you do this, a new document will open with a few LaTeX commands (like \begin{document}). Save the Sweave (.Rnw) document under a new name (e.g., apasweave.Rnw).
Now we need to replace this with Sweave code that will create an APA style document. It’s best to start with an example. I’ve created an example Sweave document in the following link (Simple APA Sweave File). Open the link and copy and paste all the text into your Sweave document. Then create a pdf from the document by clicking the “Compile PDF” button above the document.
Great! You’ve successfully created a new R package! However, to access it, you need to install the package. To do this, just use the install.packages() command and put the file location as the main argument. You’ll also include two extra arguments that tell R that you are not installing the package from CRAN.
install.packages(pkgs = "/Users/nphillips/Dropbox/Temp/example_0.1.0.tar.gz",
repos = NULL, # Tells R not to try to get the package from CRAN
type = "source" # Type of package is source
)
To install a package from the web, just include the weblink as the main argument (pkgs). For example, here is a public link to my package on Dropbox:
install.packages(pkgs = "https://www.dropbox.com/s/mhb7fd1r0ohx5cs/example_0.1.0.tar.gz?dl=0",
repos = NULL, # Tells R not to try to get the package from CRAN
type = "source" # Type of package is source
)
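After installing, it can be reassuring to confirm that R can actually find the package before you try to load it. The helper below, pkg.available(), is a hypothetical name introduced only for this sketch; it wraps the base function requireNamespace(), which reports whether a package can be loaded without attaching it.

```r
# Hypothetical helper: TRUE if the named package can be loaded on this machine
pkg.available <- function(pkg) {
  isTRUE(suppressWarnings(requireNamespace(pkg, quietly = TRUE)))
}

# "stats" ships with R, so this should be available everywhere
print(pkg.available("stats"))

# A made-up name that should not exist on any machine
print(pkg.available("notARealPackage12345"))

# After installing your own package, check it the same way, e.g.
# pkg.available("example")
```

If the check comes back FALSE for your own package, the installation most likely failed part-way, and the red text printed by install.packages() is the first place to look.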