Introduction

This article provides an easy-to-follow checklist for creating an R package using the {devtools}/{usethis} framework as explained in the book R Packages (2e) by Hadley Wickham and Jennifer Bryan. The target audience includes both new and experienced R developers. For new R developers, creating an R package may seem overwhelming and complicated at first, but this article aims to remove the mystery from the process by providing a tutorial. Experienced R developers who need a quick refresher on how to set up an R package or add certain features to an existing R package (such as a {pkgdown} website) can use this article as a quick reference guide or cheat sheet.

This article will walk you through how to build a professional R package with the following features:

  • Version control using git and GitHub:

    • Automated GitHub actions for R CMD check with README badge

    • Automated documentation and code styling with pull requests

  • Testing:

    • Unit testing using {testthat}

    • Automated GitHub actions for code testing coverage using CodeCov and {covr} with README badge

  • Professional looking package website hosted on GitHub Pages and built using {pkgdown}:

    • Includes instructions on how to make a package logo
  • Internal and external data files

  • Documentation:

    • Function and data documentation using {roxygen2}

    • README using a .Rmd file

    • NEWS.md

    • Articles and vignettes

    • DESCRIPTION file set-up with license

  • Code styling using {styler}

The checklist consists of 32 steps in total, which are divided into 10 sections, A through J. Where possible, I provide a rough time estimate to complete each section:

  • The first four sections A. Initial Set-up, B. Version Control Using git and GitHub, C. Package Documentation, and D. GitHub Actions, which collectively encompass steps 1 through 11, are considered the minimum required to establish a proper R package framework and take roughly 30 minutes to complete.

  • Section E. Functions and Function Documentation includes steps 12, 13, and 14 and involves writing the actual R code for your package; how long this takes depends on the package.

  • Note: once you have completed sections A through D (steps 1 through 11), and as you work on section E (steps 12 through 14), you should start using the standard development workflow described in the Post Set-up Workflow section of this article.

  • All sections after E. Functions and Function Documentation are optional and can be completed as needed or as time allows, in any order. These sections include:

    • F. Code Styling (step 15). Time commitment: about 5 minutes.

    • G. Additional Package Documentation (step 16). Time commitment: depends on how many package vignettes and articles you want.

    • H. Data Files (steps 17 through 21). Time commitment: depends on how many data files you need.

    • I. Testing (steps 22 through 28). Time commitment: depends on the amount of code in your package and what level of code test coverage you consider acceptable.

    • J. Website and Logo (steps 29 through 32). Time commitment: about 30 minutes.

Each step in this article also provides references with links.

Prerequisites

GitHub and CodeCov accounts: throughout the article, specific platforms are used for specific tasks: GitHub for repository hosting, GitHub Actions for CI/CD pipelines, GitHub Pages for the {pkgdown} website, and CodeCov for unit test coverage. Alternatives to these platforms exist; however, the {devtools}, {usethis}, {pkgdown}, and {covr} packages used to build these features are geared toward these popular platforms, so you will need GitHub and CodeCov accounts to follow this article.

RStudio Installation Settings: this article assumes your RStudio installation was previously configured according to the {usethis} set-up guide. Following that guide ensures that {devtools} and {usethis} are loaded every time you start RStudio, provides default DESCRIPTION file settings, and connects your GitHub account to RStudio.

Steps

A. Initial Set-up

1. Check the package name is available

available::available("[PROPOSED PACKAGE NAME]")

Reference: https://r-pkgs.org/workflow101.html#use-the-available-package
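
Before checking availability, it can help to confirm that the proposed name is syntactically valid. The sketch below is a minimal base R check, assuming the standard naming rules (ASCII letters, digits, and periods only; at least two characters; starts with a letter; does not end with a period). The function name is_valid_pkg_name() is hypothetical and is not part of {available} or {usethis}.

```r
# Hypothetical helper: TRUE when `name` follows R's syntactic rules for
# package names (letters, digits, periods; starts with a letter; at least
# two characters; does not end with a period).
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

# Only the first candidate passes: underscores, a leading digit, and a
# trailing period are all rejected.
candidates <- c("mypkg", "my_pkg", "1pkg", "pkg.")
validity <- is_valid_pkg_name(candidates)
print(stats::setNames(validity, candidates))
```

A name that passes this check can still be taken on CRAN or elsewhere, so the available::available() call above is still needed.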

2. Create a new directory

dir.create("[PATH TO PACKAGE]")

3. Set-up the directory as an R package:

setwd("[PATH TO PACKAGE]")
# create_package() requires a path; "." refers to the working directory set above
create_package(".")

Reference: https://r-pkgs.org/whole-game.html#create_package

B. Version Control using git and GitHub

4. Set-up local version control

use_git()

Next, make your first commit from the terminal using the git commands add and commit.

Reference: https://r-pkgs.org/whole-game.html#use_git

5. Create a GitHub remote repository

# if you made your first commit in step 4, ignore the first message about uncommitted changes
use_github()

Reference: https://r-pkgs.org/whole-game.html#use_github

C. Package Documentation

6. Populate the DESCRIPTION file

Update these fields:

# the MIT license is one of the most permissive but there are other options
use_mit_license()
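
One way to confirm that DESCRIPTION has the fields you expect is to read it back with base R. The sketch below is illustrative only: the helper name missing_description_fields() and the choice of required fields are my assumptions, not something prescribed by {usethis}. It only detects absent fields, so placeholder text left in a field still needs a manual review.

```r
# Hypothetical helper: report which of the `required` fields are absent
# from a DESCRIPTION file. The default field list is an assumption.
missing_description_fields <- function(path = "DESCRIPTION",
                                       required = c("Title", "Description",
                                                    "Authors@R", "License")) {
  fields <- colnames(read.dcf(path))
  setdiff(required, fields)
}

# Demonstrate on a throwaway file rather than a real package.
tmp <- tempfile()
writeLines(
  c("Package: mypkg", "Title: What the Package Does", "Version: 0.0.0.9000"),
  tmp
)
missing <- missing_description_fields(tmp)
print(missing)
```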

References:

7. Create a README using Rmarkdown

use_readme_rmd()

After populating the basic information in the README.Rmd file, build the .md version of the file using:

build_readme()

Reference: https://r-pkgs.org/whole-game.html#use_readme_rmd

8. Create a NEWS file

use_news_md()

Edit the file as needed. It will be used as a change log for package version releases.

Reference: https://r-pkgs.org/other-markdown.html#sec-news

9. Create package documentation

This creates a dummy file to establish package level documentation:

use_package_doc()

Reference: https://r-pkgs.org/man.html#sec-man-package-doc

D. GitHub Actions

10. Automate R CMD check

Every time you push the package to the remote repository, R CMD check will run on several operating systems to verify that the package works across platforms.

use_github_action("check-standard")

This will also add the R CMD check badge to the README.Rmd file so the README.md file should be re-built:

build_readme()

Reference: https://r-pkgs.org/software-development-practices.html#r-cmd-check-via-gha

11. Automate documentation and code styling for pull requests

use_github_action("pr-commands")

Reference: https://usethis.r-lib.org/reference/github_actions.html#use-github-action-pr-commands-

E. Functions and Function Documentation

Steps 12 through 14 outline how to build the core of your package: the code base. Instead of building the entire code base and then creating your tests, I recommend that you iteratively build the code base and test it as you go. See section I. Testing for instructions on how to set up your testing suite.

12. Add your code

Your package code must exist in .R files in the R/ directory. Create each .R file using:

use_r("[NEW R SCRIPT NAME]")

Reference: https://r-pkgs.org/whole-game.html#use_r

13. Import packages

When you use functions from other packages in your code, always call them using the [PACKAGE NAME]::[FUNCTION]() syntax. Each unique package that is called in your code must be added as an import in the DESCRIPTION file. This can be done using:

use_package("[IMPORTED PACKAGE NAME]")
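
As a minimal sketch of the [PACKAGE NAME]::[FUNCTION]() convention, the hypothetical function below calls two functions from the {stats} package with an explicit namespace prefix. The file name and function name are made up for illustration; in a real package you would also run use_package("stats") so that the dependency is recorded in DESCRIPTION.

```r
# R/summarise_values.R (hypothetical file and function)
# Every call into another package carries an explicit namespace prefix.
summarise_values <- function(x) {
  c(
    median = stats::median(x, na.rm = TRUE),
    sd = stats::sd(x, na.rm = TRUE)
  )
}

result <- summarise_values(c(1, 2, 3, NA))
print(result)
```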

It’s also a good idea to import {magrittr}’s pipe operator:

use_pipe(export = TRUE)

References:

14. Create function documentation

For each function in the R/ directory, create {roxygen2} comments using the references below, then build all the package documentation files by running:

document()
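
For orientation, a {roxygen2} comment block sits directly above the function it documents, with each line starting with #'. The sketch below uses a made-up function, add_numbers(), and shows a few commonly used tags; it is an illustration, not a complete list of what {roxygen2} supports.

```r
#' Add two numbers
#'
#' A hypothetical example function used to illustrate roxygen2 comments.
#'
#' @param x A numeric vector.
#' @param y A numeric vector of the same length as `x`.
#'
#' @return A numeric vector containing the element-wise sum of `x` and `y`.
#'
#' @examples
#' add_numbers(1, 2)
#'
#' @export
add_numbers <- function(x, y) {
  x + y
}
```

After document() is run, a block like this is turned into a help file in man/ and, because of the @export tag, an export entry in NAMESPACE.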

References:

F. Code Styling

15. Apply code styling using the {styler} package

To style the entire package run:

use_tidy_style()
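
To preview what styling does before applying it to the whole package, you can style a single string of code with {styler} directly. This is a small sketch that assumes {styler} is installed; the messy input line is made up for illustration.

```r
# A deliberately cramped one-liner (hypothetical input)
messy <- "add_numbers=function(x,y){x+y}"

# style_text() returns the restyled code without touching any files
styled <- styler::style_text(messy)
print(styled)
```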

Note: you should always commit changes immediately before and after running the above code to separate style changes from code changes in the git history.

References:

G. Additional Package Documentation

16. Articles and vignettes

Use articles and vignettes to provide package instructions and example workflows beyond what is in README.Rmd. Vignettes will be included with your package on install. Articles will not be included with your package on install but will be available on the {pkgdown} website. These can be created using the functions:

use_vignette("[NEW VIGNETTE NAME]")
use_article("[NEW ARTICLE NAME]")

The first time each of these functions is called, the directories vignettes/ and vignettes/articles/ are created, respectively.

Reference: https://r-pkgs.org/vignettes.html

H. Data Files

There are five locations for storing/creating R package data files. Choosing a directory to use for a file depends on how the file is to be used and whether the user should have access to the file after they install your package.

  1. data-raw/: this directory can store raw data files and is used for creating data in any of the other data directories below. Nothing in data-raw/ will be available to end users after installation and load_all() does not load any data files in this directory into the developer’s environment. See step 17.

  2. data/: this directory stores data files in a native R format (e.g., .rda) that the user will have access to after package installation, either by calling [PACKAGE NAME]::[DATA OBJECT NAME] or by calling library([PACKAGE NAME]) and then referring to the DATA OBJECT NAME. Each data object should have its own .rda file and must be documented. As the package developer, you can load these data files into your environment using load_all(). See steps 18 and 19.

  3. R/sysdata.rda: this is a file that stores all R data objects for internal package use. These data objects will not be available to the user after installation. As the package developer, you can load these data objects into your environment using load_all(). See step 20.

  4. inst/extdata: this directory stores data files that should be available to the user after installation but which are not in a native R format (e.g., .csv). These files cannot be called directly as R objects by the end user, and load_all() will not make them available in the developer’s environment. The package developer or end user can get the path to a data file in this directory by calling system.file("extdata/[DATA FILE NAME]", package = "[PACKAGE NAME]") and then load that file into the global environment by calling the appropriate read function (e.g., read.csv()). See step 21.

  5. tests/testthat/fixtures: this directory stores data files in any format that are used for unit testing. These files cannot be called directly as R objects, and load_all() will not make them available in the developer’s environment. The package developer can get the path to a data file in this directory by calling testthat::test_path("fixtures", "[DATA FILE NAME]") and then load that file into the global environment by calling the appropriate read function (e.g., read.csv()). By default, the tests/ directory is not installed with the package, so end users can access these files only if the package was installed together with its tests (R CMD INSTALL --install-tests); they would then need to specify the full path to the file in their R library directory and call the appropriate read function. These types of data files are not addressed in this section; instead, see section I. Testing, specifically step 25.
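
The inst/extdata access pattern described in item 4 can be wrapped in a small helper. The sketch below is a hedged illustration: read_extdata() is a hypothetical function name, and it assumes the file is a .csv. Setting mustWork = TRUE makes system.file() raise an error instead of silently returning an empty path when the file is not found.

```r
# Hypothetical helper: locate a CSV file shipped in a package's extdata
# directory and read it into a data frame.
read_extdata <- function(file, package) {
  path <- system.file("extdata", file, package = package, mustWork = TRUE)
  utils::read.csv(path)
}

# Usage, once a package that ships inst/extdata/[DATA FILE NAME] is installed:
# read_extdata("[DATA FILE NAME]", package = "[PACKAGE NAME]")
```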

17. Set-up the raw data directory

The function below will create an R script for creating data files. The first time it is called, it will also create the data-raw/ directory.

use_data_raw("[NEW DATA FILE NAME]")

It’s good practice to have one R script in data-raw/ for each data file in the package. At the end of each script, you will save the data object to one of the other four data locations.

Reference: https://r-pkgs.org/data.html#sec-data-data-raw

18. Create an exported data file in data/

This step will create a file in the data/ directory which will be available to a user after package installation. Data stored in this manner must be in a native R data format (e.g., .rda). For each script in data-raw/ that is supposed to create an exported data file, call the function below at the end to save the file in the correct location:

# pass the unquoted name of the data object, not a string
use_data([DATA OBJECT NAME], overwrite = TRUE)

Reference: https://r-pkgs.org/data.html#sec-data-data
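
As a rough sketch of what a script in data-raw/ might look like, the example below builds a small data frame and then saves it with use_data(). The script name, object name, and values are all hypothetical. The guard around use_data() is only there so the sketch can be run outside a package; a real data-raw script would normally call use_data() directly.

```r
# data-raw/example_measurements.R (hypothetical script name)

# Build the data object; in practice this might read and clean a raw file.
example_measurements <- data.frame(
  id = 1:3,
  value = c(2.5, 3.1, 4.8)
)

# Save the object to data/example_measurements.rda when run from the
# root of a package (detected here by the presence of DESCRIPTION).
if (requireNamespace("usethis", quietly = TRUE) && file.exists("DESCRIPTION")) {
  usethis::use_data(example_measurements, overwrite = TRUE)
}
```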

19. Document a data file in the data/ directory

Every file in data/ must have package documentation created using {roxygen2}. The documentation for all of the files in data/ should reside in the file R/data.R. Create the file using:

use_r("data.R")

Then populate it per the reference below.

Reference: https://r-pkgs.org/data.html#sec-documenting-data
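
For orientation, a data documentation entry in R/data.R is a {roxygen2} block placed above the name of the data object written as a string. The sketch below documents a hypothetical data set called example_measurements; the column descriptions and source are made up for illustration.

```r
#' Example measurements
#'
#' A small hypothetical data set used to illustrate data documentation.
#'
#' @format A data frame with 3 rows and 2 columns:
#' \describe{
#'   \item{id}{Integer identifier of the measurement.}
#'   \item{value}{Numeric measured value.}
#' }
#' @source Created for illustration in data-raw/example_measurements.R.
"example_measurements"
```

Data objects in data/ are made available by the package without an @export tag, so none is used here.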

20. Create package internal data

Internal package data will not be shared with the end users of the package. All internal data is stored in the file R/sysdata.rda. You can create a script in data-raw/ to build this file like this:

use_data_raw("sysdata")

At the end of the script call:

use_data(..., overwrite = TRUE, internal = TRUE)

Reference: https://r-pkgs.org/data.html#sec-data-sysdata

21. Use non-R data files

Any non-R data files that you want the user to have access to after installation should be stored in inst/extdata. They can be created by a script in data-raw/, but that script should not end with a call to use_data(). Instead, save them using something like:

dir <- system.file("extdata", package = "[PACKAGE NAME]")
write.csv(data, file = file.path(dir, "[FILE NAME].csv"))

Reference: https://r-pkgs.org/data.html#sec-data-extdata
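
An alternative sketch is to write directly to the inst/extdata directory of the package source tree, which does not depend on the package being loaded or installed. In the example below, a temporary directory stands in for the package root so that the code can be run anywhere; the package name, object name, and file name are hypothetical.

```r
# Hypothetical data object to be shipped as a CSV file
example_measurements <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Stand-in for the package source directory; in a real data-raw script
# this would simply be the package root (the working directory).
pkg_root <- file.path(tempdir(), "mypkg")

# Create inst/extdata if it does not exist yet, then write the file there.
out_dir <- file.path(pkg_root, "inst", "extdata")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "example_measurements.csv")
write.csv(example_measurements, out_file, row.names = FALSE)
```

Setting row.names = FALSE avoids writing an extra unnamed column of row numbers to the .csv file.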

I. Testing

22. Set-up {testthat}

use_testthat()

Reference: https://r-pkgs.org/testing-basics.html#initial-setup

23. Writing tests

Create a test script for each file in R/ that you want to test using:

use_test("[NAME OF .R FILE IN R/]")

Populate the file according to the {testthat} convention using the references below.
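
As a minimal sketch of the {testthat} convention, a test file groups related expectations inside test_that() blocks. The example below defines a hypothetical add_numbers() function inline so that it can be run on its own; in a real package the function would live in R/ and the test file would not need the library() call or the function definition.

```r
library(testthat)

# Stand-in for a function that would normally live in R/add_numbers.R
add_numbers <- function(x, y) {
  x + y
}

# tests/testthat/test-add_numbers.R (hypothetical file name)
test_that("add_numbers() returns the element-wise sum", {
  expect_equal(add_numbers(1, 2), 3)
  expect_equal(add_numbers(c(1, 2), c(3, 4)), c(4, 6))
})

test_that("add_numbers() fails on non-numeric input", {
  expect_error(add_numbers("a", 1))
})
```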

References:

24. Running tests

You can run individual test functions inside a test file by highlighting and running that section of code. Make sure to always run load_all() first and after making any change to the file being tested.

You can run all tests for a single file in R/ using:

# with no argument, this tests the file currently open in the RStudio editor
test_active_file("R/[NAME OF .R FILE IN R/]")

Test the entire package by running:

test()

References:

25. Testing helper functions and fixtures

You can include test helper functions by creating the file tests/testthat/helpers.R.

You can also include data files to assist with testing (called test fixtures) by creating the directory tests/testthat/fixtures and storing them there. These can be native R files or not. It’s a good idea to create these data files in the data-raw/ directory. Also see section H. Data Files.

References:

26. Prepare the testing coverage framework for CodeCov

use_coverage(type = "codecov")

The code above will also add a CodeCov badge to the README so the README must be re-built:

build_readme()

Reference: https://usethis.r-lib.org/reference/use_coverage.html

27. Check the test coverage

To check test coverage for the entire package, run one of the two lines of code below, depending on the output type you want to see. Note: before running the functions below, you must restart R.

# this will produce a report in RStudio's Viewer window
covr::report()

# alternatively, this will print the results to the console
covr::package_coverage()

To check test coverage for an individual file, run the following line of code, which will produce a report in the RStudio Viewer window that highlights which lines of code are tested and which are not.

devtools::test_coverage_active_file()
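
If you want to compare overall coverage against a target, the coverage object can be reduced to a single percentage. The sketch below is an illustration under assumptions: check_coverage() is a hypothetical helper, and the default threshold of 80 is an arbitrary example rather than a recommendation from {covr}.

```r
# Hypothetical helper: compute package coverage and warn when it falls
# below a chosen threshold (in percent).
check_coverage <- function(path = ".", threshold = 80) {
  cov <- covr::package_coverage(path)
  pct <- covr::percent_coverage(cov)
  if (pct < threshold) {
    warning(
      sprintf("Test coverage is %.1f%%, below the %.0f%% target.", pct, threshold),
      call. = FALSE
    )
  }
  invisible(pct)
}

# Usage from the package root, after restarting R:
# check_coverage(".", threshold = 80)
```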

References:

Post Set-up Workflow

Once you have completed sections A through D and begin working on section E, it’s time to start using the iterative workflow for package development below. Following this process is good practice to ensure a quality R package that is tested and documented.

  1. Iteratively develop the code in one file in R/ using:

# update global environment with recent package changes
load_all()

# test the file being worked on by running sections of the test file or by running:
test_active_file()

  2. Once you are satisfied with the changes, test the entire package using:

test()

  3. Update the package documentation using:

document()

# when changes to the README are made, also run
build_readme()

  4. Run R CMD check:

check()

  5. If R CMD check passes with 0 errors and 0 warnings, commit the changes to the repo using the standard git commands add, commit, and push in the terminal.
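
The package-wide part of this workflow (steps 2 through 4) can be sketched as a single convenience function. This is a hedged illustration, not part of {devtools}: run_dev_cycle() is a hypothetical name, and the sketch assumes the object returned by check() exposes errors and warnings fields, as R CMD check result objects from {rcmdcheck} do.

```r
# Hypothetical helper: run the package-wide steps of the workflow in order
# and report whether R CMD check finished with 0 errors and 0 warnings.
run_dev_cycle <- function(pkg = ".") {
  devtools::load_all(pkg)   # refresh the environment with recent changes
  devtools::test(pkg)       # run the full test suite
  devtools::document(pkg)   # rebuild the documentation files
  results <- devtools::check(pkg)

  ok <- length(results$errors) == 0 && length(results$warnings) == 0
  if (ok) {
    message("R CMD check passed: ready to commit.")
  } else {
    message("R CMD check reported problems: fix them before committing.")
  }
  invisible(ok)
}

# Usage from the package root:
# run_dev_cycle(".")
```

Committing and pushing are left as a manual step so that you can review the changes first.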

References:

References

The primary reference used is the book R Packages (2e) by Hadley Wickham and Jennifer Bryan. Other references include: