1. Adopting a coding style for your team

Every team and individual has different preferences, but the important thing is to decide as a team and stick to that decision.
Use the spell-check button in RStudio when writing RMarkdown/Quarto documents.

Examples:

Use = or <- or -> to assign variables in the environment?
American or British English for function names and arguments?
colour vs. color: {ggplot2} allows for both but in most of your own functions/packages a team would pick one or another.
Most software development is done in American English, so I recommend writing code that way, but as always… it depends. Documentation is a separate matter and should be tailored to your main user base.
Use {lintr} to analyze your code and flag any stylistic issues.
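
As a rough illustration of the {lintr} point above, the sketch below lints a single made-up line of code for assignment style. It assumes {lintr} 3.0 or later is installed; the `text` argument is used only to keep the snippet self-contained, and in a real project you would more likely point `lintr::lint()` at a file or run `lintr::lint_package()`.

```r
# Assumes {lintr} >= 3.0 is installed
library(lintr)

# A made-up line of code that assigns with `=` instead of `<-`
code_to_check <- "my_value = 42\n"

# Check only the assignment style here; a team would normally
# agree on a wider set of linters and store them in a shared config
lints <- lint(text = code_to_check, linters = assignment_linter())

# Each lint reports the line, column, and a message describing the issue
print(lints)
```

Which linters to enable is exactly the kind of decision the team should make once and then stick to.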


2. Organizing code into functions

DRY: “Don’t Repeat Yourself”

Write out lines of code in an R script or RMarkdown file for what you want to do. Once you’re satisfied, gradually wrap a function skeleton around them to turn them into one or more functions.
What are the interchangeable things in your function? These bits may turn into the primary arguments to your function.
What is the end result you want to return to the user? A data.frame? a plot? a list of lists?
Think not just about one individual function but about how each function fits into the larger picture of the entire package/script, especially if it’s code that you eventually want to put into a package.
One function may output the data or something else needed for another function, so make the transition as smooth as possible.
Are you doing too much in a single function? Break it down into smaller, more manageable functions.
Organize your functions so that each one primarily does one specific thing, preferably with one output.
All of this makes your functions easier to document, test, and debug, not just for users but also for you and your future self as the developer.
Creating functions allows you to then run the same operations for different things, for example: different geographical areas (creating the same data viz for different cities/regions/countries), different job sectors, different types of agricultural products (a function that calculates sales volumes), etc. by defining and passing arguments into a function. From there you can iterate a function over a set of different things and organize your code better and write less code by hand.
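
A minimal sketch of that workflow, using only base R. The `sales` data and the function name `calc_sales_volume()` are invented for illustration: the script-style line comes first, then the same logic wrapped in a function whose interchangeable parts become arguments, then iteration over every product.

```r
# Hypothetical example data: sales records for a few agricultural products
sales <- data.frame(
  product = c("rice", "rice", "maize", "maize", "beans"),
  region  = c("north", "south", "north", "south", "north"),
  volume  = c(120, 80, 60, 40, 30)
)

# Step 1: script-style code, hard-coded for one product
rice_total <- sum(sales$volume[sales$product == "rice"])

# Step 2: the same logic as a function; the interchangeable parts
# (the data and the product name) become the arguments
calc_sales_volume <- function(data, product_name) {
  stopifnot(
    is.data.frame(data),
    is.character(product_name),
    length(product_name) == 1
  )
  sum(data$volume[data$product == product_name])
}

# Step 3: iterate over every product instead of copy-pasting the call
products <- unique(sales$product)
volumes <- vapply(
  products,
  function(p) calc_sales_volume(sales, p),
  numeric(1)
)

print(volumes)
```

The function returns one thing (a single number), which keeps it easy to document, test, and reuse inside other functions.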

## Get form schema for multiple forms...
abcd_schema <- activityinfo::getFormSchema(formId = "abcd")
efgh_schema <- activityinfo::getFormSchema(formId = "efgh")
ijkl_schema <- activityinfo::getFormSchema(formId = "ijkl")
mnop_schema <- activityinfo::getFormSchema(formId = "mnop")
qrst_schema <- activityinfo::getFormSchema(formId = "qrst")
## etc...

## DRY: Don't Repeat Yourself! >>> Use iteration!
form_id_list <- list("abcd", "efgh", "ijkl", "mnop", "qrst")

## use `lapply` and other base R 'apply' functions
all_form_schemas <- lapply(form_id_list, function(r) as.data.frame(activityinfo::getFormSchema(r)))

## or use purrr::map() and other {purrr} functions
all_form_schemas <- purrr::map(
  form_id_list,
  ~ activityinfo::getFormSchema(formId = .x)
)


3. Organizing functions into packages

Use {usethis} & {devtools} package functions to create package structure and make package dev easier!

## Create a new package directory
usethis::create_package("MyNewPackage")

## Initialise Git for the project
usethis::use_git()
## THEN, optionally, create a GitHub repository and connect it
usethis::use_github()

## Create README file
usethis::use_readme_rmd()

## Create R script to house your function code
usethis::use_r("viz_functions")

## Create test infrastructure with {testthat}
usethis::use_testthat()

## Create a vignette (a name is required; "my-vignette" is a placeholder)
usethis::use_vignette("my-vignette")

## Add {ggplot2} or any other package as a dependency
usethis::use_package("ggplot2")

## Create roxygen documentation
devtools::document()

## Build package
devtools::build()

## Test package
devtools::test()

## AND MANY MANY MORE!

Example Package for some ETL process

Individual component functions:
1. Get data: ex. getData() (and other smaller functions)
2. Clean data: ex. cleanData() (and other smaller functions)
3. Transform data: ex. transformData() (and other smaller functions)
4. Visualize data: ex. visualizeData() (and other smaller functions)
5. Save to a folder or append data to a db: ex. saveData() and/or appendData() (and other smaller functions)
etc.

How to run package code in a script?

One large package function (ex. runAnalysis()): basically executes everything?
Have the individual component functions run one after another in an R script?
etc…
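
To make the two options concrete, here is a small base R sketch. The function names follow the ones listed above, but their bodies and the example data are invented placeholders, and the visualization step is left out to keep it short.

```r
# Placeholder implementations of the component functions
getData <- function() {
  data.frame(
    city  = c("Dhaka", "Manila", "Dhaka"),
    sales = c("10", "20", NA),
    stringsAsFactors = FALSE
  )
}

cleanData <- function(raw) {
  # Drop rows with missing sales and convert the column to numeric
  cleaned <- raw[!is.na(raw$sales), , drop = FALSE]
  cleaned$sales <- as.numeric(cleaned$sales)
  cleaned
}

transformData <- function(cleaned) {
  # Total sales per city
  aggregate(sales ~ city, data = cleaned, FUN = sum)
}

saveData <- function(result, path) {
  write.csv(result, path, row.names = FALSE)
  invisible(path)
}

# Option 1: one wrapper function that runs every step in order
runAnalysis <- function(path = tempfile(fileext = ".csv")) {
  raw     <- getData()
  cleaned <- cleanData(raw)
  result  <- transformData(cleaned)
  saveData(result, path)
  result
}

result <- runAnalysis()
print(result)
```

Option 2 is simply the body of `runAnalysis()` written out in a script. That makes it easier to inspect intermediate objects, at the cost of a longer script to maintain.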


4. Documenting code

{roxygen2}: Generate R package documentation from inline R comments

#' @title FUNCTION_TITLE
#' @description FUNCTION_DESCRIPTION
#' @param data PARAM_DESCRIPTION
#' @return OUTPUT_DESCRIPTION
#' @details DETAILS

## etc...

Then you can run:

devtools::document()

This converts the roxygen comments above your functions into .Rd files in the man/ directory, where R packages keep their documentation files.
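
For reference, a filled-in version of the roxygen template might look like the sketch below. The function `calc_sales_volume()` and its arguments are hypothetical; the tags are standard {roxygen2} tags.

```r
#' @title Calculate total sales volume
#' @description Sums the `volume` column of a sales table for one product.
#' @param data A data.frame with `product` and `volume` columns.
#' @param product_name A single character string naming the product.
#' @return A single numeric value: the total volume for `product_name`.
#' @details Rows whose `volume` is `NA` are ignored.
#' @examples
#' sales <- data.frame(product = c("rice", "rice"), volume = c(1, 2))
#' calc_sales_volume(sales, "rice")
#' @export
calc_sales_volume <- function(data, product_name) {
  sum(data$volume[data$product == product_name], na.rm = TRUE)
}
```

Because the documentation lives directly above the code, it is much easier to keep the two in sync when the function changes.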

{sinew} addin for RStudio: Automatically generates documentation structure and content based on the function code written in a script.

{lifecycle} badges: Tell users the development status of your functions!

?tidyr::spread
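
A hedged sketch of how a deprecation might be signalled with {lifecycle}. It assumes {lifecycle} is installed; the package name `mypkg` and both functions are hypothetical. In roxygen comments the matching badge is usually added to the description with an inline R call to `lifecycle::badge()`.

```r
# Assumes {lifecycle} is installed.
# `mypkg`, `old_summary()` and `new_summary()` are hypothetical names.

new_summary <- function(x) {
  c(mean = mean(x), sd = stats::sd(x))
}

old_summary <- function(x) {
  # Warn users that this function is deprecated and name the replacement
  lifecycle::deprecate_warn(
    when = "0.2.0",
    what = "mypkg::old_summary()",
    with = "mypkg::new_summary()"
  )
  new_summary(x)
}

# The old function still works, so existing user code keeps running
result <- suppressWarnings(old_summary(c(1, 2, 3)))
print(result)
```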

Testing

Start with common-sense ones: Is my function output the correct type? Does my function error when I do x/y/z?
Code can fail in ways that you didn’t think possible, so at the start of a new package you won’t really have comprehensive tests. Future bugs and issues from you or people using the package are an opportunity to add new tests to make sure that the problem doesn’t happen again!
The best way to learn is probably to read how other packages write their tests, especially if the packages are similar to what you’re working on.
Use {covr} (which can report to Codecov) to discover how much of the code in your package functions is covered by tests. Don’t get too fixated on the coverage percentage, though; focus on the quality of the tests first and foremost.
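
The two common-sense checks mentioned above could be written roughly as follows with {testthat}. This assumes {testthat} is installed; `calc_share()` is a hypothetical function used only as something to test.

```r
# Assumes {testthat} is installed; `calc_share()` is a hypothetical function
library(testthat)

calc_share <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric", call. = FALSE)
  }
  x / sum(x)
}

# Is my function output the correct type (and value)?
test_that("calc_share() returns the correct type and values", {
  out <- calc_share(c(1, 1, 2))
  expect_type(out, "double")
  expect_equal(out, c(0.25, 0.25, 0.5))
  expect_equal(sum(out), 1)
})

# Does my function error when given bad input?
test_that("calc_share() errors on non-numeric input", {
  expect_error(calc_share("a"), "must be numeric")
})
```

In a package these tests would live in a file under tests/testthat/ and be run with devtools::test().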

Creating a {pkgdown} website

README file becomes your front page
Package documentation exists as individual web pages
Vignettes, news, and other things are included too
Bonus: Use HTML and CSS to spice up the appearance!
Examples: tvthemes, ggshakeR


5. Using version control

Git: version control system.

GitHub: A popular hosting service for Git repositories that provides a nice user interface for storing files and collaborating with other users.

GitHub Projects: Kanban/Trello-style board that you can use to organize your tasks as ‘issues’ in individual repositories

Example: {ggshakeR} version 0.2.0 roadmap

GitHub Issues:

Title: Start with a verb describing the main action that the issue is supposed to fix, then a short description.
- Create..., Fix..., Simplify..., etc.
Description: Use the first comment box once you’ve created the issue to describe in a bit more detail. Specify the function(s) you want to work on (you might have already mentioned it in the title), possible steps you want to take, some brainstorming thoughts. For bug issues sometimes you may not have too much to say here… yet.
Assignments: On the right side-bar of every issue are various buttons that you can use to tag and organize your issue.
- Assign a person via Assignees
- Label your issue as bug, enhancement, etc. (you can customize these to fit your project)
- Project/Milestone: If this is part of a larger release ‘Project’ or ‘Milestone’ you can add the issue to those from here.

You can also create issue templates to streamline this and/or make it easier to gather actionable information from users.

Branch

Create a new branch based on the issue: I like to name it in a similar vein to the issue. Start with an action verb, then a short description (this can be difficult at times, especially as I prefer to keep it to fewer than 5 words…), but this time I also append the GitHub issue number at the end.
- Example: create-passnetwork-function-#56.
Use GitHub commit message keywords (references, closes, etc.) so that the changes you make to the code are tracked and linked to a specific issue in a repository.
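
Purely as an illustration of the naming convention, the hypothetical helper below builds a branch name from an issue title and number, and shows a commit message that uses the `closes` keyword. Neither the helper nor the commit text comes from any package; they are invented here.

```r
# Hypothetical helper: turn an issue title and number into a branch name
make_branch_name <- function(issue_title, issue_number) {
  slug <- tolower(issue_title)
  slug <- gsub("[^a-z0-9]+", "-", slug)  # non-alphanumeric runs become hyphens
  slug <- gsub("^-|-$", "", slug)        # trim leading/trailing hyphens
  paste0(slug, "-#", issue_number)
}

branch <- make_branch_name("Create passnetwork function", 56)

# A commit message that links the change to the issue via a keyword
commit_msg <- sprintf("Add first version of passnetwork function (closes #%d)", 56)

print(branch)
print(commit_msg)
```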

Pull Request

Merge the changes you made on a separate branch to the main branch via a Pull Request!

You can create a checklist template that shows up whenever a PR is created on GitHub so that the user/developer is reminded of what needs to be done before the code can be fully merged.


Automate your code with GitHub Actions (GHA):

GHA: A CI/CD tool available on GitHub for automating software workflows.

Can be used in R for:

Run package checks & tests

Check test coverage with {covr} and Codecov
Run {lintr} & {styler}

Update your {pkgdown} website
… and more!
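
One way to set up workflows like these from R is with {usethis}, sketched below. This assumes a recent {usethis} and that the function is called from the root of a package that already has a GitHub remote. The wrapper name `setup_github_actions()` is hypothetical, and the workflow names are the example workflows provided by r-lib/actions.

```r
# Hypothetical wrapper; call it once from the root of your package.
# Assumes {usethis} is installed and the project has a GitHub remote.
setup_github_actions <- function() {
  usethis::use_github_action("check-standard")  # R CMD check on several platforms
  usethis::use_github_action("test-coverage")   # test coverage via {covr}
  usethis::use_github_action("lint")            # run {lintr}
  usethis::use_github_action("pkgdown")         # rebuild the {pkgdown} site
  invisible(TRUE)
}
```

Each call writes a YAML workflow file into .github/workflows/, which you can then edit by hand to fit your project.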


6. Examples from the field

AV Organization:

AVDG___: Package Repositories (tests, internal data, documentation of code)
↕↕↕️
AV___: Script/dashboard/Shiny app Repositories (activation scripts using package code, external data, documentation of process/purpose)

Create separate repository for executing code in packages
- package repositories are just for the package code itself, not the execution of the package code
- shell scripts to execute code, creates log files, contains folders holding new or back up data, the dashboard template file that your package code builds from, documentation for the entire process/app/etc., other peripheral stuff
Most error handling is done outside of package code in the activation script in the script repository
System set up in a way that code output is logged in a text file and then can be parsed by a separate script that collects all log files to show success/failure of all scripts/ETL processes in a single centralized dashboard
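
A simplified sketch of that logging idea in base R. The helper names, the log line format, and the example tasks are all assumptions made for illustration and not the actual system described here: an activation script wraps each step in tryCatch() and appends a status line to a text file, and a separate parser turns the log into a data frame that a dashboard could display.

```r
# Hypothetical helper: run one task and append a status line to a log file
run_and_log <- function(script_name, task, log_file) {
  status <- tryCatch(
    {
      task()
      "SUCCESS"
    },
    error = function(e) paste("FAILURE:", conditionMessage(e))
  )
  line <- paste(
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    script_name,
    status,
    sep = " | "
  )
  cat(line, "\n", sep = "", file = log_file, append = TRUE)
  invisible(status)
}

# Hypothetical parser: read a log file into a data.frame for a dashboard
parse_log <- function(log_file) {
  parts <- strsplit(readLines(log_file), " | ", fixed = TRUE)
  data.frame(
    time   = vapply(parts, `[`, character(1), 1),
    script = vapply(parts, `[`, character(1), 2),
    status = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

log_file <- tempfile(fileext = ".log")
run_and_log("etl_sales", function() sum(1:10), log_file)
run_and_log("etl_broken", function() stop("database unreachable"), log_file)

log_status <- parse_log(log_file)
print(log_status)
```

Keeping the error handling in the activation script, as described above, means the package functions can fail loudly while the script decides how to record and report the failure.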

Command Center dashboard: See the status of all R-related scripts and processes run on the server at a glance. It is built from a flexdashboard RMarkdown template that is rendered by an R script automated with a cron job. Different visualizations answer questions such as: How long is a script taking? At what times of day are my scripts running? Are any scripts showing errors, and if so, where is the log file?

Personal/shared work spaces for project staff: AV_Phillippines, AV_Bangladesh
Admin: Access rights to a few or all GitHub repositories depending on role (HQ staff? Project staff?)

Contact Info 📫💬

Hope you found this helpful. If you need some help with organizing your R code base, contact Ryo Nakagawara.

Managing Large R Codebases

Ryo Nakagawara

October 6, 2022
