Every team/individual has different preferences and may lean one way or another, but the important thing is to decide as a team and stick to that decision.
Spell check button in RStudio when using RMarkdown/Quarto.
Examples:
Use = or <- or -> to assign variables in the environment?
American or British English for function names and arguments?
colour vs. color: {ggplot2} allows for both, but in most of your own functions/packages a team would pick one or the other.
Most software development/programming in general is done using American English, so I recommend writing things that way, but as always… it depends (and documentation is a completely separate thing, of course, that should be tailored to your main user-base).
Use {lintr} to analyze your code and flag any stylistic issues.
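As a minimal sketch of what that looks like, the snippet below checks a one-line piece of code against a single rule. It assumes {lintr} version 3.0 or later is installed; the snippet being linted and the choice of `assignment_linter()` are just for illustration, and a team would normally agree on its own set of linters.

```r
# Assumes {lintr} (version 3.0 or later) is installed.
library(lintr)

# A deliberately inconsistent snippet: `=` is used for assignment.
snippet <- "total = sum(1:10)\n"

# Check the snippet against one rule: prefer `<-` for assignment.
lints <- lint(text = snippet, linters = assignment_linter())
print(lints)

# In a real project you would usually lint whole files or packages:
# lint("R/viz_functions.R")
# lint_package()
```

Whichever rules you pick, the point is that the team's decision is checked automatically instead of being argued about in code review.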
DRY: “Don’t Repeat Yourself”
Write out lines of code in an R script or RMarkdown file for what you want to do. Once you're satisfied, slowly start wrapping a function skeleton around it to transform it into one or more functions.
What are the interchangeable things in your function? These bits may turn into the primary arguments to your function.
What is the end result you want to return to the user? A data.frame? A plot? A list of lists?
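Here is a small sketch of that workflow in base R. The data, the column names, and the function name `summarise_region()` are all made up for illustration: first the script version written for one hard-coded case, then the same lines wrapped in a function where the interchangeable bits have become arguments.

```r
# Step 1: the exploratory script version, written for one hard-coded case.
sales <- data.frame(
  region = c("North", "North", "South", "South"),
  volume = c(10, 15, 7, 3)
)
north <- sales[sales$region == "North", ]
north_total <- sum(north$volume)

# Step 2: the same lines wrapped in a function. The parts that were
# hard-coded above (the data and the region) become arguments, and the
# return value is a one-row data.frame.
summarise_region <- function(data, region_name) {
  subset_rows <- data[data$region == region_name, ]
  data.frame(
    region = region_name,
    total_volume = sum(subset_rows$volume)
  )
}

summarise_region(sales, "North")
summarise_region(sales, "South")
```

Deciding up front that the function returns a data.frame (rather than a bare number) makes it easier to stack the results for many regions later on.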
Think not just about one individual function but about how each function fits into the larger picture of the entire package/script, especially if it's code that you eventually want to put into a package.
One function may output the data or something else needed for another function, so make the transition as smooth as possible.
Are you doing too much in a single function? Break it down into more manageable components by separating it out into several smaller functions.
Organize your functions so that each individual function primarily does one specific thing, preferably with one output.
All of this also makes your functions easier to document, test, and debug, not just for users but also for you and your future self as the developer.
Creating functions allows you to then run the same operations for different things, for example: different geographical areas (creating the same data viz for different cities/regions/countries), different job sectors, different types of agricultural products (a function that calculates sales volumes), etc. by defining and passing arguments into a function. From there you can iterate a function over a set of different things and organize your code better and write less code by hand.
## Get form schema for multiple forms...
abcd_schema <- activityinfo::getFormSchema(formId = "abcd")
efgh_schema <- activityinfo::getFormSchema(formId = "efgh")
ijkl_schema <- activityinfo::getFormSchema(formId = "ijkl")
mnop_schema <- activityinfo::getFormSchema(formId = "mnop")
qrst_schema <- activityinfo::getFormSchema(formId = "qrst")
## etc...
## DRY: Don't Repeat Yourself! >>> Use iteration!
form_id_list <- list("abcd", "efgh", "ijkl", "mnop", "qrst")
## use `lapply` and other base R 'apply' functions
all_form_schemas <- lapply(form_id_list, function(r) as.data.frame(activityinfo::getFormSchema(r)))
## or use purrr::map() and other {purrr} functions
all_form_schemas <- purrr::map(
  form_id_list,
  ~ activityinfo::getFormSchema(formId = .x)
)
## Create a new package directory
usethis::create_package("MyNewPackage")
## Use git or connect to an existing GitHub repository
usethis::use_git()
## OR
usethis::use_github()
## Create README file
usethis::use_readme_rmd()
## Create R script to house your function code
usethis::use_r("viz_functions")
## Create test infrastructure with {testthat}
usethis::use_testthat()
## Create a vignette (use_vignette() needs a name; "my-vignette" is a placeholder)
usethis::use_vignette("my-vignette")
## Add {ggplot2} or any other package as a dependency
usethis::use_package("ggplot2")
## Create roxygen documentation
devtools::document()
## Build package
devtools::build()
## Test package
devtools::test()
## AND MANY MANY MORE!
Individual component functions:
getData() (and other smaller functions)
cleanData() (and other smaller functions)
transformData() (and other smaller functions)
visualizeData() (and other smaller functions)
saveData() and/or appendData() (and other smaller functions)
etc.
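To make the idea concrete, here is a rough sketch of how a few of these component functions could hand their output to one another. Only the function names come from the list above; the function bodies, the example data, and the column names are invented for illustration.

```r
# Hypothetical bodies; only the function names come from the list above.
getData <- function() {
  data.frame(
    city = c("Tokyo", "Osaka", NA),
    sales = c("100", "250", "75"),
    stringsAsFactors = FALSE
  )
}

cleanData <- function(raw) {
  # Drop rows without a city and convert sales from text to numbers.
  cleaned <- raw[!is.na(raw$city), ]
  cleaned$sales <- as.numeric(cleaned$sales)
  cleaned
}

transformData <- function(cleaned) {
  # Add each city's share of total sales.
  cleaned$sales_share <- cleaned$sales / sum(cleaned$sales)
  cleaned
}

# Each function returns a data.frame that the next one accepts as input.
result <- transformData(cleanData(getData()))
print(result)
```

Because every step takes a data.frame and returns a data.frame, any single step can be tested or swapped out without touching the others.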
How to run package code in a script?
A wrapper function (runAnalysis()): basically executes everything?
#' @title FUNCTION_TITLE
#' @description FUNCTION_DESCRIPTION
#' @param data PARAM_DESCRIPTION
#' @return OUTPUT_DESCRIPTION
#' @details DETAILS
## etc...
Then you can run:
devtools::document()
This transforms the roxygen comments above your function code into Rd files in the man/ directory, which is where documentation files for R packages are kept.
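For reference, a filled-in roxygen block might look something like the sketch below. The function `sales_share()` and its argument are hypothetical and only there to show how the tags line up with the code they describe.

```r
#' @title Convert sales volumes to shares
#' @description Divides each sales volume by the total so that the
#'   result sums to 1.
#' @param volume A numeric vector of sales volumes.
#' @return A numeric vector the same length as `volume`.
#' @examples
#' sales_share(c(10, 30, 60))
#' @export
sales_share <- function(volume) {
  volume / sum(volume)
}
```

Inside a package, running devtools::document() on this would produce a man/sales_share.Rd file, and the `@export` tag would add the function to the NAMESPACE file.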
?tidyr::spread
Start with common sense ones:
Is my function output the correct type?
Does my function error when I do x/y/z?
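Those two questions translate fairly directly into {testthat} code. The sketch below assumes {testthat} is installed; the function under test, `sales_share()`, is a hypothetical example and not part of any real package.

```r
library(testthat)

# Hypothetical function under test.
sales_share <- function(volume) {
  if (!is.numeric(volume)) {
    stop("`volume` must be numeric.")
  }
  volume / sum(volume)
}

# "Is my function output the correct type?"
test_that("output is the correct type and length", {
  expect_type(sales_share(c(10, 30, 60)), "double")
  expect_length(sales_share(c(10, 30, 60)), 3)
})

# "Does my function error when I do x/y/z?"
test_that("function errors on non-numeric input", {
  expect_error(sales_share("ten"), "must be numeric")
})
```

In a package these tests would live in a file under tests/testthat/ and be run with devtools::test().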
Code can fail in ways that you didn’t think possible so at the start of a new package, you won’t really have comprehensive tests. Future bugs and issues from you/people using the package are an opportunity to add new tests to make sure that the problem doesn’t happen again!
The best way to learn is probably to read how other packages have written their tests, especially if those packages are similar to what you're working on.
Use {covr} (with a service such as Codecov) to find out how much of the code in your package's functions is actually covered by tests. Don't get too fixated on the coverage percentage, though: focus on the quality of the tests first and foremost.
README file becomes your front page
Package documentation exists as individual web pages
Vignettes, news, and other things are included too
Bonus: Use HTML and CSS to spice up the appearance!
Git: version control system.
GitHub: A popular hosting service for Git repositories that provides a nice user interface to store files and to enable collaboration with other users.
GitHub Projects: Kanban/Trello-style board that you can use to organize your tasks as 'issues' in individual repositories
Create..., Fix..., Simplify..., etc.
You can also create issue templates to streamline this and/or make it easier to gather actionable information from users.
create-passnetwork-function-#56.
references, closes, etc. so that the changes you are making to the code are tracked and linked to a specific issue in a repository.
Merge the changes you made on a separate branch to the main branch via a Pull Request!
You can create a checklist template that shows up whenever a PR is created on GitHub so that the user/developer is reminded of what needs to be done before the code can be fully merged.
Can be used in R for:
AV Organization:
AVDG___: Package Repositories (tests, internal data, documentation of code)
AV___: Script/dashboard/Shiny app Repositories (activation scripts using package code, external data, documentation of process/purpose)
Create separate repository for executing code in packages
Most error handling is done outside of package code in the activation script in the script repository
The system is set up so that code output is logged to a text file, which can then be parsed by a separate script that collects all log files to show the success/failure of all scripts/ETL processes in a single centralized dashboard
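A stripped-down sketch of that pattern in base R is shown below. The helper `run_and_log()`, the script names, and the log line format are all assumptions made for illustration, not the actual setup: the activation script wraps each task in tryCatch(), appends one line per run to a log file, and a separate step parses those lines into a table that a dashboard could read.

```r
# Hypothetical helper: run one task and append the outcome to a log file.
run_and_log <- function(script_name, task, log_file) {
  status <- tryCatch(
    {
      task()
      "SUCCESS"
    },
    error = function(e) paste("FAILURE:", conditionMessage(e))
  )
  entry <- paste(
    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    script_name,
    status,
    sep = " | "
  )
  cat(entry, "\n", sep = "", file = log_file, append = TRUE)
  invisible(status)
}

# Activation script: error handling lives here, not in the package code.
log_file <- tempfile(fileext = ".log")
run_and_log("load_sales", function() sum(1:10), log_file)
run_and_log("load_forms", function() stop("API timeout"), log_file)

# A separate script can then parse the log lines into a table.
log_lines <- readLines(log_file)
parts <- strsplit(log_lines, " | ", fixed = TRUE)
log_table <- data.frame(
  time = vapply(parts, `[`, character(1), 1),
  script = vapply(parts, `[`, character(1), 2),
  status = vapply(parts, `[`, character(1), 3)
)
print(log_table)
```

Keeping the tryCatch() in the activation script means the package functions can simply fail loudly, and the decision about what to do on failure stays in one place.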
flexdashboard RMarkdown template that is activated by an R script automated by a cron job. Different types of visualizations to show:
how long is a script taking?,
at what time throughout the day are my scripts running?,
are any scripts showing errors? if so take me to the log file,
etc.
Personal/shared work spaces for project staff:
AV_Phillippines, AV_Bangladesh
Admin: Access rights to a few or all GitHub repositories depending on role (HQ staff? Project staff?)
Hope you found this helpful. If you need some help with organizing your R code base,
Contact Ryo Nakagawara: