To minimise the overall workload and maximise quality, it is important to reduce duplication. Ideally, for each specific thing that needs to be done with data, only a single script should exist that everyone relies on. The IMPACT (R) tool sharing ecosystem outlined below serves this goal. It is split into two sections: first, how tools can be accessed, and second, how anyone can contribute to these tools. The first section describes the skills you need to use any of the tools, lists the available tools, gives a step-by-step guide on how to get started with an IMPACT R package, and explains how to get help when you get stuck.
To use a tool, you need to:
If you still get stuck, there are multiple ways to get help.
All IMPACT R tools are built to be usable by anyone who completed the certified ‘user’ training. Generally you should have R and RStudio installed, be familiar with the RStudio interface, understand variables, basic data types, subsetting, installing/loading packages as well as reading/writing data to/from files.
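As a rough self-check of the skills listed above, the following sketch uses only base R and made-up values. It is an illustration of the kind of code you should be comfortable reading, not part of the certified training material.

```r
# Variables and basic data types
household_size <- 5L      # integer
district <- "North"       # character
is_displaced <- TRUE      # logical

# A small data frame standing in for survey data (made-up values)
survey <- data.frame(
  id = 1:4,
  district = c("North", "South", "North", "East"),
  household_size = c(5, 3, 8, 2),
  stringsAsFactors = FALSE
)

# Subsetting: all rows from one district, and a single column
north <- survey[survey$district == "North", ]
sizes <- survey$household_size

# Writing data to a file and reading it back
csv_path <- tempfile(fileext = ".csv")
write.csv(survey, csv_path, row.names = FALSE)
survey_from_file <- read.csv(csv_path, stringsAsFactors = FALSE)
print(survey_from_file)
```

If every line here makes sense to you, the installation steps and examples in this guide should be manageable.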
| name | software | description | authors | status | repository | maintainer |
|---|---|---|---|---|---|---|
| Quicksheets | Excel | An Excel template with shortcuts for quicker and better data cleaning. | Oliver Moeller | validated | https://github.com/rolloverlime/quicksheets | oliver.moller@reach-initiative.org |
| cleaninginspectoR | R | Simple data cleaning checks | Martin Barner, Eliora Henzler | under development | ellieallien/cleaninginspectoR | martin.barner@impact-initiatives.org |
| hypegrammaR | R | An implementation of the quantitative analysis guidelines | Martin Barner, Eliora Henzler | under development | ellieallien/hypegrammaR | martin.barner@impact-initiatives.org |
| koboquest | R | Using the kobo tool to identify data types, evaluate skiplogic and apply labels | Martin Barner, Eliora Henzler | validated | mabafaba/koboquest | martin.barner@impact-initiatives.org |
| mergekobodata | R | Merging data from different variations of a tool | Martin Barner, Eliora Henzler | validated | mabafaba/mergekobodata | martin.barner@impact-initiatives.org |
| surveyweights | R | Calculate survey weights from sampling frames; combine stratum and cluster weights | Martin Barner, Eliora Henzler | beta | mabafaba/surveyweights | martin.barner@impact-initiatives.org |
| kobostandards | R | Check if kobo tool, data, sampling frame and analysis plan comply with format standards and match with each other | Martin Barner | beta | mabafaba/kobostandards | martin.barner@impact-initiatives.org |
| xlsformfill | R | Generate fake data from kobo forms so you can test analysis scripts before any real data is collected | Martin Barner | beta | mabafaba/xlsformfill | martin.barner@impact-initiatives.org |
| koboloops | R | Merge loops into a main dataset, or a main dataset into the loop | Sharon Orengo | beta | sharonorengo/koboloops | sharon.orengo@impact-initiatives.org |
| Setviz | R | Visualisation of set intersections based on upsetR package | Eliora Henzler | live | ellieallien/Setviz | eliora.henzer@impact-initiatives.org |
| clog | R | Update data based on a standard IMPACT cleaning log | Martin Barner | beta | mabafaba/clog | martin.barner@impact-initiatives.org |
First, install devtools and the supporting packages listed below (you only need to do this once on your system):
install.packages("devtools")
install.packages("assertthat")
install.packages("crayon")
install.packages("data.table")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("ggthemes")
install.packages("htmltools")
install.packages("knitr")
install.packages("magrittr")
install.packages("purrr")
install.packages("questionr")
install.packages("reshape")
install.packages("reshape2")
install.packages("rmarkdown")
install.packages("stringi")
install.packages("stringr")
install.packages("survey")
install.packages("testthat")
install.packages("tibble")
install.packages("tidyr")
install.packages("utils")
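The same installation can be sketched as a single step that skips packages you already have. `install_missing()` is a hypothetical helper written for this guide, not part of any IMPACT package. "utils" is left out of the list because it ships with base R.

```r
# Packages from the list above ("utils" omitted: it is part of base R)
required_packages <- c(
  "devtools", "assertthat", "crayon", "data.table", "dplyr", "ggplot2",
  "ggthemes", "htmltools", "knitr", "magrittr", "purrr", "questionr",
  "reshape", "reshape2", "rmarkdown", "stringi", "stringr", "survey",
  "testthat", "tibble", "tidyr"
)

# Hypothetical helper: installs only the packages that are not yet installed.
# With dry_run = TRUE it installs nothing and only returns the missing names.
install_missing <- function(pkgs, dry_run = FALSE) {
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (!dry_run && length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)
}

# Dry run: report what would be installed without installing anything
print(install_missing(required_packages, dry_run = TRUE))
```

Calling `install_missing(required_packages)` without `dry_run = TRUE` would then perform the actual installation.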
Any of the tools in the “Available Tools” list above can be installed by running the following command inside the R console:
devtools::install_github("REPOSITORY_NAME",build_opts = c())
where REPOSITORY_NAME should be replaced by the value in the repository column of the “Available Tools” table above. For example, to install the cleaninginspectoR package:
devtools::install_github("ellieallien/cleaninginspectoR",build_opts = c())
At this stage, look at the console panel of RStudio. You can ignore most of it, but if any issues come up, you will be notified here. Issues you might run into are:
RStudio shows you a long list of packages and associated version numbers, and asks you: “Enter one or more numbers separated by spaces, or an empty line to cancel”. Click in the console window, then hit Enter only. Don’t press any other keys. The installation should then continue.
It says “ERROR: dependency [SOME_PACKAGE_NAME] is not available for package ‘[ANOTHER_PACKAGE_NAME]’”, where [SOME_PACKAGE_NAME] is the name of the missing package. In this case, run
install.packages("[SOME_PACKAGE_NAME]")
and try again.
If it says “* DONE”, everything worked ok!
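If you are unsure whether the installation succeeded, you can also check from the console. The following is a minimal sketch. `is_installed()` is a hypothetical helper name, while `requireNamespace()` and `packageVersion()` are standard R functions.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(package_name) {
  requireNamespace(package_name, quietly = TRUE)
}

package_name <- "cleaninginspectoR"

if (is_installed(package_name)) {
  # Show which version was installed
  print(packageVersion(package_name))
} else {
  message(package_name, " is not installed yet; check the console output above.")
}
```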
Once the package is installed, load it with
library(PACKAGE_NAME)
Example:
library("cleaninginspectoR")
You can find the main information that you need by running:
browseVignettes("PACKAGE_NAME")
where PACKAGE_NAME should be replaced by the actual name of the package, e.g.:
browseVignettes("cleaninginspectoR")
If you don’t know how to follow the instructions in this guide, you probably don’t have the required skills.
If you do, and managed to install the package, you will probably get stuck unless you have read the developer’s documentation of the specific tool.
Once you’ve checked the two prerequisites above, you might still have questions regarding specific functions. Each function has its own detailed help page, which is usually the first place to go for help. You can access it with:
?PACKAGE_NAME::FUNCTION_NAME
For example:
?cleaninginspectoR::inspect_all
The documentation will show up in the “Help” tab of RStudio, which in the default layout is in the bottom right panel.
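To look up a help page you need to know the function's name. One way to see what a package offers is to list the objects it exports, as in this sketch. `list_package_exports()` is a hypothetical helper, and the base package "stats" is used as a stand-in so the example runs without any IMPACT package installed.

```r
# Hypothetical helper: names of all objects a package exports, sorted
list_package_exports <- function(package_name) {
  sort(getNamespaceExports(package_name))
}

# "stats" is only a stand-in; use e.g. "cleaninginspectoR" once it is installed
exports <- list_package_exports("stats")
print(head(exports, 10))
```

Each of the returned names can then be passed to `?` in the form shown above.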
Generally there are two types of error messages:
Planned error messages were generated on purpose by the tool developer. While they are likely accompanied by a bit of gibberish, you should find among it a clear message about what went wrong, why it went wrong and what you need to change. Actually reading the error message and looking for useful hints might give you the solution straight away.
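To illustrate what a planned error message looks like from the developer's side, here is a minimal sketch. `check_weights()` is a hypothetical function written only for this example and does not exist in any IMPACT package.

```r
# Hypothetical function with two planned error messages
check_weights <- function(data, weight_column) {
  if (!is.data.frame(data)) {
    stop("'data' must be a data frame, but it is a ", class(data)[1], ".")
  }
  if (!(weight_column %in% names(data))) {
    stop(
      "Column '", weight_column, "' not found in 'data'. ",
      "Available columns: ", paste(names(data), collapse = ", "), "."
    )
  }
  data[[weight_column]]
}

# A wrong column name triggers the second planned error.
# tryCatch() captures the message so that the script keeps running.
message_text <- tryCatch(
  check_weights(data.frame(id = 1:2, w = c(0.5, 1.5)), "weight"),
  error = function(e) conditionMessage(e)
)
print(message_text)
```

The message names the problem (a missing column) and lists the columns that do exist, which is usually enough to spot a misspelled name.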
If you cannot find anything useful in the error message, you are probably dealing with an unplanned error message, a problem that the developer of the tool did not anticipate. In that case:
When asking a question remotely (i.e. the person helping you can’t physically sit down at your computer), you should make it as easy as possible for the other person to help you. This applies both when asking within IMPACT and when asking online (e.g. on StackOverflow).
At the very least, you should provide:
Unless the issue is trivial, the person will not be able to help you without running the code themselves and recreating the problem. The best way to help them do that is to produce what is called a minimal, complete, and verifiable example.
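A minimal, complete, and verifiable example could look roughly like the sketch below. The data and the problem are made up for illustration. The point is that everything needed to recreate the issue is contained in a few lines that anyone can paste into their own console.

```r
# 1. Made-up data, small enough to paste into a message
example_data <- data.frame(
  id = 1:3,
  age = c(25, NA, 41)
)

# 2. The smallest piece of code that shows the problem:
#    a number was expected here, but the result is NA
average_age <- mean(example_data$age)
print(average_age)

# 3. A copy-pasteable text version of the data for the person helping you
dput(example_data)

# 4. Your R version and loaded packages
print(sessionInfo())
```

Alongside the code, state what you expected to happen and what happened instead.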
These tools are a collaborative effort across IMPACT teams. Contributing to and adding IMPACT tools is the greatest honour anyone can achieve in life in general. If you would like to learn how to build R packages or have a specific tool in mind that you would like to build, please contact the data unit, who will try to support you as much as possible throughout. While tools can in principle be based on any software (R, Python, Excel, ...), we recommend sticking to R, as it is currently the most widely used in IMPACT and the humanitarian system, and has the most training and resources available within IMPACT.
For contributions to be effective (and to be acceptable in the official toolbox), you will find below the required skills and the requirements in terms of coordination, scope, quality and accessibility standards as well as an explanation of the validation process.
To have a place to freely share tools between country teams, there is the Open Data Processing and Analysis Tools Google Sheet. It includes three sheets:
Before starting to build a new tool, coordinate with the GVA Data Unit. Make sure that you..
In order to contribute R Packages, you need to know the content of the Impact builder certificate training. In summary, that is:
Most tools developed across the humanitarian system (and within IMPACT) failed in the past because their functionality was too complex, not reliable enough or not well enough documented.
Therefore packages must have a single, clearly defined functionality.
Packages can be as simple as a single function doing a single thing (an example of this would be the mergekobodata package); complex tasks should be split into independent packages. These may then be combined into higher order packages. They would then have varying levels of complexity/specialisation:
What is important is that specific packages should not contain unspecific code. That means that in the example above, a SYR market monitoring package should rely on the market monitoring analysis package for all code that is not specific to SYR alone. In turn, the market monitoring analysis package should rely on triangulating KI data for anything not totally specific to market monitoring (and so on).
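The layering described above can be sketched in code. Both functions below are hypothetical and would, under this assumption, live in two different packages: the general calculation in the more general package, and only the context-specific part (here, which columns to use) in the specific one.

```r
# General level: would live in the more general (hypothetical) package.
# Weighted share of TRUE values in a logical vector.
weighted_proportion <- function(x, weights) {
  sum(weights[x]) / sum(weights)
}

# Specific level: would live in the context-specific (hypothetical) package.
# It contains only what is specific to this context and delegates
# the actual calculation to the general function.
share_of_open_markets <- function(market_data) {
  weighted_proportion(market_data$is_open, market_data$weight)
}

# Made-up example data
markets <- data.frame(
  is_open = c(TRUE, FALSE, TRUE),
  weight = c(2, 1, 1)
)
print(share_of_open_markets(markets))
```

If the weighting logic ever needs fixing, it only has to be fixed once, in the general function.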
Do not depend on the tidyverse package as a whole (instead, depend on the individual tidyverse packages that you need, e.g. dplyr, purrr etc.).
?packagename must give an overview of the complete package functionality.
If a package is included in the IMPACT package repository, the creator becomes the package maintainer by default.