This document lists suggestions to help you collect and clean high-quality survey data.
Inside each loop (repeat group), include a calculate field that generates a UUID for every individual member by putting the uuid() function in its calculation column.
Before uploading the tool to the server, make sure you have verified all relevancies. To check them, you can use a Shiny app, which is available here.
Because packages in the IMPACT R Ecosystem use the default values from the KOBO template, we strongly recommend using the exact values from the template; this reduces the time spent writing scripts. You may change the label if you must, but please do not change the name column.
We recommend using the OCHA COD (Common Operational Datasets) names in the name column of your KOBO tool. You are free to modify the label as desired, but please avoid changing the name column. You can download the OCHA COD with the download_hdx_adm() function from the illuminate package, which retrieves the data from HDX automatically. To obtain names that can be copied directly into your KOBO tool, use snakecase::to_snake_case(). Here is an example:
```r
# devtools::install_github("mhkhan27/illuminate")
# devtools::install_github("dickoa/rhdx")
# install.packages(c("sf", "dplyr", "snakecase"))
library(illuminate)
library(sf)
library(rhdx)
library(dplyr) # provides mutate() and select()

## Download OCHA COD admin 3 boundaries for Mali
mali_admin3 <- illuminate::download_hdx_adm(country_code = "mli", admin_level = 3)

mali_admin3_df <- mali_admin3 |>
  mutate(
    name = snakecase::to_snake_case(ADM3_FR), ## change ADM3_FR as necessary
    `label::fr` = ADM3_FR
  ) |>
  as.data.frame() |>             # drop the sf class
  select(`label::fr`, name)      # keep only the columns needed in KOBO
```
The code above should generate a two-column table (label::fr and name) that can be copied directly into the choices tab of your KOBO tool. You are free to modify label::fr, but please do not change name, as it will be used by upcoming functions.
If you do not have a population dataset, or the quality of the existing data is poor, please review the following source with your GIS officer.
Progress tracker: Make sure your progress tracker excludes surveys listed in the deletion log and counts only surveys with consent = yes before reporting the total number of completed surveys. Also make sure you have cleaned the strata names (for example, fixing wrong entries) before running the progress tracker script.
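The counting logic of a progress tracker can be sketched in base R as follows. This is a minimal sketch, not the REACH progress tracker script: the column names (uuid, consent, strata), the deletion log layout (a single uuid column) and the named target vector are all assumptions to adapt to your data.

```r
# Hypothetical column names: uuid, consent, strata. Adjust to your dataset.
# `target` is a named numeric vector of target surveys per stratum.
progress_tracker <- function(df, deletion_log, target) {
  # Clean strata names: trim spaces and lower-case, so that entries such as
  # "Bamako " and "bamako" are counted together
  df$strata <- tolower(trimws(df$strata))
  # Keep consented surveys that are not in the deletion log
  # (%in% treats a missing consent as "not yes")
  kept <- df[df$consent %in% "yes" & !(df$uuid %in% deletion_log$uuid), ]
  # Strata not listed in `target` are not counted, so check them separately
  done <- table(factor(kept$strata, levels = names(target)))
  data.frame(
    strata = names(target),
    completed = as.vector(done),
    target = as.vector(target),
    remaining = pmax(as.vector(target) - as.vector(done), 0)
  )
}
```

For example, progress_tracker(df, deletion_log, c(bamako = 2, kayes = 3)) returns one row per stratum with the completed, target and remaining counts.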
Check duration from audit files: Check the interview duration using the audit files. A couple of functions are available within REACH for this; contact HQ or write to the RCoP if you do not have one or are looking for one.
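A minimal base-R sketch of a duration check, assuming the usual ODK/KoBo audit.csv layout with event, start and end columns holding timestamps in milliseconds. It counts only time spent on question screens; the function names and the 15-minute threshold are hypothetical, and this is not one of the REACH functions mentioned above.

```r
# Interview duration (minutes) from one audit file.
# Assumption: only "question" and "group questions" events count as
# interview time; other events (form start, form exit, ...) are ignored.
audit_duration_minutes <- function(audit) {
  on_question <- audit$event %in% c("question", "group questions")
  ms <- audit$end[on_question] - audit$start[on_question]
  sum(ms, na.rm = TRUE) / 60000 # milliseconds to minutes
}

# Return the names of interviews shorter than a threshold
# (15 minutes is a hypothetical value; set it per questionnaire)
flag_short_interviews <- function(durations, min_minutes = 15) {
  names(durations)[durations < min_minutes]
}
```

In practice you would read each submission's audit file (e.g. with read.csv()), compute its duration, name the resulting vector by uuid, and pass it to flag_short_interviews().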
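Check for the shortest path: Check for surveys that may have followed the shortest possible path through the questionnaire, that is, surveys with an unusually high number of skipped questions. Contact HQ or write to the RCoP if you need a ready-to-use function.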
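A minimal base-R sketch of a shortest-path check: count unanswered questions per survey and flag the surveys with unusually many. It assumes skipped questions are exported as NA and treats counts above the 95th percentile as suspicious; both the cut-off and the helper name flag_shortest_path() are hypothetical.

```r
# Flag surveys whose number of unanswered (NA) questions is unusually high.
# Assumption: "unusually high" means above the `probs` quantile of the dataset.
flag_shortest_path <- function(df, id_col = "uuid", probs = 0.95) {
  answers <- df[setdiff(names(df), id_col)]
  n_missing <- rowSums(is.na(answers)) # NA count per survey
  cutoff <- quantile(n_missing, probs = probs, names = FALSE)
  df[[id_col]][n_missing > cutoff]
}
```

Flagged surveys are candidates for follow-up with the enumerator, not automatic deletions; a quantile cut-off also works better when applied by enumerator or by stratum.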
Remove duplicates: Check for and remove any duplicate responses. This can happen if respondents accidentally submit the survey multiple times.
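Duplicates can be checked in two ways with base R's duplicated(): the same uuid submitted more than once, and identical answers submitted under different uuids. A minimal sketch, assuming a uuid column; the helper names and the `ignore` parameter are hypothetical.

```r
# Report duplicate uuids and identical answer sets under different uuids.
# `ignore` lists columns to leave out of the answer comparison, such as
# start/end timestamps, which differ even between copied surveys.
find_duplicates <- function(df, id_col = "uuid", ignore = character(0)) {
  dup_id <- duplicated(df[[id_col]])
  answers <- df[setdiff(names(df), c(id_col, ignore))]
  dup_answers <- duplicated(answers)
  list(
    duplicate_uuid = df[[id_col]][dup_id],
    # rows already caught as duplicate uuids are not reported twice
    duplicate_answers = df[[id_col]][dup_answers & !dup_id]
  )
}

# Keep only the first submission of each uuid
remove_duplicate_uuids <- function(df, id_col = "uuid") {
  df[!duplicated(df[[id_col]]), ]
}
```

Identical answers under different uuids may be legitimate for short questionnaires, so review them before removing anything.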
Check outliers: Identify any extreme values or outliers that may be skewing the data, and remove them if necessary. Check for outliers on both the normal and the log scale, and run the checks by stratum or, where possible, by livelihood zone.
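A minimal base-R sketch of the normal and log outlier checks, run within each stratum. It assumes an outlier is a value more than 3 standard deviations from its stratum mean (a common but arbitrary rule) and uses log1p(), i.e. log(x + 1), so that zeros stay usable; flag_outliers() is a hypothetical helper.

```r
# Flag outliers of `var` within each stratum on the raw and the log scale.
flag_outliers <- function(df, var, strata, z = 3) {
  zscore <- function(x) {
    s <- sd(x, na.rm = TRUE)
    # a stratum with one value or constant values has no outliers
    if (is.na(s) || s == 0) return(rep(0, length(x)))
    (x - mean(x, na.rm = TRUE)) / s
  }
  x <- df[[var]]
  # ave() applies zscore() separately within each stratum
  z_raw <- ave(x, df[[strata]], FUN = zscore)
  # log1p() keeps zeros usable; negative values become NaN
  z_log <- ave(log1p(x), df[[strata]], FUN = zscore)
  df$outlier_normal <- abs(z_raw) > z
  df$outlier_log <- abs(z_log) > z
  df
}
```

To check by livelihood zone instead, pass that column name as `strata`. Every flagged value should then go into the cleaning log.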
Check for consistency (logical checks): Ensure that responses are consistent across related questions. For example, a respondent who reports being 15 years old but having 4 children should be flagged.
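Logical checks can be written as a named list of R expressions that evaluate to TRUE for inconsistent rows, which keeps all checks in one place and produces a list of flags ready for the cleaning log. A minimal base-R sketch; the column names (uuid, age, num_children, hh_size) and both checks are hypothetical.

```r
# Each check is an expression that is TRUE when a row looks inconsistent.
logical_checks <- list(
  young_with_many_children = quote(age < 18 & num_children >= 4),
  children_exceed_hh_size = quote(num_children >= hh_size)
)

# Evaluate every check against the data and return one row per flag
run_logical_checks <- function(df, checks) {
  out <- lapply(names(checks), function(id) {
    flagged <- eval(checks[[id]], envir = df) # columns act as variables
    flagged <- !is.na(flagged) & flagged      # missing answers are not flagged
    data.frame(uuid = df$uuid[flagged], check_id = rep(id, sum(flagged)))
  })
  do.call(rbind, out)
}
```

Adding a new check only requires a new entry in logical_checks.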
Updating select multiple questions: If you change a select multiple question during cleaning, make sure you also update the parent question, not only the binary columns.
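One way to keep the parent question consistent is to rebuild it from the binary columns after cleaning them. A minimal base-R sketch, assuming KOBO-style binary columns named question/choice with 0/1 values and a parent column holding space-separated choice names; rebuild_parent() is a hypothetical helper, and `sep` should be changed if your export uses "." instead of "/".

```r
# Rebuild the parent select_multiple column from its binary (0/1) columns.
rebuild_parent <- function(df, question, sep = "/") {
  prefix <- paste0(question, sep)
  bin_cols <- names(df)[startsWith(names(df), prefix)]
  choices <- substring(bin_cols, nchar(prefix) + 1) # choice names
  selected <- as.matrix(df[bin_cols]) == 1
  df[[question]] <- apply(selected, 1, function(row) {
    picked <- choices[!is.na(row) & row]
    # KOBO stores selected choices separated by spaces
    if (length(picked) == 0) NA_character_ else paste(picked, collapse = " ")
  })
  df
}
```

For example, rebuild_parent(df, "water_source") overwrites water_source using the water_source/... binary columns.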
Review and clean open-ended responses: Review and clean any open-ended responses, such as comments or feedback. Remove any inappropriate or offensive language, correct spelling, grammar and translation errors, and identify common themes or trends.
Fill out the cleaning log: Do not remove any flagged value from the cleaning log. If no change is needed, record that in the cleaning log; otherwise HQ may flag the same issue again and ask for an explanation, which increases the validation time.
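The sketch below shows a cleaning log that keeps every flagged value, including one that needs no change, and applies only the actual changes to the data. It is a minimal base-R illustration: the column names, the change_type values and the example entries are hypothetical, so align them with the cleaning log template you use.

```r
# Every flagged value stays in the log; "no_action" rows explain why the
# value was kept.
cleaning_log <- data.frame(
  uuid = c("a", "b"),
  question = c("income", "income"),
  old_value = c("1000", "12"),
  new_value = c("100", "12"),
  change_type = c("change_response", "no_action"),
  comment = c("Typo confirmed with enumerator", "Value confirmed as correct"),
  stringsAsFactors = FALSE
)

# Apply only the rows that change a response
apply_cleaning_log <- function(df, log) {
  changes <- log[log$change_type == "change_response", ]
  for (i in seq_len(nrow(changes))) {
    row <- df$uuid == changes$uuid[i]
    col <- changes$question[i]
    # type.convert() turns "100" back into a number where possible
    df[row, col] <- type.convert(changes$new_value[i], as.is = TRUE)
  }
  df
}
```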
Document the data cleaning process: Document the data cleaning process, including any decisions and any changes made to the data. Ideally, you should add a comment against every issue flagged during daily data monitoring.