Main components of reproducible research
Data
Make the data easily accessible in an open data repository, or serve it from your own server.
- Raw data should be considered read-only and stored separately.
- If possible, keep the names of local files downloaded from the internet or copied onto your computer unchanged.
- Exception: rename a file when its original name does not describe its contents; names should be as representative as possible.
- Use plain text as much as possible.
- Make data cleaning as easy and effective as possible; aim for a tidy format.
- Create a script that can automatically generate clean data from the raw data.
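The last point can be sketched as follows. This is only a minimal illustration, written in Python rather than R, and it assumes a hypothetical layout with the raw file under data/raw and the cleaned file under data/clean. The cleaning steps shown (trimming whitespace, dropping empty rows) are placeholders; real steps depend on the data.

```python
import csv
import os
import stat
from pathlib import Path


def clean_rows(rows):
    """Strip whitespace from every field and drop rows that are entirely empty."""
    cleaned = []
    for row in rows:
        stripped = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # ignore surplus fields that have no header
        }
        if any(stripped.values()):
            cleaned.append(stripped)
    return cleaned


def build_clean_data(raw_path, clean_path):
    """Regenerate the clean file from the raw file; the raw file is only read."""
    raw_path, clean_path = Path(raw_path), Path(clean_path)
    with raw_path.open(newline="", encoding="utf-8") as handle:
        rows = clean_rows(csv.DictReader(handle))
    clean_path.parent.mkdir(parents=True, exist_ok=True)
    with clean_path.open("w", newline="", encoding="utf-8") as handle:
        if rows:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)


def make_read_only(path):
    """Remove write permission so the raw file is not edited by accident."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

One possible use: call make_read_only on the raw file once after downloading it, then call build_clean_data(raw_path, clean_path) whenever the clean file is needed. Because the clean file can always be regenerated, only the raw file and the script need to be kept safe.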
Methods
Use an efficient workflow, a robust directory layout, and clean code, and share it on a collaborative version control platform such as GitHub.
- Use README files.
- Maintain a consistent folder structure across projects.
- Have a consistent coding style.
- Reduce copy-pasting code as much as possible.
- Break code into small, discrete pieces. Ideally, each script file should do one thing.
- Separate function definition from application.
- Try not to save your R environments. Try not to load them either.
- Organize and name files so that they make intuitive sense to your future self, and follow the narrative of the data analysis.
- Comment a lot, but avoid redundant comments by smart use of naming.
- Again, names should be as representative as possible.
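Several of these points (less copy-pasting, small pieces, definition separate from application, descriptive names) can be illustrated together. The sketch below is in Python rather than R, and the function names and measurements are hypothetical. In a real project the definitions would sit in one file and the application in another; they share one block here only to keep the example self-contained.

```python
from statistics import mean, stdev


# --- Definitions: in a real project these would live in their own file. ---

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    center = mean(values)
    spread = stdev(values)
    return [(value - center) / spread for value in values]


def summarize(values):
    """Return the summary statistics reported for every variable."""
    return {"n": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


# --- Application: a separate script would import the functions and call them. ---

def main():
    # Hypothetical measurements, used only to show the calling pattern.
    measurements = {
        "height_cm": [150.0, 160.0, 170.0],
        "weight_kg": [50.0, 60.0, 70.0],
    }
    # One loop replaces a copy-pasted block per variable.
    return {name: summarize(standardize(values)) for name, values in measurements.items()}


if __name__ == "__main__":
    for name, summary in main().items():
        print(name, summary)
```

The names carry most of the explanation, so the comments only need to say why something is done, not what each line does.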
Results
Share results in a dynamic way with Markdown, Shiny, or ShareLaTeX/Google Docs.
- Results should be kept in a separate folder.
- Treat generated output as disposable.
- Documentation is important because it is the key to communicating your workflow and findings to your future self, collaborators, peers, and the general public.
- Guess what: names should be as representative as possible.
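One way to treat generated output as disposable is to delete the results folder and rebuild it from scratch on every run. The following is a sketch under assumptions, in Python rather than R: rebuild_results is a hypothetical helper, and the results folder is assumed to hold nothing but generated files.

```python
import shutil
from pathlib import Path


def rebuild_results(results_dir, generators):
    """Delete the results folder and regenerate every output from scratch.

    `generators` maps an output file name to a function returning its text content.
    Point this only at a folder that contains generated files and nothing else.
    """
    results_dir = Path(results_dir)
    if results_dir.exists():
        shutil.rmtree(results_dir)  # generated output is disposable
    results_dir.mkdir(parents=True)
    written = []
    for file_name, generate in sorted(generators.items()):
        target = results_dir / file_name
        target.write_text(generate(), encoding="utf-8")
        written.append(target)
    return written
```

If the folder can be wiped without losing anything, every result in it is known to come from the current code and data rather than from an earlier, forgotten run.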
Remember that publishing is not the end of your research, but a way station towards your future analyses and the future analyses of others.
To further enhance collaboration, you can use Slack.