Basics of reproducible research


Motivation


Main components of reproducible research

Data

Make the data easily accessible, either in an open data repository or from your own server.

  • Raw data should be considered read-only and stored separately.
  • If possible, keep the names of local files downloaded from the internet or copied onto your computer unchanged.
  • Exception: rename a file when its original name is uninformative; names should be as representative as possible.
  • Use plain text as much as possible.
  • Make data cleaning as easy and effective as possible; aim for a tidy format.
  • Create a script that can automatically generate clean data from the raw data.
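
As a rough illustration of the last point, the sketch below reads a raw CSV file without modifying it and writes a cleaned copy elsewhere. It is written in Python, but the same idea applies in R. The function names, the "value" column, and the "NA" missing-value code are assumptions made for this example only, and the input is assumed to be a well-formed CSV file with a header row.

```python
import csv
from pathlib import Path


def clean_row(row):
    """Return a cleaned copy of one record, or None if it should be dropped."""
    # Normalize column names and strip stray whitespace from the values.
    cleaned = {
        key.strip().lower(): (value or "").strip()
        for key, value in row.items()
        if key is not None
    }
    # Drop records with a missing measurement (hypothetical column "value").
    if cleaned.get("value", "") in ("", "NA"):
        return None
    return cleaned


def build_clean_data(raw_path, clean_path):
    """Read the raw CSV (never modified) and write a cleaned CSV elsewhere."""
    raw_path, clean_path = Path(raw_path), Path(clean_path)
    clean_path.parent.mkdir(parents=True, exist_ok=True)
    with raw_path.open(newline="") as src:
        reader = csv.DictReader(src)
        rows = [r for r in map(clean_row, reader) if r is not None]
        fieldnames = [name.strip().lower() for name in (reader.fieldnames or [])]
    with clean_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # Return the number of records kept, handy for a quick sanity check.
    return len(rows)
```

Calling `build_clean_data("data/raw/survey.csv", "data/clean/survey.csv")` (both paths are hypothetical) regenerates the clean file whenever the raw data or the cleaning rules change, so the clean file never needs to be edited by hand.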

Methods

Use an efficient workflow, a robust directory layout, and clean code, and share your work through a collaborative version control platform such as GitHub.

  • Use README files.
  • Maintain a consistent folder structure across projects.
  • Have a consistent coding style.
  • Reduce copy-pasting code as much as possible.
  • Break code into small, discrete pieces. Ideally, each script file should do one thing.
  • Separate function definition and application.
  • Try not to save your R environments. Try not to load them either.
  • Organize and name files so that they make intuitive sense to your future self, and follow the narrative of the data analysis.
  • Comment a lot, but avoid redundant comments by smart use of naming.
  • Again, names should be as representative as possible.
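
A minimal sketch of two of these points, reducing copy-pasting and separating function definition from application, might look as follows. It is written in Python, but the same idea applies in R. The function names and the choice of summary statistics are assumptions made for illustration; in a real project the definitions and the application would normally live in separate files.

```python
from statistics import mean


# --- Definitions: small, reusable pieces with representative names ---

def summarize(values):
    """Return the summary statistics reported for a single variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


# --- Application: one loop instead of one copy-pasted block per variable ---

def summarize_all(table):
    """Apply summarize() to every variable in a {name: values} mapping."""
    return {name: summarize(values) for name, values in table.items()}
```

Adding a new variable to the analysis then means adding one entry to the input mapping, not duplicating a block of code, and changing the reported statistics means editing `summarize()` in exactly one place.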

Results

Share results in a dynamic way with Markdown, Shiny, or ShareLaTeX/Google Docs.

  • Results should be kept in a separate folder.
  • Treat generated output as disposable.
  • Documentation is important, because it is the key to communicating your workflow and findings with your future self, collaborators, peers, and the general public.
  • Guess what: names should be as representative as possible.
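
One way to read "treat generated output as disposable" is that the results folder can be deleted and rebuilt at any time. The sketch below, in Python, assumes each output is produced by a function that returns text. The function name `rebuild_results` and the `{filename: function}` mapping are hypothetical conventions chosen for this example.

```python
import shutil
from pathlib import Path


def rebuild_results(results_dir, outputs):
    """Delete the results folder and regenerate every output from scratch.

    `outputs` maps a file name to a zero-argument function returning text.
    """
    results_dir = Path(results_dir)
    # Nothing in the results folder is edited by hand, so it is safe to remove.
    if results_dir.exists():
        shutil.rmtree(results_dir)
    results_dir.mkdir(parents=True)
    for filename, make_text in outputs.items():
        (results_dir / filename).write_text(make_text())
    # Report what was generated.
    return sorted(path.name for path in results_dir.iterdir())
```

Because the folder is wiped first, stale files from earlier runs cannot linger next to current results. Never point such a function at a folder that holds raw data or hand-written files.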

Remember that publishing is not the end of your research, but a way station towards your future analyses and the future analyses of others.

To further enhance collaboration, you can use Slack.

Pros and cons of the reproducible approach

Pros

  • Dynamic
  • Collaborative
  • Archiving
  • Facilitates peer review
  • Increases audience

Cons

  • Time-consuming (at first)
  • Needs training/coordination
  • Most of us are afraid of errors

A short introduction to Markdown

What is Markdown?

“Markdown is a text-to-HTML conversion tool for web writers.” Source: Markdown Web page

Why has Markdown become so popular?

Because…

  1. it is easy. To begin with, all you need to learn is just the first page.
  2. it is fast and clean; you make fewer mistakes, which increases efficiency. Here is an example.
  3. it is portable; your documents can be edited in any text application on any operating system.
  4. it is flexible;
    • many other platforms/languages use it, e.g. Dropbox, GitHub and of course R.
    • variety in applications (e.g. emails, webpages, presentations, even books!)

So, why R Markdown?


R Markdown

R Markdown has some significant strengths:

  • Supports many formats (HTML, PDF, Word), which..
  • Helps us present/teach a specific method in R, which..
  • Allows collaborative science and coding (cloud-computing).

Examples of successful applications of reproducible research using Markdown