This is a step-by-step guide for Pilots running a reproducibility check for the Checking Analytic Reproducibility at Psychological Science (CARPS) project. R code that you may need to run will appear in grey boxes. We strongly recommend that you look through the pre-registration for this project before starting out. This will give you a good overview of the general workflow and may contain additional details that have not been included here.

Your goal is to try to reproduce a set of target outcomes using the available data files, information provided in the original article, and any other additional documentation (e.g., codebook or analysis scripts). Your role is not to attempt alternative analyses that you believe are superior. We are only interested in reproducibility for the purposes of the current investigation.

You can e-mail Tom with any questions (tom.hardwicke[@]stanford.edu).

Good luck!

Step 1: Setting up

R

You must run your reproducibility check in R. You can download the latest version from here:

https://www.r-project.org/

R Studio

We highly recommend using the free R Studio software which you can download here:

https://www.rstudio.com/products/RStudio/

We will be using several of R Studio’s built-in features, such as R Markdown.

Github

We will be using Github for version control and collaboration. We have our own Github ‘organisation’ set up for this project and you can find all of the reproducibility check repositories (repos) here:

https://github.com/METRICS-CARPS

We also highly recommend using the free Github Desktop software which you can download here:

https://help.github.com/desktop/guides/getting-started/installing-github-desktop/

You can find plenty of guides to using Github online. Here is a good place to start:

https://guides.github.com/

We will not be doing anything super fancy, so don’t panic if you are not familiar with this tool. If you prefer not to use Github, you can switch to a non-Github workflow (exchanging .zip files with Tom via e-mail).

If you are an advanced user, feel free to use your own git workflow. However, the rest of this guide assumes you are using the Github Desktop software.

The CARPS Reports package

We have put together a simple R package (‘CARPSreports’) that contains a custom R Markdown template and a couple of custom functions for you to use when preparing your reproducibility report. To install the CARPSreports package, you will first need to install another package called devtools:

install.packages("devtools") 

Next, run the following command to install CARPSreports directly from our project Github page:

devtools::install_github("METRICS-CARPS/CARPSreports")

To check that this has installed correctly, click on ‘file’, ‘new file’, and then ‘R markdown…’ in R Studio. Select ‘from template’ and you should see an option “CARPS Reproducibility Report”. If not, the installation has gone wrong somewhere. Otherwise, you’re good to go!

Step 2: Identify your article

Each article has been assigned a unique code which looks something like this:

CARPS_1-1-2015_PS

On the CARPS Github page you will see that there is a repo for each article. Each repo contains the original article pdf, a targetOutcomes.md file, and a data folder containing a data file and sometimes some other files (more on this below). The repo also comes with a .gitignore file so you do not have to set this up yourself.

Step 3. Fork and clone your repo

Once you have your article ID code, you need to fork the corresponding repository. You can do this by opening the repo on the Github website and clicking on the ‘fork’ button in the top-right. It may take a few minutes for the repo to be forked over to your account. When it has finished, you need to clone the repository to your personal computer. The quickest way to do this is to click the green ‘clone or download’ button and then click ‘Open in Desktop’. Make sure you are in the forked repo and not the original master branch! The files will now be downloaded to your computer and you should see the repo in the Github Desktop software.

Step 4. Set up an R Studio project

Open up R Studio. Click ‘file’, ‘new project’, ‘existing directory’, and then browse to wherever you cloned the repo on your computer. Now click ‘create project’. If the R Studio project is set up correctly then you should see the files from your repo listed in the files section.

Step 5. ‘Commit’ and ‘Sync’ your changes with Github

It is up to you how much you use the Github version control features. It is good practice to ‘commit’ your changes fairly regularly. Each time you commit changes they are saved, and you can ‘roll back’ if you realise later on that you’ve made a mistake.

To commit and sync after you’ve made some changes, open up Github Desktop and select your repo. Where it says ‘summary’ and ‘description’ you can enter some information about this commit so you can work out what you did later on. For example, if you’ve just created the .Rmd document for your reproducibility report, you might call this ‘created report’ and in the description put something like ‘created .Rmd file for reproducibility report’ (just a summary is often sufficient). Now click on ‘commit to master’. Note that at this point you have just committed to your fork ‘master’ on your computer - the changes are only saved locally. It’s a good idea to now click on ‘sync’ in the top right, which will back up your changes on the Github website.

You can see a useful graphical representation of the original CARPS master and your fork master in the dark grey box. You are not making any changes to the original CARPS master right now, just your fork. But eventually we are going to connect these back up.

It is good practice to keep committing and syncing your changes regularly, but I’ll leave it up to you how often you do this.

Ok you’re almost ready to start with the actual reproducibility check!

Step 6. Open a new R Markdown file

Now you will open up a new R Markdown file. R Markdown is an approach to ‘literate programming’. This is the idea that we interleave actual code with plain language commentary explaining what we are doing in sufficient detail such that someone who does not understand the code itself can still figure out what we have done (including our future selves). This is a key component of reproducible analysis, and I hope we can exemplify this best practice in our own analysis for CARPS. You should aim to provide detailed commentary in plain text throughout your report.

If you are unfamiliar with R Markdown, there’s plenty of information available here:

http://rmarkdown.rstudio.com/lesson-1.html

You may also find this ‘cheatsheet’ useful:

https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

To run code that you have entered in ‘chunks’, just click the green arrow.
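If you have not used chunks before, here is a minimal sketch of what one looks like in the .Rmd file (the built-in mtcars dataset is used purely for illustration and is not part of any CARPS report):

```{r}
# code inside a chunk; click the green arrow to the right of the chunk to run it
mean(mtcars$mpg)
```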

I have put together a custom R Markdown template so we can keep the CARPS reproducibility reports in a fairly standardised format. To open the template, click on ‘file’, ‘new file’, and then ‘R markdown…’ in R Studio. Select ‘from template’ and you should see an option “CARPS Reproducibility Report” if the CARPSreports package installed correctly (see above). Click on OK.

The R Markdown file that opens will begin with a ‘yaml’ header between two sets of dashed lines. Leave this section as it is. Below that you’ll see some lines referring to various details about this reproducibility check e.g.,

“#### Article ID: [Insert article ID number]”

The ‘#’ here is the markdown way of saying ‘format this as a heading’. Four ‘#’s means heading level 4. One ‘#’ would be heading level 1.

Throughout the template I included some text in square brackets that you should either replace or delete before submitting your final report. Anything not in square brackets should remain in your report.

So in this case you should replace “[Insert article ID number]” with the article ID (e.g., “CARPS_1-1-2015_PS”). Enter your name as the Pilot and your colleague’s name as the Co-Pilot (it doesn’t matter which way around). Enter today’s date as the start date. The end date will be the date that you submit your report via a pull request (details on this below).

Save the R Markdown file with the name “pilotReport.Rmd”.

Step 7. Familiarise yourself with the article and associated files

Before we get into the details of the R Markdown template, let’s go and have a look at what is available in the repo. You should have a pdf of the article, a targetOutcomes.md file (.md stands for ‘markdown’), and a data folder containing a data file or files. If any of these are missing contact Tom.

The targetOutcomes.md file can be opened in any text editor, or you can view it in the repo on Github. It outlines exactly which outcomes in the paper you are to try and reproduce.

Please note that you will likely need more information than is included in the targetOutcomes.md file in order to run your reproducibility check. For example, there may be essential pre-processing steps that are listed in the article but are not included in the targetOutcomes.md file.

You should read the entire article and develop a good understanding of the methods employed by the original authors. Make sure you download any supplementary information files to see if they contain additional important details. You may even find some analysis scripts. This is great news, of course, because scripts provide very concrete information that should help you run your reproducibility check. If you do find detailed information about the analysis used by the original authors, be sure to include quotations in your report to illustrate this (see below for details on how).

Please note: you must not directly edit the original data files. This cannot be emphasised enough! The original data file must remain as it was when you forked the repo. This is so that others who work on the project can reproduce everything you have done from scratch. If you need to make manual edits to a data file, you should save an additional file (see below for details). If you accidentally make changes to the original data file, then you should roll back these changes using Github (this is why it’s important to regularly commit changes!).

Step 8. Start completing the R Markdown report: Methods and target outcomes

Your first steps will be to fill in the Methods summary and target outcomes section. You need to write the methods summary from scratch, but you can copy and paste the target outcomes from the targetOutcomes.md file.

The remainder of the report is divided into 5 key stages outlined below.

Step 9. Load packages

Load any necessary R packages. Some useful ones are already listed and you can add any additional ones that you need.
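For example, the start of the packages chunk might look something like this (the exact list will depend on your article; tidyverse and knitr are common choices rather than requirements):

library(tidyverse) # data munging and plotting
library(knitr) # for kable() tables
library(CARPSreports) # for compareValues() and carpsReport()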

Step 10. Load data

Load data from the file or files in the data folder. You may need different functions for different types of file.

This cheatsheet may be helpful: https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/data-import-cheatsheet.pdf
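As a rough sketch, loading a data file might look something like the following (the file names here are hypothetical; use whatever is actually in your data folder):

library(readr) # for .csv files
library(readxl) # for Excel .xlsx files
library(haven) # for SPSS .sav files

# pick the function that matches your file type
data <- read_csv("data/data.csv")
# data <- read_excel("data/data.xlsx")
# data <- read_sav("data/data.sav")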

Step 11. Tidy data

Mung/wrangle (organise) the data into a format that facilitates subsequent analysis. We highly recommend learning the concept of ‘tidy data’. For resources see here: http://r4ds.had.co.nz/tidy-data.html

and here: https://www.jstatsoft.org/article/view/v059i10

This cheatsheet may also be helpful: https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/data-transformation-cheatsheet.pdf

To the greatest possible extent, you should try to conduct data munging operations programmatically in R. In some cases, you may need to make manual adjustments to the data file in, for example, Excel. If you have to do this, you should detail the steps you have taken in your R Markdown report, and save an additional data file with the name “data_manualClean”.
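As an illustration, reshaping a hypothetical wide-format file (one column of scores per condition) into tidy long format might look something like this (the column names are invented for the example):

library(dplyr)
library(tidyr)

# aim for one row per observation: a column identifying the condition and a column holding the score
data_tidy <- data %>%
  gather(condition, score, conditionA, conditionB)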

Step 12. Run analysis

This section is further sub-divided into pre-processing, descriptive statistics, and inferential statistics. Work systematically through the target outcomes, attempting to reproduce each reported outcome with the analyses described in the original article (and any supporting documents). Make sure you write down exactly which target outcome you are trying to reproduce before the analysis code so it is easy to compare to the output.
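As a sketch of how one target outcome might be handled in the report (the variable names and the reported result below are invented for illustration):

# Target outcome (hypothetical): "participants in condition A scored higher than
# those in condition B, t(38) = 2.14, p = .04"

library(dplyr) # already loaded in Step 9 if you used the tidyverse

# descriptive statistics
data_tidy %>%
  group_by(condition) %>%
  summarise(mean_score = mean(score), sd_score = sd(score))

# inferential statistics
t.test(score ~ condition, data = data_tidy)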

Step 13. Recording errors

Whenever you identify a potential discrepancy between an outcome from your analysis and a reported outcome, you need to explicitly note the error in your report. Please make sure you are familiar with the Error Classification Scheme outlined in the pre-registration document.

Whenever you encounter a numerical discrepancy between a reported value and a value obtained in your analysis, you should use the compareValues() function to classify the type of error. This function takes three arguments: the reported outcome, the outcome obtained in your analysis, and “isP”. “isP” should be set to TRUE if you are comparing p-values; otherwise you do not need to specify anything (it will default to FALSE). The function will calculate the percentage error difference between the two values, classify the error type, and return a standardised reporting sentence that you should include in your report.

Here is an example comparing two p values that results in a MINOR NUMERICAL ERROR and a DECISION ERROR:

compareValues(reportedValue = .054, obtainedValue = .049, isP = T)
## [1] "DECISION ERROR and MINOR NUMERICAL ERROR. The reported value (0.054) and the obtained value (0.049) differed by 9.26%. NB obtained value was rounded to 3 decimal places."

Here is an example comparing two other values where the result is a MATCH once the obtained value is rounded to the reported precision:

compareValues(reportedValue = 2.5, obtainedValue = 2.45)
## [1] "MATCH. The reported value (2.5) and the obtained value (2.5) differed by 0%. NB obtained value was rounded to 1 decimal places."

Here is an example comparing two other values where there is a MAJOR NUMERICAL ERROR:

compareValues(reportedValue = 52, obtainedValue = 75)
## [1] "MAJOR NUMERICAL ERROR. The reported value (52) and the obtained value (75) differed by 44.23%. NB obtained value was rounded to 0 decimal places."

Note that there is a special fourth type of error which does not involve comparing numerical values. The INSUFFICIENT INFORMATION ERROR applies to situations where the data analysis procedure reported in the original article (and any supporting documentation) is so unclear or incomplete that you cannot conduct your reproducibility check. Note that if the provided information is ambiguous and you are unsure what the original analysis entailed, you should not attempt to engage in lengthy guesswork about what the original authors did.

There is no R function for these situations. You should simply type INSUFFICIENT INFORMATION ERROR in block capitals and then, underneath, provide commentary in as much detail as possible about what the issue is.
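For example, the relevant part of a report might read (the commentary is just an illustration):

INSUFFICIENT INFORMATION ERROR

The article reports an ANCOVA for this outcome but does not state which covariates were included, and no analysis script is provided, so this analysis could not be attempted.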

Step 14. Reporting Conclusions

There are two aspects to reporting your conclusions. Firstly, you should provide a verbal summary of the report. Identify and describe any issues you encountered in as much detail as possible.

Secondly, use the custom carpsReport() function included in the CARPSreports package to output a standardised report table and .csv file. The carpsReport() function takes 7 arguments. Firstly, you must specify the report type, which in your case is ‘pilot’. The second argument is the article ID code. The next three arguments are the number of errors of each type that you encountered. “Time_to_Complete” is asking you for an estimate (in minutes) of how long the report took you to complete. This should be in terms of time actually spent on the report, excluding coffee breaks etc.! The final argument indicates whether author assistance is required - for now set this to FALSE. We may seek author assistance at a later stage of the project.

Manually read through your report, tally up the number of different errors, and enter the numbers into the carpsReport() function.

Here is an example:

carpsReport(Report_Type = 'pilot',
          Article_ID = 'CARPS_1-1-2015_PS', 
          Insufficient_Information_Errors = 1,
          Decision_Errors = 0,
          Major_Numerical_Errors = 1,
          Time_to_Complete = 120, 
          Author_Assistance = FALSE)
  Insufficient_Information_Errors Decision_Errors Major_Numerical_Errors Time_to_Complete Final_Outcome
1                               1               0                      1              120       Failure

Notice that the function automatically works out what the final outcome of your report is (success or failure) based on the error types. Consult the pre-registration for more details about how this decision is made.

Step 15. Submitting your report

The final step in preparing your report is to ‘knit’ it. This produces a nice-looking html document. You can find the knit button towards the top of the window, next to a blue ball of string. When you click ‘knit’, R Studio will show you the html version of your report. Some of the formatting might look a little strange; in that case, click on ‘open in browser’ and things should look ok.

If you decide you need to make some changes that’s fine. Just remember to knit your report again right before you submit it so that the html file is up-to-date.

To submit your report, you should issue a pull request. This means you are requesting that the author of the original master repo (Tom) merges the changes you have made in your fork with the master. To issue the pull request, open up the Github Desktop software and select your repo. Make sure you have committed and synced all recent changes first. Now click on the ‘pull request’ button in the top right. In the ‘description’ box, write ‘Pilot reproducibility check is complete’. Then click ‘send pull request’.

That’s it! The piloting stage is over. Psych 251 students, your job is done (unless you decide to stick with the project). Other pilots, you will be contacted soon by a co-pilot who will verify your reproducibility check and work with you to try to resolve any issues, potentially by making contact with the original authors.

Step 16. Tips for a top-notch, reproducible report

Here are a few additional tips for producing a top-notch reproducible report.

Commentary

Describe exactly what you are doing throughout in plain language interleaved with code chunks. Try to avoid jargon and acronyms where possible (unless they are clearly defined).

Quotations

It can be really useful to use quotations from the original article or associated files to illustrate exactly what the original authors say they did and what they found. To write a quotation in markdown, just use the ‘>’ symbol. For example:

“> This is a quote from the article”

will produce:

This is a quote from the article.

When quoting, make sure you note the source e.g.,

This is a quote from the article. (from Jones et al. p.18).

Tables

We recommend using kable() for outputting nicely formatted tables. kable() is included in the knitr package, which is loaded at the start of the template.
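For example (assuming you created a data frame of descriptive statistics earlier in your report; the object name ‘descriptives’ is hypothetical):

kable(descriptives, digits = 2) # prints a nicely formatted table of the descriptives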

Images

There are instructions for including images in the R markdown documentation here: http://rmarkdown.rstudio.com/authoring_basics.html

You could, for example, include a screenshot of a figure/table from the original article and compare/contrast it with your own findings.
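For example, if you save a screenshot into an ‘images’ folder inside your repo (the folder and file names here are hypothetical), you can include it with a line like:

![Figure 2 from the original article](images/originalFigure2.png)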