This is a step-by-step guide for folks running a reproducibility check for the Checking Analytic Reproducibility at Psychological Science (CARPS) project. R code that you may need to run will appear in grey boxes. We strongly recommend you look through the pre-registration for this project before starting out. This will give you a good overview of the general workflow and may contain additional details that have not been included here. Additionally, here is an example of a completed CARPS report to give you an idea of what we are aiming for (you can view the full Github repo here).
You can e-mail me with any questions (tom.hardwicke[@]stanford.edu).
Good luck!
The principal aim of a reproducibility check is to recover a pre-defined subset of target outcomes reported in the original article by repeating the original analysis on the original data. To do this you should use the provided data files, the information about the original analysis provided in the original article, and any other additional documentation (e.g., a codebook or analysis scripts). It is not our goal to attempt or suggest alternative analyses that we may think are superior - that is outside the scope of this investigation.
In order to reduce the likelihood of error, we will employ a data co-piloting model in which every reproducibility check involves the input of at least two members of the research team. The ‘pilot’ will make the first attempt to reproduce the target outcomes. The ‘co-pilot’ will then verify the analysis of the pilot. This does not need to be completely independent and the pilot and co-pilot can discuss any issues that arise. Additional members of the research team can be brought into that discussion as necessary.
Ultimately, the pilot and co-pilot should prepare a short report (written in R Markdown) on the outcomes of the reproducibility check and submit it to Tom for review. If there are reproducibility problems, we will contact the original authors together and attempt to resolve them.
When you are ready to perform a reproducibility check you will be assigned an article by Tom (if you have not been assigned an article and need one then just send an e-mail to tom.hardwicke[@]stanford.edu). The article will be randomly selected from those available, unless there is good reason to match you up with a particular article (e.g., you’re an expert in the type of analysis used). You will be sent an ID code which you should use to locate the relevant Github repository in this project.
The Github repository for your article should contain the original article pdf, a targetOutcomes.md file, and a data folder containing the data file(s).
If you’re into Github, you should fork the repository, do your work, then issue a pull request back to the main repo. If you’re not into Github, you can download the repository, do your work, then e-mail it to Tom.
Your goal is to try to recover all of the target outcomes (outlined in targetOutcomes.md) by repeating the analyses as described in the original article. You do not have to read the entire original article, but make sure you are generally familiar with the topic and study design. Also check the whole methods section and, if relevant, the supplementary materials, for information about the analysis that was employed. It is common for authors to mention exclusion criteria early in the results section or even in the method section, and this information may not have been included in the targetOutcomes.md file.
If you are struggling to figure out what the original authors did because of ambiguous or absent information, do not engage in lengthy guesswork. It is ok to try a few different things if that’s not too time-consuming - for example, you might try a Student’s t-test and a Welch t-test if you suspect the authors have simply neglected to report which one of these tests they used. However, it is also ok to stop and note an “Insufficient Information Error” (see the error classification scheme in the pre-registered protocol). In the conclusion section of your report, write down exactly what the issue is and what information you think you need from the original authors. After you have submitted your report to Tom for review, we will use that information as a basis for an e-mail to the original authors requesting additional information or clarifications.
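For example, here is a minimal sketch of trying both tests in R (the data frame d and the variables score and group are hypothetical placeholders):
# Hypothetical example: the article reports a t-test but not which variant
t.test(score ~ group, data = d, var.equal = TRUE) # Student's t-test (assumes equal variances)
t.test(score ~ group, data = d, var.equal = FALSE) # Welch t-test (R's default)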
Below is a practical, step-by-step guide for running your reproducibility check.
You must run your reproducibility check in R. You can download the latest version of R from here: https://www.r-project.org/
We highly recommend using the free R Studio software which you can download here:
https://www.rstudio.com/products/RStudio/
We will be using several of R Studio’s built-in features, such as R Markdown.
We will be using Github for version control and collaboration. We have our own Github ‘organisation’ set up for this project and you can find all of the reproducibility check repositories (repos) here:
https://github.com/METRICS-CARPS
We also highly recommend using the free Github Desktop software which you can download here:
https://help.github.com/desktop/guides/getting-started/installing-github-desktop/
You can find plenty of guides for using Github online. Here is a good place to start:
We will not be doing anything super fancy so don’t panic if you are not familiar with this tool. If you do not want to use this tool you can switch to a non-Github workflow (exchanging .zip files with Tom via e-mail).
If you are an advanced user, feel free to use your own git workflow. However, the rest of this guide assumes you are using the Github Desktop software.
We have put together a simple R package (‘CARPSreports’) that contains a custom R Markdown template and a couple of custom functions for you to use when preparing your reproducibility report. To install the CARPSreports package, you will first need to install another package called devtools:
install.packages("devtools")
Next, run the following command to install CARPSreports directly from our project Github page:
devtools::install_github("METRICS-CARPS/CARPSreports")
To load the package:
library("CARPSreports")
To check that this has installed correctly, click on ‘file’, ‘new file’, and then ‘R markdown…’ in R Studio. Select ‘from template’ and you should see an option “CARPS Reproducibility Report”. If not, the installation has gone wrong somewhere. Otherwise, you’re good to go!
Each article has been assigned a unique code which looks something like this:
CARPS_1-1-2015_PS
On the CARPS Github page you will see that there is a repo for each article. Each repo contains the original article pdf, a targetOutcomes.md file, and a data folder containing a data file and sometimes some other files (more on this below). The repo also comes with a .gitignore file so you do not have to set this up yourself.
Once you have your article ID code, you need to fork the corresponding repository. You can do this by opening the repo on the Github website and clicking on the ‘fork’ button in the top-right. It may take a few minutes for the repo to be forked over to your account. When it has finished you need to clone the repository to your personal computer. The quickest way to do this is to click the green ‘clone or download’ button and then click ‘Open in Desktop’. Make sure you are in the forked repo and not the original master branch! The files will now be downloaded to your computer and you should see the repo in the Github Desktop software.
Open up R Studio. Click ‘file’, ‘new project’, ‘existing directory’ and then browse to wherever you cloned the repo on your computer. Now click ‘create project’. If the R Studio project is set up correctly then you should see the files from your repo listed in the files section.
It is up to you how much you use the Github version control features. It is good practice to ‘commit’ your changes fairly regularly. Each time you commit changes they are saved, and you can ‘roll back’ if you realise later on that you’ve made a mistake.
To commit and sync after you’ve made some changes, open up Github Desktop and select your repo. Where it says ‘summary’ and ‘description’ you can enter some information about this commit so you can work out what you did later on. For example, if you’ve just created the .Rmd document for your reproducibility report, you might call this ‘created report’ and in the description put something like ‘created .Rmd file for reproducibility report’ (just a summary is often sufficient). Now click on ‘commit to master’. Note that at this point you have just committed to your fork ‘master’ on your computer - the changes are only saved locally. It’s a good idea to now click on ‘sync’ in the top right, which will back up your changes on the Github website.
You can see a useful graphical representation of the original CARPS master and your fork master in the dark grey box. You are not making any changes to the original CARPS master right now, just your fork. But eventually we are going to connect these back up.
Ok, you’re almost ready to start the actual reproducibility check!
If you are the pilot you will start by opening up a new R Markdown file. If you are the co-pilot the easiest approach is to duplicate the pilot’s report (call the new file ‘finalReport.Rmd’) and make changes to that.
R Markdown enables a coding/analysis approach called ‘literate programming’. This is the idea that we interleave actual code with plain language commentary explaining what we are doing in sufficient detail such that someone who does not understand the code itself can still figure out what we have done (including our future selves). This is a key component of reproducible analysis, and I hope we can exemplify this best practice in our own analysis for CARPS. You should aim to provide detailed commentary in plain text throughout your report.
If you are unfamiliar with R Markdown, there’s plenty of information available here:
http://rmarkdown.rstudio.com/lesson-1.html
You may also find this ‘cheatsheet’ useful:
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
To run code that you have entered in ‘chunks’ just click the green arrow.
I have put together a custom R Markdown template so we can keep the CARPS reproducibility reports in a fairly standardised format. To open the template, click on ‘file’, ‘new file’, and then ‘R markdown…’ in R Studio. Select ‘from template’ and you should see an option “CARPS Reproducibility Report” if the CARPSreports package installed correctly (see above). Click on OK.
The R Markdown file that opens will begin with a ‘yaml’ header between two sets of dashed lines. Leave this section as it is.
Below that you’ll see a code block referring to various details about this reproducibility check e.g., articleID, reportType, pilotNames etc.
Please enter the relevant details if they are not there already (i.e., when you are a copilot). For example, enter the article ID which might be something like “1-1-2015_PS” (you can drop the “CARPS_” part). Note that we are going to try to get a reasonable estimate of how long these reports take us, so keep your eye on the clock whenever you work on the report. It doesn’t have to be spot on - we don’t want to disrupt people’s workflows by having them time everything with a stopwatch. Just estimate it approximately. If multiple people are working on a report then you should keep adding the time spent to the relevant counter each time you submit a pull request.
Co-pilots - in finalReport.Rmd you must change the report type from ‘pilot’ to ‘final’.
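For illustration, a filled-in version of this chunk might look roughly like the sketch below. The articleID, reportType, and pilotNames variables are mentioned above; the other variable names here are hypothetical placeholders, so use whatever the template actually provides.
articleID <- "1-1-2015_PS" # the article ID code, with the "CARPS_" part dropped
reportType <- "pilot" # co-pilots change this to "final" in finalReport.Rmd
pilotNames <- "Jane Doe" # hypothetical name for illustration
copilotNames <- NA # hypothetical variable name: add the co-pilot's name when known
pilotTimeSpent <- 120 # hypothetical variable name: approximate time spent, in minutes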
Throughout the template I have included some guidance text in square brackets that you should either replace or delete before submitting your final report. Anything not in square brackets should remain in your report.
Save the R Markdown file with the name “pilotReport.Rmd” or, if you are a co-pilot, “finalReport.Rmd”.
Before we get into the details of the R Markdown template, let’s go and have a look at what is available in the repo. You should have a pdf of the article, a targetOutcomes.md file (.md stands for ‘markdown’), and a data folder containing a data file or files. If any of these are missing contact Tom.
The targetOutcomes.md file can be opened in any text editor, or you can view it in the repo on Github. It outlines exactly which outcomes in the paper you should try to reproduce.
Please note you will likely need more information than is included in the targetOutcomes.md file in order to run your reproducibility check. For example, there may be essential pre-processing steps that are detailed in the article, but are not included in the targetOutcomes.md file.
You should read the relevant parts of the article and develop a good understanding of the methods employed by the original authors. Make sure you download any supplementary information files to see if they contain additional important details. You may even find some analysis scripts. This is great news of course because that is very concrete information that should help you run your reproducibility check. If you do find detailed information about the analysis used by the original authors, be sure to include quotations in your report to illustrate this (see below for details on how to do this).
Please note: you must not directly edit the original data files. This cannot be emphasised enough! The original data file must remain as it was when you forked the repo. This is so that others who work on the project can reproduce everything you have done from scratch. If you need to make manual edits to a data file, you should save an additional file (see below for details). If you accidentally make changes to the original data file, then you should roll back these changes using Github (this is why it is important to regularly commit changes!).
You will need to fill in the Methods summary and Target outcomes sections. You need to write the methods summary from scratch, but you can copy and paste the target outcomes from the targetOutcomes.md file. The methods summary only needs to be brief, but it should capture all the important details that relate to the target outcomes. If you’re the copilot, check that the pilot has done this correctly.
The remainder of the report is divided into 5 key stages outlined below.
Load any necessary R packages. Some useful ones are already listed and you can add any additional ones that you need. It’s helpful to add a comment saying what each package is for.
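For example, a package-loading chunk might look something like this (adjust the list to whatever your particular check needs):
library(tidyverse) # for data munging and visualisation
library(knitr) # for knitting the report to html
library(haven) # for importing SPSS data files (only if needed)
library(CARPSreports) # provides the reproCheck() function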
Load data from the file or files in the data folder. You may need different functions for different types of file.
This cheatsheet may be helpful: https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/data-import-cheatsheet.pdf
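For example, depending on the file type you might use one of the following (the file names here are hypothetical placeholders):
d <- read_csv("data/data.csv") # .csv files (readr, part of the tidyverse)
d <- read_excel("data/data.xlsx") # Excel files (requires the readxl package)
d <- read_sav("data/data.sav") # SPSS files (requires the haven package)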
Mung/wrangle (organise) the data into a format that facilitates subsequent analysis. We highly recommend learning the concept of ‘tidy data’. For resources see here: http://r4ds.had.co.nz/tidy-data.html
and here: https://www.jstatsoft.org/article/view/v059i10
This cheatsheet may also be helpful: https://github.com/rstudio/cheatsheets/raw/master/source/pdfs/data-transformation-cheatsheet.pdf
To the greatest possible extent you should try to conduct data munging operations programmatically in R. In some cases, you may need to make manual adjustments to the data file in, for example, Excel. If you have to do this, you should detail the steps you have taken in your R Markdown report, and save an additional data file with the name “data_manualClean”. DO NOT EDIT THE ORIGINAL DATA FILE.
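As a rough illustration of programmatic munging, the sketch below reshapes a hypothetical wide-format data frame into tidy (long) format using tidyr; the column names are made up for illustration:
# Hypothetical example: one row per participant, one column per condition
d_tidy <- d %>%
  pivot_longer(cols = c(condition1, condition2),
               names_to = "condition",
               values_to = "score")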
This section is further sub-divided into pre-processing, descriptive statistics, and inferential statistics. Work systematically through the target outcomes, attempting to reproduce each reported outcome with the analyses described in the original article (and any supporting documents). Make sure you write down exactly which target outcome you are trying to reproduce (ideally quoting verbatim from the original article) before the analysis code so it is easy to compare to the output. If the pre-processing, descriptive statistics, and inferential statistics headings are not helpful, feel free to drop them.
For every single value in the target outcomes you must run the reproCheck() function to explicitly compare the values. This includes values like degrees of freedom or sample sizes.
The reproCheck() function is part of the CARPSreports package built specially for this project. As input, it takes the “reportedValue” (i.e., a target value from the original article) and the “obtainedValue” (i.e., the value obtained in your analysis), compares the two and calculates the percentage error between them. The function automatically works out if an error has occurred. We need to use this for every single value because we need a complete record of every value we have checked, even if it is a match.
You also need to specify the “valueType”, for example whether you are checking a mean, standard deviation, t value etc. There is a list of pre-specified values which you can find by running ?reproCheck. If none of these options seem to fit then just use “other”. If you are checking a p value and enter ‘p’ as the valueType, the function will automatically check to see if there is a Decision Error (see the pre-registered protocol if you are not sure what this means).
There is another parameter called “eyeballCheck” which sounds weird. Normally you won’t need this and by default it is set to NA so you can ignore it. However, there are occasions where the original report does not contain an exact value and instead reports a relationship to a threshold, for example p < .05 or t < 1. This is when we need to “eyeball” the comparison instead, i.e., manually check that the obtainedValue falls in the correct interval indicated by the reported value. If you need to do this, then you should enter eyeballCheck = TRUE if the values seem to match OK. If the value does not fall within the correct interval, then you should run a regular reproCheck() using the threshold as the reported value. For example, if the authors report p < .01, and you obtain p = .26, you should run reproCheck(reportedValue = '.01', obtainedValue = .26, valueType = 'p'), which will record a Decision Error in this case. This can sometimes get a bit complicated so please check with Tom to discuss any issues or edge cases.
The reproCheck() function outputs two things. Firstly, a short sentence is printed telling you the outcome of the comparison - specifically the values compared, the percentage error, and whether they match or whether there was an error. Secondly, the function stores its output in something called the “reportObject”. A blank reportObject is created at the start of your report - it is already included in the template and looks like this:
# Prepare report object. This will be updated automatically by the reproCheck function each time values are compared
reportObject <- data.frame(dummyRow = TRUE, reportedValue = NA, obtainedValue = NA, valueType = NA, percentageError = NA, comparisonOutcome = NA, eyeballCheck = NA)
Each time you run the reproCheck() function, you must assign the output to the reportObject so it can be updated (example below). By the end of the report, all of the comparisons are stored and written out as a .csv file.
Note that when you enter the obtainedValue, you should use the R variable containing the value where possible, rather than writing out the value manually. This helps to avoid typos. The function will also automatically round the obtainedValue to the same number of decimal places as the reportedValue.
When you enter the reportedValue, it must be entered as a character e.g., ‘21’. Don’t worry if you accidentally enter it as a number, 21; the function will tell you off and ask you to do it properly. It is important not to get the obtainedValue and reportedValue mixed up so I suggest you write out the argument name in full, as shown below.
An example where we check a mean and there’s a major error:
condition_mean <- mean(c(1,2,3,4))
reportObject <- reproCheck(reportedValue = '3.45', obtainedValue = condition_mean, valueType = 'mean')
An example where we check a standard deviation and it’s a match:
this_sd <- 15.63
reportObject <- reproCheck(reportedValue = '15.63', obtainedValue = this_sd, valueType = 'sd')
An example where we check a t value and there is only a minor error (because the percentage error is below 10%):
this_t <- 1.2
reportObject <- reproCheck(reportedValue = '1.3', obtainedValue = this_t, valueType = 't')
Here is an example where there is a decision error for the p-value:
a_p_value <- 0.048
reportObject <- reproCheck(reportedValue = '.054', obtainedValue = a_p_value, valueType = 'p')
An example where the p-value is reported as “p <.05” so we have to do an eyeball check. In this case we can see that the description <.05 is accurate, so we say eyeballCheck = TRUE.
a_significant_p_value <- .012
reportObject <- reproCheck(reportedValue = '<.05', obtainedValue = a_significant_p_value, valueType = 'p', eyeballCheck = TRUE)
Another example where the p-value is reported as “p <.05” so we have to do an eyeball check. But in this case we can see that the description <.05 is NOT accurate, so we say eyeballCheck = FALSE.
a_not_significant_p_value <- .24
reportObject <- reproCheck(reportedValue = '<.05', obtainedValue = a_not_significant_p_value, valueType = 'p', eyeballCheck = FALSE)
Note that there is a special, fourth type of error which does not involve comparing numerical values. The INSUFFICIENT INFORMATION ERROR applies to situations where the data analysis procedure reported in the original article (and any supporting documentation) is so unclear or incomplete that you cannot conduct your reproducibility check (or some aspect of it). Note that if the provided information is ambiguous and you are unsure what the original analysis entailed, you should not attempt to engage in lengthy guesswork about what the original authors did.
There is no R function for these situations. You should simply type INSUFFICIENT INFORMATION ERROR in block capitals and then underneath provide commentary in as much detail as possible about what the issue is. In the conclusion part of the report you should tally up the number of these errors and update the Insufficient_Information_Errors variable accordingly.
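For example, if you encountered one such error, you would update the counter in the conclusion chunk like this:
Insufficient_Information_Errors <- 1 # one insufficient information error encountered (see commentary above)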
There are a few aspects to reporting your conclusions. Firstly, you should provide a verbal summary of the report. Identify and describe any issues you encountered in as much detail as possible.
Secondly, fill in the code chunk in the conclusion section with the relevant information. You can wait to do this until the very end of the reproducibility check (i.e., after author assistance). If author assistance was provided, change the Author_Assistance variable to TRUE. If the reproducibility check was a success then you can ignore the remaining variables. However, if you encountered at least one Major Error or Decision Error, then the reproducibility check was a failure and you should add the relevant details. Firstly, add information about the potential cause of the reproducibility issues you encountered to the locus_ variables. Then specify (TRUE or FALSE) whether you think the original conclusions may be seriously affected by the reproducibility issues you encountered.
Because there is quite a bit of subjective judgement involved here, feel free to discuss the issues with other members of the team.
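To illustrate, a filled-in conclusion chunk for a failed check might look roughly like the sketch below. Author_Assistance and Insufficient_Information_Errors are mentioned above; the locus_ and conclusion variable names shown here are hypothetical placeholders, so use the ones the template actually provides.
Author_Assistance <- TRUE # the original authors provided assistance
Insufficient_Information_Errors <- 0 # number of insufficient information errors encountered
locus_analysis <- TRUE # hypothetical variable name: the issues likely stem from the original analysis
locus_data <- FALSE # hypothetical variable name: the issues do not appear to stem from the data files
Affects_Conclusion <- TRUE # hypothetical variable name: the original conclusions may be seriously affected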
The remaining code chunks automatically collate information from across the report and output two .csv files.
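You should not normally need to modify these chunks, but for reference the write-out step is roughly of this form (the exact file names are set by the template):
write_csv(reportObject, "reportObject.csv") # hypothetical file name; the template specifies the real ones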
The final step in preparing your report is to ‘knit’ it. This produces a nice looking html document. You can find the knit button towards the top of the window next to a blue ball of string. When you click ‘knit’, R Studio will show you the html version of your report. Some of the formatting might look a little strange, in which case you should click on ‘open in browser’. Things should look ok then.
If you decide you need to make some changes that’s fine. Just remember to knit your report again right before you submit it so that the html file is up-to-date.
To submit your report, you should issue a pull request. This means you are requesting that the author of the original master repo (Tom) merges the changes you have made in your fork with the master. To issue the pull request, open up the Github Desktop software and select your repo. Make sure you have committed and synced all recent changes first. Now click on the ‘pull request’ button in the top right. In the ‘description’ box, write something like ‘Pilot reproducibility check is complete’. Then click ‘send pull request’.
That’s it for now. If you are the pilot, then a co-pilot will be assigned to double check your report. If there are reproducibility issues that need resolving then we may need to contact the original authors and Tom will be in touch about that.
Here are a few additional tips for producing a top-notch reproducible report.
Describe exactly what you are doing throughout in plain language interleaved with code chunks. Try to avoid jargon and acronyms where possible (unless they are clearly defined).
It can be really useful to use quotations from the original article or associated files to illustrate exactly what the original authors say they did and what they found. To write a quotation in markdown, just use the ‘>’ symbol. For example:
“> This is a quote from the article”
will produce:
This is a quote from the article.
When quoting, make sure you note the source e.g.,
This is a quote from the article. (from Jones et al. p.18).
There are instructions for including images in the R markdown documentation here: http://rmarkdown.rstudio.com/authoring_basics.html
You could, for example, include a screenshot of a figure/table from the original article and compare/contrast it with your own findings.
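For example, one simple way to embed a screenshot from within a code chunk is (the file path here is a hypothetical placeholder):
knitr::include_graphics("images/original_table2.png") # display a screenshot of the original table for comparison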