Rationale

# Detach every attached package except the base set, so the session starts clean.
detachPackages <- function() {
  basic.packages <- paste0("package:", c("stats", "graphics", "grDevices", "utils", "datasets", "methods", "base"))
  # Entries of search() that begin with "package:" are attached packages.
  package.list <- grep("^package:", search(), value = TRUE)
  package.list <- setdiff(package.list, basic.packages)
  for (package in package.list) detach(package, character.only = TRUE)
}

detachPackages()

Reproducibility and replication are two separate but related issues in the sciences. When a study is reproducible, running the same code on the same data produces the same results; when it is replicable, extending it to a new dataset or new methods produces qualitatively similar results. To ensure reproducibility, researchers ought to do a few things. First, they must provide their data and their code. Ideally, they will also provide an executed copy of this code along with all of its output in, say, .pdf or .html format. Second, they ought to provide version information, so that future changes in software do not affect the integrity of the results as they were run. This does not guard against software being incorrectly coded, as was the case for a number of years with SPSS's computation of \(\eta\). Third, researchers ought to provide checksums of their input files. Without checksums, other researchers may unknowingly use different files, notice discrepancies, and wrongly conclude that the research is not reproducible and is thus suspect.

Here is how you can provide checksums for your files. For this, I will use three files: data.csv, repdata.csv, and wrongdata.csv.

Checksums

tocheck <- list.files(pattern = "\\.csv$") # pattern is a regular expression, not a glob
MD5s <- tools::md5sum(tocheck); MD5s

##                           data.csv                        repdata.csv 
## "0c1fe32b08f685a1d0a99616c1697b2e" "0c1fe32b08f685a1d0a99616c1697b2e" 
##                      wrongdata.csv 
## "16b3b314bc20b850b434cf409b17c3ab"

MD5s[1:2] == MD5s[2:3] # data vs. repdata (identical); repdata vs. wrongdata (different)

##    data.csv repdata.csv 
##        TRUE       FALSE
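With more than a handful of files, comparing neighbours by index becomes awkward. As a rough sketch (not part of the original analysis), the file names can instead be grouped by checksum. The vector below simply re-enters the values printed above so that the example runs on its own; `groups` and `shared` are names chosen for illustration.

```r
# Checksums copied from the output above.
MD5s <- c(
  "data.csv"      = "0c1fe32b08f685a1d0a99616c1697b2e",
  "repdata.csv"   = "0c1fe32b08f685a1d0a99616c1697b2e",
  "wrongdata.csv" = "16b3b314bc20b850b434cf409b17c3ab"
)

# Group file names by checksum; files in one group share the same MD5.
groups <- split(names(MD5s), MD5s)

# Keep only the checksums shared by more than one file.
shared <- groups[lengths(groups) > 1]
```

Here `shared` holds a single group containing data.csv and repdata.csv, while wrongdata.csv sits in a group of its own.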

It would be appropriate to reanalyze a study that used data.csv with repdata.csv, but potentially not with wrongdata.csv, whose contents differ. This check only needs to be done on the data as initially obtained; if you modify the data, the checksum will change.
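Someone re-running an analysis could make this check automatic by refusing to proceed when a file does not match the published checksum. The following is a minimal sketch under stated assumptions: `verify_checksum()` is a hypothetical helper, not a function from any package, and the temporary files merely stand in for a correct and an incorrect copy of the data.

```r
# Hypothetical helper: stop if a file's MD5 differs from the published value.
verify_checksum <- function(path, expected) {
  actual <- unname(tools::md5sum(path))  # NA if the file cannot be read
  if (is.na(actual)) stop("File not found: ", path)
  if (!identical(actual, expected)) {
    stop("Checksum mismatch for ", path, ": expected ", expected, ", got ", actual)
  }
  invisible(TRUE)
}

# Stand-in files written to a temporary location for illustration.
good <- tempfile(fileext = ".csv")
bad  <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), good)
writeLines(c("x,y", "1,3"), bad)

# The value an author would publish alongside the data.
published <- unname(tools::md5sum(good))

verify_checksum(good, published)  # passes silently

# The altered file raises an error, captured here as a message.
mismatch <- tryCatch(verify_checksum(bad, published),
                     error = function(e) conditionMessage(e))
```

Placing such a call at the top of an analysis script means a wrong input file fails loudly instead of producing quietly different results.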

Version Info

After providing checksums, provide version and environment information.

sessionInfo()

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_4.1.2  magrittr_2.0.1  fastmap_1.1.0   tools_4.1.2    
##  [5] htmltools_0.5.2 yaml_2.2.1      jquerylib_0.1.4 stringi_1.7.6  
##  [9] rmarkdown_2.11  knitr_1.37      stringr_1.4.0   xfun_0.29      
## [13] digest_0.6.29   rlang_0.4.12    evaluate_0.14
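To distribute this information alongside the code and data, the printed output can be saved to a plain-text file. A minimal sketch, assuming a file location of your choosing (the temporary path below is only a placeholder, and `info_file` and `info_lines` are illustrative names):

```r
# Placeholder path; in practice this might sit next to the analysis script.
info_file <- tempfile(pattern = "sessionInfo_", fileext = ".txt")

# capture.output() turns the printed sessionInfo() into a character vector.
info_lines <- utils::capture.output(utils::sessionInfo())

# Write one line of output per line of the file.
writeLines(info_lines, info_file)
```

The resulting file can then be shared together with the checksums of the input files.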

Reproducibility Requires Using the Same Data
