1 What statcheck Can and Cannot Do

This manual provides detailed instructions for the installation and use of the free R package “statcheck: Extract statistics and recompute p-values” (Epskamp & Nuijten, 2018). For a more concise manual, see the R help files of statcheck (?statcheck).

Before I get to the technical parts of installing and using statcheck, please consider for a moment what statcheck can and cannot do. The package statcheck is a program that automatically extracts statistics from articles and recomputes their p-values. It works as follows:

1. Convert PDF and HTML files to plain text.

PDF files are converted using the program Xpdf; HTML files are read in by R and automatically converted to raw text.

2. Scan text for statistical results

statcheck searches for specific patterns and recognizes statistical results from correlations and t, F, \(\chi^2\), Z tests and Q tests. statcheck can only read these results if the results are reported exactly according to the APA guidelines:

  • t(df) = value, p = value
  • F(df1, df2) = value, p = value
  • r(df) = value, p = value
  • \(\chi^2\) (df, N = value) = value, p = value (N is optional, \(\Delta\)G is also included, since it follows a \(\chi^2\) distribution)
  • Z = value, p = value
  • Q (df) = value, p = value (statcheck can read and distinguishes between Q, Qw / Q-within, and Qb / Q-between)

All regular expressions take into account that test statistics and p values may be exactly (=) or inexactly (< or >) reported. Different spacing has also been taken into account, and case is ignored.

3. Use test statistics and degrees of freedom to recompute p value

By default the recomputed p value is two sided.

4. Compare reported and recomputed p value

This comparison takes into account how the results were reported, e.g. p < .05 is treated differently than p = .05. Incongruent p values are marked as an “Error”. If the reported result is significant and the recomputed result is not, or vice versa, the result is marked as a “Decision Error”.
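The four steps can be illustrated with a minimal base R sketch for a single t test. This is not statcheck's actual implementation: the helper name `check_t` and the regular expression are made up for this example and are far simpler than the patterns statcheck uses, and the comparison only handles exactly reported p-values (p = ...) and ignores rounding of the test statistic.

```r
# Hypothetical helper: extract one APA-style t test from a string,
# recompute the two-sided p-value, and compare it with the reported one.
check_t <- function(txt) {
  pattern <- paste0("t\\s?\\(\\s?(\\d+\\.?\\d*)\\s?\\)\\s?=\\s?(-?\\d*\\.?\\d+)",
                    "\\s?,\\s?p\\s?=\\s?(\\d?\\.\\d+)")
  m <- regmatches(txt, regexec(pattern, txt, ignore.case = TRUE))[[1]]
  if (length(m) == 0) return(NULL)  # no APA-style t test found

  df       <- as.numeric(m[2])
  t_value  <- as.numeric(m[3])
  reported <- as.numeric(m[4])

  # Two-sided p-value from the t distribution
  computed <- 2 * pt(-abs(t_value), df)

  # Compare at the number of decimals used in the reported p-value
  decimals <- nchar(sub("^\\d?\\.", "", m[4]))
  consistent <- isTRUE(all.equal(round(computed, decimals), reported))

  data.frame(df = df, value = t_value, reported = reported,
             computed = computed, consistent = consistent)
}

check_t("the groups differed, t(48) = 1.02, p = .04")
```

A string without a recognizable result returns `NULL`, which loosely mirrors the idea that results not reported in APA style are simply not picked up.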

Correct rounding is taken into account. For instance, a reported t value of 2.35 could correspond to an actual value anywhere from 2.345 up to (but not including) 2.355, with a range of p values that can slightly deviate from the recomputed p value. statcheck will not count cases like this as errors.
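The rounding logic can be sketched in base R. This illustrates the idea rather than statcheck's own code; the degrees of freedom (48) are an arbitrary choice for the example.

```r
t_reported <- 2.35
df <- 48  # assumed for illustration

# Bounds of the values that round to the reported t value
t_low  <- t_reported - 0.005
t_high <- t_reported + 0.005

# A larger |t| gives a smaller p, so the bounds swap places
p_upper <- 2 * pt(-t_low,  df)
p_lower <- 2 * pt(-t_high, df)

c(lower = p_lower, upper = p_upper)
```

Any reported p-value that falls inside this interval (after rounding) is compatible with the reported test statistic.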

Furthermore, one-sided testing is taken into account: if somewhere in the article the words “one-tailed”, “one-sided”, or “directional” are mentioned, and the result would have been correct if it was one-sided, it is counted as a correctly reported one-sided test. However, as Felix Thoemmes pointed out to me, this only works if the one-tailed result was in line with the one-tailed hypothesis. If you expected a negative effect but found a positive one, your one-tailed p-value would be 1 - p/2, where p is the two-tailed p-value. Unfortunately, statcheck can’t recognize this and will wrongly flag such a result as an inconsistency.
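The relation between the two-tailed p-value and the two possible one-tailed p-values can be verified in base R. The t value and degrees of freedom below are arbitrary example values.

```r
t_value <- 1.82   # a positive effect was observed
df <- 48

p_two_sided <- 2 * pt(-abs(t_value), df)

# Hypothesis predicted a positive effect: upper tail, equals p/2
p_predicted_direction <- pt(t_value, df, lower.tail = FALSE)

# Hypothesis predicted a negative effect: lower tail, equals 1 - p/2
p_opposite_direction <- pt(t_value, df, lower.tail = TRUE)

c(two_sided = p_two_sided,
  predicted = p_predicted_direction,
  opposite  = p_opposite_direction)
```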

For a detailed validity study of statcheck, see Nuijten, Hartgerink, Van Assen, Epskamp, & Wicherts, 2016 and Nuijten, Hartgerink, Van Assen, Epskamp, & Wicherts, 2017.

Note that statcheck assumes that the p-value is the inconsistent value, but it could just as well be the case that the test statistic or degrees of freedom contain a reporting error. statcheck merely detects whether a set of numbers is consistent with each other.

Also note that statistical corrections can cause statcheck to flag results as inconsistent. For instance, if you perform a Bonferroni correction for multiple testing by multiplying your p-value instead of dividing your \(\alpha\), statcheck will flag your result as inconsistent. This also holds for other corrections in which the test statistic, degrees of freedom, or p-value are adjusted (e.g., Greenhouse-Geisser). Note that when using these corrections it is advised to still report the result in a consistent manner (so don’t report the uncorrected degrees of freedom with the corrected test statistic and p-value). See Nuijten et al., 2017 for details on this issue.
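A small base R sketch shows why a Bonferroni-adjusted p-value no longer matches the test statistic. The test statistic, degrees of freedom, and number of comparisons are hypothetical values chosen for illustration.

```r
t_value <- 2.50
df <- 48
n_tests <- 3  # hypothetical number of comparisons

# p-value that belongs to the reported test statistic
p_raw <- 2 * pt(-abs(t_value), df)

# Bonferroni correction applied to the p-value instead of to alpha
p_adjusted <- p.adjust(p_raw, method = "bonferroni", n = n_tests)

# Reporting t(48) = 2.50 together with p_adjusted gives a p-value that
# does not match the test statistic and degrees of freedom
c(raw = p_raw, adjusted = p_adjusted)
```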

Finally, as Nick Brown pointed out to me, note that some cases are not flagged as an inconsistency, whereas they might look suspicious to a human reader. For instance, take the following case:

We found a significant difference, t(99) = 1.95, p = .05.

Here, the recalculated p-value is actually .054, which contradicts the claim that this finding is significant. However, statcheck will not flag this result as an inconsistency, since, well, it’s not. If you round “.054” to two decimals, it actually is .05. This is an example of a case in which a human might raise his/her eyebrows, but statcheck strictly adheres to the math: .054 is correctly rounded to .05, and hence not inconsistent.
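You can reproduce this case in base R:

```r
# Two-sided p-value for t(99) = 1.95
p <- 2 * pt(-1.95, df = 99)

p            # slightly above .05
round(p, 2)  # rounded to two decimals, as in the reported result
```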

2 Installation

There are several programs you need to install before you can start using statcheck. First you need to install R, and preferably the R environment RStudio. Furthermore, you will need the program Xpdf to enable statcheck to convert PDF articles to plain text files. In the last step you can load the package statcheck in R. The next sections explain where and how to download and install these programs.

2.1 R and RStudio

To use statcheck you first need to install R. R is a free programming language and environment for statistical computing and graphics. You can download it from https://cran.r-project.org/. If you want, you can now run R via RStudio. RStudio is an interface for R with several features that make programming in R easier (syntax highlighting, more point-and-click options, etc.). You can obtain the latest version of RStudio from http://rstudio.com.

2.2 Xpdf

statcheck relies on the program Xpdf that converts PDF files to plain text files. To download and install Xpdf, follow the steps for Windows or Mac below.

2.2.1 Xpdf for Windows

Many thanks to Daniel Lakens who created the step by step installation manual for Xpdf.

1. Download Xpdf from http://www.foolabs.com/xpdf/download.html and unzip its contents (especially the folders bin32 and bin64) to a folder.

2. Close R and RStudio

3. Right click the ‘This PC’ icon in Windows Explorer

4. Click ‘properties’

5. Click ‘Advanced system settings’

6. Click ‘Environment Variables’

7. Select ‘Path’ and click ‘Edit’

8. Add the path to the folder containing the Xpdf binaries, specifically the folder bin32 (for a 32-bit system) or bin64 (for a 64-bit system). Add the path after a ; separator.
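After editing the path, you can check from a fresh R session whether the Xpdf tools are visible to R (this works on any operating system). The sketch below assumes that the relevant executable is called `pdftotext`, the Xpdf command-line tool for converting PDF to text; the helper name `xpdf_on_path` is made up for this example.

```r
# Hypothetical helper: is a command-line tool visible on the PATH?
xpdf_on_path <- function(tool = "pdftotext") {
  # Sys.which() returns "" when the executable cannot be found
  unname(nzchar(Sys.which(tool)))
}

if (xpdf_on_path()) {
  message("pdftotext found at: ", Sys.which("pdftotext"))
} else {
  message("pdftotext not found; check the Path variable and restart R")
}
```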

2.2.2 Xpdf on Mac

If you want to run statcheck on a Mac, follow these steps:

Step 1. Download XQuartz. XQuartz is an open source version of the X.Org X Window System that runs on OS X. You can download it at http://xquartz.macosforge.org/landing/.

Step 2. Download the binaries for Xpdf and unzip them (see step 1 in the instructions above).

Step 3. Add the location of the Xpdf binaries to your path. For a step-by-step instruction how to add something to the path on Mac, see this website.

If these three steps don’t work (they seem to create some problems on MacOS 10.11.5), you can try the following other steps:

# Install homebrew: open terminal and paste 
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

# Install xquartz: open terminal and paste 
brew install xquartz --cask

# Install xpdf: open terminal and paste
brew install xpdf

Thanks go to Eiko Fried for figuring this out for me, and Lukas Wallrich for pointing out an update to the syntax.

2.3 statcheck from CRAN

After you have installed Xpdf, you can install and load the R package statcheck with the following lines of code:

install.packages("statcheck")
library("statcheck")

statcheck is now ready to use!

2.4 Optional: statcheck from GitHub

statcheck is constantly being updated, which means that the version on CRAN is not always the latest version. Instead of downloading statcheck from CRAN, you can also download the latest version of statcheck directly from GitHub with the following lines of R code:

devtools::install_github("MicheleNuijten/statcheck")
library("statcheck")

Note that you still need to have Xpdf installed before you can download and run statcheck.

3 Using statcheck

3.1 Using statcheck on a raw string of text

The most basic function of statcheck is to extract statistics from a raw string of text, recalculate the p-value, and report back the findings. In the example below I simply feed statcheck a string of text with some statistics in it, and it gives me back a table with the results. I’ll go over the results below.

txt <- "blablabla the effect was very significant (t(48) = 1.02, p < .05)"
statcheck(txt)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         t  NA  48               =  1.02                   <
##   Reported.P.Value  Computed                   Raw Error DecisionError OneTail
## 1             0.05 0.3128421 t(48) = 1.02, p < .05  TRUE          TRUE   FALSE
##   OneTailedInTxt APAfactor
## 1          FALSE         1

Note that when you want to scan a Z test, you need to place a space in front of Z. This is because of an ugly hack I had to build in: sometimes Z tests were read as \(\chi^2\) and this was the easiest way to avoid that:

statcheck("z = 1.95, p = .05")

# instead, type

statcheck(" z = 1.95, p = .05")

The output table is quite large, and because of the screen size it often gets divided over several lines (as is the case now). This can make the output a bit hard to read. I pasted an unfolded table below to give you an idea what it looks like.

I’ll go over the different parts of the output below.

3.1.1 statcheck output: part 1

The first part of the output (as shown below) shows which statistics are extracted from the input.

txt <- "blablabla the effect was very significant (F(2, 65) = 3.02, p < .05)"
stat <- statcheck(txt)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
stat[ ,1:8]
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         F   2  65               =  3.02                   <
##   Reported.P.Value
## 1             0.05

Source indicates from which “Source” the statistic on this particular line was extracted. When you feed statcheck raw strings of text (as I did here), it numbers the objects in the input vector. Here I only fed statcheck one string, so Source is 1. If you feed statcheck an article, Source shows the file name of the article.

The next columns show the full result that was extracted, split up in the different elements. Splitting up the different elements that were extracted can be useful if you for instance want to investigate how many F-values an article or set of articles contains, or if you want to investigate the distribution of extracted p-values.
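Such summaries need only base R. The sketch below assumes that the output can be handled like an ordinary data frame, and it uses a small hand-made data frame that mimics a few of the output columns (the values are invented), so that it runs without statcheck installed.

```r
# Mock of some output columns; in practice use the object returned
# by statcheck(), checkPDF(), etc.
res <- data.frame(
  Source           = c("paper_a", "paper_a", "paper_b"),
  Statistic        = c("t", "F", "F"),
  Reported.P.Value = c(0.001, 0.45, 0.03)
)

# How many results of each type were extracted?
table(res$Statistic)

# How many F tests per article?
table(res$Source[res$Statistic == "F"])

# Distribution of the reported p-values
summary(res$Reported.P.Value)
```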

3.1.2 statcheck output: part 2

The second part of the output, as shown below, shows the result of the consistency check.

stat[ ,9:12]
##     Computed                      Raw Error DecisionError
## 1 0.05569781 F(2, 65) = 3.02, p < .05  TRUE          TRUE

The column Computed shows the p-value that statcheck calculated based on the reported test statistic and degrees of freedom. Whether this is a two-sided (default; OneTailedTests=FALSE) or one-sided (OneTailedTests=TRUE) p-value depends on the argument OneTailedTests.

In the column Raw you find the complete result as extracted by statcheck. This is the concatenated version of the information in the previous columns.

The columns Error and DecisionError show the final results of the consistency check. In this example, Error is TRUE, which means that the reported p-value is inconsistent with the reported test statistic and degrees of freedom. You can also see this in the output, where the reported p-value is “< .05”, whereas the computed p-value is .0557. DecisionError is TRUE as well, which indicates that the inconsistency also changes the statistical conclusion: the reported p-value is significant, whereas the recomputed one is not. Note that a result can only be a DecisionError when it is also an Error.
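The logic behind these two columns can be sketched in base R for this example. This is a simplified illustration, not statcheck's code: it handles only a result reported as “p < .05” and ignores rounding of the test statistic.

```r
alpha <- 0.05

# Reported: F(2, 65) = 3.02, p < .05
computed <- pf(3.02, df1 = 2, df2 = 65, lower.tail = FALSE)

# "p < .05" is consistent only if the recomputed p is below .05
error <- !(computed < 0.05)

# Decision error: inconsistent AND on the other side of alpha
reported_significant <- TRUE               # p < .05 was reported
computed_significant <- computed < alpha
decision_error <- error && (reported_significant != computed_significant)

data.frame(computed = computed, error = error,
           decision_error = decision_error)
```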

3.1.3 statcheck output: part 3

The last part of the output shows some extra information about the extracted result and the article it was extracted from.

stat[ ,13:15]
##   OneTail OneTailedInTxt APAfactor
## 1    TRUE          FALSE         1

The column OneTail indicates whether a result could have been a one-tailed test, meaning that the result would have been correct if it were a one-tailed test.

The column OneTailedInTxt indicates whether the string “one-tailed”, “one-sided”, or “directional” was found in the full text that statcheck scanned. If OneTailedInTxt==TRUE, it could be a good indication that some or all of the tests are one-tailed, which means that some of the flagged inconsistencies might not be inconsistent at all.

Finally, the column APAfactor indicates the proportion of all detected p-values that were part of a fully APA reported NHST result. It gives a rough indication of how many statistical tests statcheck may have missed because of reporting issues.
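As a rough illustration of this proportion, the sketch below counts the p-values in a string and divides the number of fully extracted results by that count. The pattern is a simplified stand-in for the one statcheck uses, and the number of extracted results is set by hand.

```r
txt <- "result 1 is t(100) = 1, p < 0.001, and result 2 is F(2,45) = 2.81, p = .45"

# Count every reported p-value (simplified pattern, for illustration only)
p_pattern  <- "p\\s?[<>=]\\s?\\d?\\.\\d+"
n_p_values <- lengths(regmatches(txt, gregexpr(p_pattern, txt)))

# Suppose only the t test was extracted as a complete APA result,
# e.g. because only t tests were requested
n_apa_results <- 1

apa_factor <- n_apa_results / n_p_values
apa_factor
```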

3.1.4 statcheck output: messages

In some cases, statcheck will evaluate a string of text and on top of the output it will also give back one or two messages:

Extracting statistics...
  |==================================================================| 100%

 Check the significance level. 
 
 Some of the p value incongruencies are decision errors if the significance level is .1 or .01 instead of the conventional .05. It is recommended to check the actual significance level in the paper or text. Check if the reported p values are a decision error at a different significance level by running statcheck again with 'alpha' set to .1 and/or .01.

Check for one tailed tests. 
 
 Some of the p value incongruencies might in fact be one tailed tests. It is recommended to check this in the actual paper or text. Check if the p values would also be incongruent if the test is indeed one sided by running statcheck again with 'OneTailedTests' set to TRUE. To see which Sources probably contain a one tailed test, try unique(x$Source[x$OneTail]) (where x is the statcheck output). 

In the first message, “Check the significance level”, statcheck warns you that even though it assumed an \(\alpha\) of .05, there might be DecisionErrors in the text you scanned if \(\alpha = .1\) or \(\alpha = .01\). It is advised to check the full text of the article to look up the significance level.

In the second message, “Check for one tailed tests”, statcheck warns you that some of the flagged inconsistencies could be caused by the use of one-tailed tests. It is advised to check the full text of the article to see if a result is actually consistent because a one tailed test was used.
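The expression suggested in the second message is plain data frame subsetting. The sketch below applies it to a mock object with invented values, so it runs without statcheck installed.

```r
# Mock statcheck output with only the relevant columns (values invented)
x <- data.frame(
  Source  = c("paper_a", "paper_a", "paper_b", "paper_c"),
  Error   = c(TRUE, FALSE, TRUE, FALSE),
  OneTail = c(TRUE, FALSE, FALSE, FALSE)
)

# Sources with at least one result that would be consistent if one-tailed
unique(x$Source[x$OneTail])

# The flagged results themselves
x[x$Error & x$OneTail, ]
```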

3.1.5 statcheck output: no results found

It is of course possible that the text you want to check doesn’t contain any (correctly reported) APA results. In that case, statcheck will give the following notification:

txt <- "All the results were significant, all ps < .05"
statcheck(txt)
## Extracting statistics...
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## statcheck did not find any results

3.2 Using statcheck on a PDF article

One of the main advantages of statcheck is that it can automatically extract statistical results from articles, so that you don’t have to enter them manually. Instead of the statcheck function, you use the checkPDF function, which automatically calls statcheck.

There are three ways to select an article using checkPDF.

The first way is to specify the entire path of the article you want to scan:

checkPDF("C:/Dropbox/Science/Papers/my_article.pdf")

The second way is to specify the location of the article in your working directory and then run checkPDF:

setwd("C:/Dropbox/Science/Papers/")
checkPDF("my_article.pdf")

The third and last way is not specifying anything:

checkPDF()

If you don’t specify a file, you will get a point-and-click pop-up in which you can select the article you want to scan.

Note that there can be typesetting issues in PDF articles. To read a PDF, statcheck uses the program Xpdf to convert the text in the PDF into plain text. However, mathematical symbols such as “<” or “=” are sometimes replaced by images in the text, which Xpdf cannot translate into plain text. If that is the case, statcheck will miss these results. I would therefore advise using HTML articles when possible.

3.3 Using statcheck on an HTML article

You can use statcheck on an HTML article by using the function checkHTML. This function works in the same way as checkPDF (see the section above).

If you have access to both the PDF and the HTML version of an article, I would advise checking the HTML version to avoid typesetting issues.

3.4 Using statcheck on a folder of articles

It is also possible to automatically let statcheck check an entire folder of articles with the functions checkPDFdir (scans all the PDF articles in the specified folder), checkHTMLdir (scans all the HTML files in the specified folder), or simply checkdir (scans both PDF and HTML files in a folder).

Specifying the folder/directory works the same way as specifying a file in the checkPDF or checkHTML function (see sections above):

checkdir("C:/Dropbox/Science/Papers/")
# or
checkdir()

All three functions for checking an entire directory have an argument, subdir, that specifies whether the subdirectories of the given directory should be checked as well. By default (subdir = TRUE), these functions check everything in the directory, including its subdirectories. If you don’t want to check subdirectories, set this argument to FALSE.

# checks subdirectories as well
checkdir("C:/Dropbox/Science/Papers/")
# does not check the subdirectories
checkdir("C:/Dropbox/Science/Papers/", subdir = FALSE)
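If you want to preview which files such a scan would cover, base R can list them. The helper `preview_files` below is hypothetical and only mirrors the idea of the subdir argument; it is not how statcheck itself locates files.

```r
# Hypothetical helper: list the PDF and HTML files in a folder
preview_files <- function(dir, subdir = TRUE) {
  list.files(dir, pattern = "\\.(pdf|html?)$", ignore.case = TRUE,
             recursive = subdir, full.names = TRUE)
}

# Demonstration in a temporary folder with empty placeholder files
root <- file.path(tempdir(), "papers_demo")
dir.create(file.path(root, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, c("a.pdf", "b.html", "notes.txt")))
file.create(file.path(root, "sub", "c.pdf"))

basename(preview_files(root))                  # subfolder included
basename(preview_files(root, subdir = FALSE))  # top level only
```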

Lastly, there is an extra argument in checkHTMLdir, namely extension = TRUE. This argument controls whether statcheck will search the specified directories for files with the extension “.htm” or “.html”, or if it will just check everything. This argument is mainly useful when you hide file extensions in your Explorer or Finder.

3.5 Choosing the right arguments

The statcheck function takes several arguments, most of which have a default value. The arguments control aspects such as the assumed alpha level, how one-tailed tests are treated, and which types of statistics should be checked. These are all of statcheck’s arguments with their default values:

statcheck(x, stat = c("t", "F", "cor", "chisq", "Z"),
    OneTailedTests = FALSE, alpha = 0.05, pEqualAlphaSig = TRUE,
    pZeroError = TRUE, OneTailedTxt = FALSE, AllPValues = FALSE)

All these arguments can also be specified in the functions to check entire articles or folders of articles. E.g., when using checkPDF, you can also specify specific arguments as follows:

checkPDF("my_article.pdf", OneTailedTxt = TRUE, alpha = .01)

I will go over all arguments below.

3.5.1 x

The argument x defines the main input variable: a string of text from which you’d like to extract and check the statistics. This is the only argument without a default value.

txt <- "blablabla the effect was very significant (t(100)=1, p < 0.001)"
statcheck(txt)

Or directly:

statcheck("blablabla the effect was very significant (t(100)=1, p < 0.001)")

It is also possible to evaluate a vector of strings:

txt1 <- "t(100) = 1, p < 0.001"
txt2 <- "F(2,45) = 2.81, p = .45"
statcheck(c(txt1, txt2))
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         t  NA 100               =  1.00                   <
## 2      2         F   2  45               =  2.81                   =
##   Reported.P.Value   Computed                     Raw Error DecisionError
## 1            0.001 0.31972416   t(100) = 1, p < 0.001  TRUE          TRUE
## 2            0.450 0.07080002 F(2,45) = 2.81, p = .45  TRUE         FALSE
##   OneTail OneTailedInTxt APAfactor
## 1   FALSE          FALSE         1
## 2   FALSE          FALSE         1

3.5.2 stat

The argument stat defines which type of statistics will be extracted. By default statcheck extracts all the statistics it can read: t, F, correlations, \(\chi^2\), and Z. With stat you can tell statcheck to only extract a selection of those statistics or even just one type, e.g.:

txt <- "blablabla result 1 is t(100) = 1, p < 0.001, and result 2 is F(2,45) = 2.81, p = .45"
statcheck(txt, stat = "t")
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         t  NA 100               =     1                   <
##   Reported.P.Value  Computed                   Raw Error DecisionError OneTail
## 1            0.001 0.3197242 t(100) = 1, p < 0.001  TRUE          TRUE   FALSE
##   OneTailedInTxt APAfactor
## 1          FALSE       0.5

Or when you’re specifically looking for only t statistics and correlations:

txt <- "blablabla result 1 is t(100) = 1, p < 0.001, and result 2 is F(2,45) = 2.81, p = .45"
statcheck(txt, stat = c("t", "cor"))
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         t  NA 100               =     1                   <
##   Reported.P.Value  Computed                   Raw Error DecisionError OneTail
## 1            0.001 0.3197242 t(100) = 1, p < 0.001  TRUE          TRUE   FALSE
##   OneTailedInTxt APAfactor
## 1          FALSE       0.5


3.5.3 OneTailedTests

The argument OneTailedTests defines whether we treat all extracted tests as two-tailed (OneTailedTests=FALSE; default) or as one-tailed (OneTailedTests=TRUE):

txt <- "F(2,45) = 2.81, p = .45"
statcheck(txt, OneTailedTests = TRUE)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         F   2  45               =  2.81                   =
##   Reported.P.Value   Computed                     Raw Error DecisionError
## 1             0.45 0.03540001 F(2,45) = 2.81, p = .45  TRUE          TRUE
##   OneTail OneTailedInTxt APAfactor
## 1   FALSE          FALSE         1

In the chunk above, the extracted test will be treated as a one-tailed test. Note that the computed p-value in the output now is divided by two.

This argument is not very “smart”. It will consider ALL extracted tests as one-tailed, even when that is impossible (e.g., when the df1 of an F test is > 1).
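The halving can be checked by hand in base R for the example above:

```r
# F(2, 45) = 2.81
p_default <- pf(2.81, df1 = 2, df2 = 45, lower.tail = FALSE)

# With OneTailedTests = TRUE the computed p-value is simply halved
p_halved <- p_default / 2

c(default = p_default, halved = p_halved)
```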

3.5.4 alpha

The argument alpha defines which level of significance statcheck will assume. This argument defaults to \(\alpha\) = .05, the widely used criterion in psychology.

If a result isn’t a DecisionError with \(\alpha\) = .05, but it would be when \(\alpha\) = .01 or \(\alpha\) = .10, statcheck will print a message advising you to look up the actual level of significance that is used in the article. If it turns out that an \(\alpha\) of .01 is used, you can adapt that in statcheck:

txt <- "F(2,45) = 5.81, p = .03"

# default level of significance = .05
statcheck(txt)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         F   2  45               =  5.81                   =
##   Reported.P.Value    Computed                     Raw Error DecisionError
## 1             0.03 0.005694552 F(2,45) = 5.81, p = .03  TRUE         FALSE
##   OneTail OneTailedInTxt APAfactor
## 1   FALSE          FALSE         1
# change level of significance
statcheck(txt, alpha = .01)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         F   2  45               =  5.81                   =
##   Reported.P.Value    Computed                     Raw Error DecisionError
## 1             0.03 0.005694552 F(2,45) = 5.81, p = .03  TRUE          TRUE
##   OneTail OneTailedInTxt APAfactor
## 1   FALSE          FALSE         1

Note that when \(\alpha\)=.01 this result is counted as a DecisionError.

3.5.5 pEqualAlphaSig

The argument pEqualAlphaSig defines whether a p-value equal to \(\alpha\) is treated as significant (pEqualAlphaSig=TRUE; default) or not (pEqualAlphaSig=FALSE).

By convention, a result is deemed significant when p < \(\alpha\), which would correspond with the setting pEqualAlphaSig=FALSE. However, when we inspected all instances of published papers in which statcheck found “p = .05”, we found that in almost 95% of the cases the authors had interpreted the result as significant (Nuijten et al., in press). Because of this finding, we decided to count p ≤ \(\alpha\) as significant by default.

This example illustrates the difference between the two options:

txt <- "F(2,45) = 4.10, p = .05"

# default pEqualAlphaSig=TRUE
statcheck(txt)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         F   2  45               =   4.1                   =
##   Reported.P.Value   Computed                     Raw Error DecisionError
## 1             0.05 0.02313502 F(2,45) = 4.10, p = .05  TRUE         FALSE
##   OneTail OneTailedInTxt APAfactor
## 1   FALSE          FALSE         1
# treat p = alpha as not significant
statcheck(txt, pEqualAlphaSig = FALSE)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         F   2  45               =   4.1                   =
##   Reported.P.Value   Computed                     Raw Error DecisionError
## 1             0.05 0.02313502 F(2,45) = 4.10, p = .05  TRUE          TRUE
##   OneTail OneTailedInTxt APAfactor
## 1   FALSE          FALSE         1

Note that in the default setting this result was not counted as a DecisionError. After all, the reported “p = .05” is counted as significant, and even though the computed p-value (p = .02) is not consistent with p = .05, it is still significant.

In the version where pEqualAlphaSig=FALSE, this result is counted as a DecisionError, because here the reported result is not significant (only p < .05 is counted as significant, not p = .05), whereas the computed result is.
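The difference between the two settings comes down to using <= or < in the comparison with \(\alpha\). The helper `is_significant` below is a hypothetical illustration of that idea, not a function from statcheck.

```r
# Hypothetical helper mirroring the idea behind pEqualAlphaSig
is_significant <- function(p, alpha = 0.05, pEqualAlphaSig = TRUE) {
  if (pEqualAlphaSig) p <= alpha else p < alpha
}

reported <- 0.05
computed <- pf(4.10, df1 = 2, df2 = 45, lower.tail = FALSE)

# Default: reported and computed are both significant, no decision error
default_setting <- is_significant(reported) != is_significant(computed)

# pEqualAlphaSig = FALSE: reported p = .05 is not significant,
# computed p is, so the conclusions differ
strict_setting <- is_significant(reported, pEqualAlphaSig = FALSE) !=
  is_significant(computed, pEqualAlphaSig = FALSE)

c(default = default_setting, strict = strict_setting)
```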

3.5.6 pZeroError

The argument pZeroError controls whether statcheck will count a p-value reported as “p=.000” as an error (default, pZeroError=TRUE) or not (pZeroError=FALSE).

We decided to count these cases as an error, because according to the APA reporting guidelines, a p-value smaller than .001 should be reported as “p<.001”, and never as “p=.000”. Furthermore, a p-value can never be exactly zero, so it makes no sense to report it as such.

If you find these criteria too stringent, you can set pZeroError to FALSE.
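To get a feeling for which reported values fall under this rule, here is a simplified base R pattern for p-values reported as exactly zero. It is an assumption for illustration and not the pattern statcheck uses internally.

```r
# "p = .000", "p=0.00", etc.: only zeros after the decimal point
zero_p_pattern <- "p\\s*=\\s*0?\\.0+(?!\\d)"

reported <- c("p = .000", "p=.000", "p < .001", "p = .001", "p = .0004")

# perl = TRUE is needed for the lookahead (?!\\d)
is_zero_p <- grepl(zero_p_pattern, reported, perl = TRUE)
data.frame(reported, is_zero_p)
```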

3.5.7 OneTailedTxt

The argument OneTailedTxt controls whether statcheck will try to identify and correct for one-tailed tests. This is one of the most important arguments in statcheck. Our validity study showed that statcheck’s accuracy improved when OneTailedTxt = TRUE (Nuijten et al., in press).

By default, OneTailedTxt = FALSE and statcheck will treat all tests it finds the same, depending on your settings for OneTailedTests. However, you can tell statcheck to actively look for tests that might be one-tailed by setting OneTailedTxt = TRUE.

When OneTailedTxt = TRUE, statcheck will search the entire text for the keywords “one-tailed”, “one-sided”, and “directional” (taking spacing issues etc. into account). When statcheck finds at least one of those keywords AND an initially inconsistent result would be consistent if it was a one-tailed test, then statcheck treats this case as a one-tailed test and counts it as consistent.

txt <- "... t(48) = 1.82, p < .05, all tests were one-tailed."

# default OneTailedTxt=FALSE
statcheck(txt)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         t  NA  48               =  1.82                   <
##   Reported.P.Value   Computed                   Raw Error DecisionError OneTail
## 1             0.05 0.07499768 t(48) = 1.82, p < .05  TRUE          TRUE    TRUE
##   OneTailedInTxt APAfactor
## 1           TRUE         1
# actively look for instances of one-tailed tests
statcheck(txt, OneTailedTxt = TRUE)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         t  NA  48               =  1.82                   <
##   Reported.P.Value   Computed                   Raw Error DecisionError OneTail
## 1             0.05 0.07499768 t(48) = 1.82, p < .05 FALSE         FALSE    TRUE
##   OneTailedInTxt APAfactor
## 1           TRUE         1

Note that in the first, default scenario the result is flagged both as an Error and as a DecisionError. However, if OneTailedTxt is set to TRUE, statcheck recognizes this test as a one-tailed test and doesn’t flag it as an inconsistency. Note, however, that the computed p-value in the output still corresponds to a two-tailed test.
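To see why the one-tailed interpretation makes this result consistent, it can help to recompute the p-value by hand. The snippet below is a minimal sketch in base R that uses only the values from the example above (t = 1.82, df = 48). It illustrates the arithmetic and is not statcheck’s internal code.

```r
# Values taken from the example: t(48) = 1.82, p < .05
t_value <- 1.82
df <- 48

# Two-tailed p-value (what statcheck reports in the column Computed)
p_two_tailed <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# One-tailed p-value: half of the two-tailed p-value
p_one_tailed <- pt(abs(t_value), df, lower.tail = FALSE)

p_two_tailed  # about 0.075, so not consistent with p < .05
p_one_tailed  # about 0.037, so consistent with p < .05
```

The two-tailed p-value is larger than .05, which is why the default settings flag the result. The one-tailed p-value is smaller than .05, which is why the result counts as consistent when OneTailedTxt = TRUE.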

3.5.8 AllPValues

By default, statcheck searches for fully reported NHST results (test statistic, degrees of freedom if necessary, and a p-value). However, if you set the argument AllPValues to TRUE, statcheck will only search for p-values:

txt <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."
statcheck(txt, AllPValues = TRUE)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
##   Source Statistic Reported.Comparison Reported.P.Value      Raw
## 1      1         p                   =            0.020  p = .02
## 2      1         p                   <            0.050    p<.05
## 3      1         p                   =            0.740  p = .74
## 4      1         p                   =            0.763 p = .763

This option merely allows you to extract the p-values; it doesn’t extract enough information to recalculate anything.
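To give an idea of what extracting only p-values involves, here is a minimal sketch in base R. The regular expression p_pattern is a simplified pattern I made up for this illustration. It is not the pattern that statcheck uses internally, and it will miss cases that statcheck handles.

```r
txt <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."

# Simplified, hypothetical pattern: the letter p, a comparison sign, and a decimal number
p_pattern <- "\\bp\\s*[<>=]\\s*\\d?\\.\\d+"

# Extract all matches from the text; case is ignored
p_values <- regmatches(txt, gregexpr(p_pattern, txt, perl = TRUE, ignore.case = TRUE))[[1]]
p_values
```

For this text, the sketch picks up the same four strings that appear in the column Raw of the output above.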

3.6 Plot statcheck results

To get a rough overview of the inconsistencies in a paper, you can use statcheck’s plot function:

txt <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."
stat <- statcheck(txt)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
stat
##   Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1      1         F   3 147               =  3.45                   =
## 2      1         t  NA  48               =  1.56                   <
## 3      1         t  NA  48               =  0.34                   =
## 4      1         t  NA 148               =  0.73                   =
##   Reported.P.Value  Computed                       Raw Error DecisionError
## 1            0.020 0.0182658 F(3, 147) = 3.45, p = .02 FALSE         FALSE
## 2            0.050 0.1253296       t(48) = 1.56, p<.05  TRUE          TRUE
## 3            0.740 0.7353399      t(48) = .34, p = .74 FALSE         FALSE
## 4            0.763 0.4665441    t(148) = .73, p = .763  TRUE         FALSE
##   OneTail OneTailedInTxt APAfactor
## 1   FALSE          FALSE         1
## 2   FALSE          FALSE         1
## 3   FALSE          FALSE         1
## 4   FALSE          FALSE         1
plot(stat)

The plot shows the reported p-values (x-axis) against the computed p-values (y-axis). Exactly reported p-values (e.g., p = .74) should lie on the diagonal. The dotted lines indicate the region of significance (\(\alpha\) = .05 by default). The colors of the dots indicate whether this particular p-value was an inconsistency or even a decision error.

By default, the plot is created with ggplot2 (Wickham, 2009) in APA style. Many thanks to John Sakaluk for writing the APA style plot function. It is also possible to plot the statcheck results without using ggplot2:

plot(stat, APA = FALSE)

When APA = FALSE, it is possible to modify the plot’s graphical parameters:

plot(stat, APA = FALSE, cex = 2, axes = FALSE)

The non-APA layout is also the layout that statcheck uses in the identify function that is explained in the next section.
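If you want to see how the elements of the plot described above fit together, the following is a rough sketch in base R graphics. The data frame res is typed in by hand from the example output in this section, and the colors in point_col are my own arbitrary choice. This is not the code behind statcheck’s plot function.

```r
# Reported and computed p-values copied from the example output above
res <- data.frame(
  reported      = c(0.020, 0.050, 0.740, 0.763),
  computed      = c(0.0182658, 0.1253296, 0.7353399, 0.4665441),
  Error         = c(FALSE, TRUE, FALSE, TRUE),
  DecisionError = c(FALSE, TRUE, FALSE, FALSE)
)

# Hypothetical color coding: decision errors red, other errors orange, rest black
point_col <- ifelse(res$DecisionError, "red",
                    ifelse(res$Error, "orange", "black"))

plot(res$reported, res$computed,
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Reported p-value", ylab = "Computed p-value",
     pch = 19, col = point_col)
abline(0, 1)                         # exactly reported p-values should lie on this diagonal
abline(h = 0.05, v = 0.05, lty = 3)  # dotted lines at alpha = .05
```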

3.7 Identify points in plot

If you plotted many statcheck results and there is one point that stands out in the graph, but it’s hard to find in the statcheck output, the identify function might be useful.

identify allows you to click on points in the plot to literally identify them in the data frame with statcheck results. Note: this function is only available if you run statcheck in RStudio.

For instance, if we go back to the example in the section above, we might be interested in the two p-values that were flagged as inconsistencies.

txt <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."
stat <- statcheck(txt)

identify(stat)

If you run this code, you will see the non-APA plot:

In the plot you can see two things. Firstly, there is a cross next to one of the points. This is a cursor/locator with which you can click on the points in the plot that you are interested in. Secondly, the top left corner of the plot says “Locator active (Esc to finish)”. This means that when you’re finished clicking the points you’re interested in, you can press Esc to exit the plot. When you do, the points you clicked in the plot will be numbered and statcheck will give you the results of these points in a data frame:

As you can see in the data frame and the plot, the 2nd and 4th row from the output are selected with the identify function for further inspection.
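If you are not working in RStudio, or if you simply want all flagged results at once, you can also select rows without clicking. The sketch below assumes a data frame with the same column names as the statcheck output shown earlier. To keep the example self-contained, I typed the data frame stat_demo in by hand from that output instead of calling statcheck.

```r
# Hand-typed copy of the relevant columns of the example output
stat_demo <- data.frame(
  Raw = c("F(3, 147) = 3.45, p = .02", "t(48) = 1.56, p<.05",
          "t(48) = .34, p = .74", "t(148) = .73, p = .763"),
  Reported.P.Value = c(0.020, 0.050, 0.740, 0.763),
  Computed = c(0.0182658, 0.1253296, 0.7353399, 0.4665441),
  Error = c(FALSE, TRUE, FALSE, TRUE),
  DecisionError = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the rows that were flagged as an Error
flagged <- stat_demo[stat_demo$Error, ]
flagged
```

This selects the 2nd and 4th row, the same rows that were picked with the identify function above.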

3.8 Summarize statcheck results

The function summary uses the full statcheck data frame to calculate summarized results.

txt1 <- "We ran a manipulation check which showed that the stimulus achieved the desired effect, Z = 1.99, p < .05"
txt2 <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."

stat <- statcheck(c(txt1, txt2))
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
summary(stat)
##   Source pValues Errors DecisionErrors
## 1      1       1      0              0
## 2      2       4      2              1
## 3  Total       5      2              1

The output shows that two sources with statistics were scanned. The summary indicates how many results were extracted per source (pValues), and how many of those were flagged as an Error or a DecisionError.
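The counts in the summary are simple tallies per source. The sketch below shows one way to arrive at the same numbers in base R. The data frame res is typed in by hand from the example (one result in source 1, four results in source 2), and the code illustrates the idea rather than the actual implementation of summary.

```r
# Hand-typed flags for the five results in the example
res <- data.frame(
  Source        = c(1, 2, 2, 2, 2),
  Error         = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  DecisionError = c(FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Count results, errors, and decision errors per source
per_source <- data.frame(
  Source         = sort(unique(res$Source)),
  pValues        = as.vector(table(res$Source)),
  Errors         = as.vector(tapply(res$Error, res$Source, sum)),
  DecisionErrors = as.vector(tapply(res$DecisionError, res$Source, sum))
)
per_source

# Totals over all sources
totals <- colSums(per_source[, c("pValues", "Errors", "DecisionErrors")])
totals
```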

4 The statcheck Web App

A quick and easy way to scan an article with statcheck is by using the web app at http://statcheck.io.

4.1 Using the web app

To upload an article to scan, simply click on the “Browse.” button (see the screenshot below) and select an article in PDF, HTML, or DOCX format.

You have the option to count one-tailed tests as consistent if they’re explicitly identified as one-tailed tests. To do so, simply check the box “Try to identify and correct for one-tailed tests?” (see Figure 1). If you check this box, statcheck will consider results as correct if (1) the p-value would have been consistent if the test was one-tailed, and (2) if the words “one-tailed”, “one-sided”, or “directional” are somewhere in the full text.

Once you have selected an article, statcheck will automatically search it for APA reported NHST results. The program extracts the results and recalculates the p-values. The results will be displayed on the web page (see the screenshot below).

You can either inspect the results online, or download the results in a CSV file to your computer, by clicking the button “Download Results (csv)” (see the above screenshot and the screenshot below). The CSV output is equivalent to the output of the R package and is more extensive than the output that is directly printed to the screen.

Note that once all files have been analyzed, the source files are deleted. Outside of simple server and activity logs, no record of results is maintained.

4.1.1 Interpreting the results returned by statcheck.io

The output of the web app is more concise than that of the R package, to facilitate quick detection of possible inconsistencies (see the screenshot below). Each row in the data frame represents one extracted NHST result.

The column “Source” indicates from which article the statistic on this particular line was extracted. Specifically, “Source” shows the filename of the uploaded article (in this case, “Paper1”).

The next column, “Statistical Reference”, shows the full NHST result as it was found in the full text.

The column “Computed p Value” shows the p-value that statcheck calculated based on the reported test statistic and degrees of freedom. By default, statcheck assumes two-sided tests, but if you checked the box “Try to identify and correct for one-tailed tests?”, it will try to do so (equivalent to the argument OneTailedTxt = TRUE in the R function; see Section 3.5.7).

The final column, “Consistency”, shows the final results of the consistency check. This variable can take on three values: Consistent, Inconsistency, Decision Inconsistency. If a result is Consistent, the reported p-value is consistent with the reported test statistic and degrees of freedom: it matches the computed p-value. Conversely, if a result is an Inconsistency, the reported p-value is not consistent with the reported test statistic and degrees of freedom: it does not match the computed p-value. This is equivalent to an Error in the R package. Finally, if a result is a Decision Inconsistency, not only does the reported p-value not match the computed p-value, but the “statistical decision” also changes: either the reported p-value is statistically significant (p <= .05) and the computed p-value is not (p > .05), or vice versa. This is equivalent to a Decision Error in the R package. Note that by default, statcheck assumes that \(\alpha\) = .05, and that p = .05 is interpreted as statistically significant (see Nuijten et al., 2016).
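The relation between the three labels and the two flags in the R package can be written down in a few lines. The functions classify_result and is_significant below are hypothetical helpers that I wrote for this illustration. They are not part of statcheck or the web app.

```r
# Hypothetical helper: map the flags Error and DecisionError to the web app labels
classify_result <- function(error, decision_error) {
  ifelse(decision_error, "Decision Inconsistency",
         ifelse(error, "Inconsistency", "Consistent"))
}

# Hypothetical helper: p = .05 counts as significant, as described above
is_significant <- function(p, alpha = 0.05) p <= alpha

labels <- classify_result(error          = c(FALSE, TRUE, TRUE),
                          decision_error = c(FALSE, FALSE, TRUE))
labels

# The statistical decision changes when only one of the two p-values is significant
decision_changes <- is_significant(0.04) != is_significant(0.31)
decision_changes
```

In the last two lines, .04 and .31 are made-up values for a reported and a computed p-value: the first is significant and the second is not, so the decision changes.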

In this example, the first result in the screenshot above shows that the result is Consistent, which means that the reported p-value is consistent with the reported test statistic and degrees of freedom.

The second row in the screenshot above illustrates an Inconsistency: the reported p-value is p = .14, whereas the computed p-value based on the reported test statistic and degrees of freedom is p = .12.

The third and last row in the screenshot illustrates a Decision Inconsistency. In this case the reported p-value is p < .001, which is clearly significant at \(\alpha\) = .05. However, the computed p-value is p = .31, which is clearly not significant.

Finally, it is possible that statcheck will not detect any APA reported NHST results at all. In that case, you will get the following message.

4.1.2 Sorting and selecting results in the output of statcheck.io

One of the nice features of the statcheck web app is that you can easily sort and/or select results in the output. This is especially useful when you check documents with a large number of NHST results in them. For instance, if I run my own dissertation through statcheck, statcheck finds 72 results.

To quickly check which of my results are inconsistent, I can do several things. First, I can use the “Search” box to search for all results with an “inconsistency”. This selects both inconsistencies and decision inconsistencies (see screenshot below). Note that I could search for any text string. This enables me for instance to select only one type of test (search for “t(”) or p-values around .05 (search for “.05”), etc.

When I do this, statcheck finds 15 cases that are not consistent (see screenshot below). That is still too many results to fit on one page, so they are divided over two pages.

If I want all my results on 1 page, I can change how many entries to display (see below).

Finally, I can sort the rows of the data frame by clicking on the column headers. In this case, it would be insightful to see which of the results are Inconsistencies and which are Decision Inconsistencies, so I can sort the rows by the column “Consistency” (see below).
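If you downloaded the results as a CSV file, you can do the same kind of searching and sorting in R. The sketch below uses a small hand-made data frame web_res instead of a real download. The column names Reference and Consistency are assumptions for this illustration and may differ from the column names in the actual CSV file.

```r
# Hand-made stand-in for downloaded results (column names are assumptions)
web_res <- data.frame(
  Reference = c("F(3, 147) = 3.45, p = .02", "t(48) = 1.56, p<.05",
                "t(48) = .34, p = .74", "t(148) = .73, p = .763"),
  Consistency = c("Consistent", "Decision Inconsistency",
                  "Consistent", "Inconsistency"),
  stringsAsFactors = FALSE
)

# "Search": all results whose label contains "inconsistency" (case is ignored)
not_consistent <- web_res[grepl("inconsistency", web_res$Consistency, ignore.case = TRUE), ]

# "Search": only t-tests, by looking for the literal string "t("
t_tests <- web_res[grepl("t(", web_res$Reference, fixed = TRUE), ]

# "Sort": order the rows by the column Consistency
sorted <- web_res[order(web_res$Consistency), ]
```

As in the web app, searching for “inconsistency” selects both the Inconsistency and the Decision Inconsistency.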

5 Common Errors and Problems

There are several errors that can arise when installing or running statcheck. I’m currently logging the errors I encounter with their solutions. This list is far from complete, so if you have an error you can’t solve, or come across something that you think could be a bug, please contact me at .

5.1 Reasons why statcheck does not find results

There are several reasons why statcheck sometimes does not pick up a result. Some of the most common reasons are listed below.

  • Result is not reported exactly according to APA style, e.g.:
    • degrees of freedom in subscript
    • square brackets instead of parentheses
    • semi-colons instead of commas
    • effect size in between test statistic and p-value
    • statistics reported in tables
  • Tests other than t, F, \(\chi^2\), r, or Z (or in version 1.3.0 on GitHub and http://statcheck.io: Q-tests)
  • Typesetting issues in PDF documents. In some journals, mathematical symbols such as “=” are replaced by an image of this symbol, which can’t be converted to plain text. To quickly check if a reported result has issues like this, copy-paste it to a text editor: if all symbols show up normally, there are no (obvious) typesetting issues.
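To illustrate why small deviations from APA style matter, here is a minimal sketch with a strongly simplified regular expression for t-tests. The pattern t_pattern is my own simplification for this example. statcheck’s actual regular expressions are more elaborate, so use this only to get a feeling for the problem.

```r
# Simplified, hypothetical pattern for "t(df) = value, p = value"
t_pattern <- "t\\s*\\(\\s*\\d+\\.?\\d*\\s*\\)\\s*[<>=]\\s*-?\\d*\\.?\\d+\\s*,\\s*p\\s*[<>=]\\s*\\d?\\.\\d+"

examples <- c(
  apa          = "t(48) = 1.56, p < .05",            # APA style
  brackets     = "t[48] = 1.56, p < .05",            # square brackets
  semicolon    = "t(48) = 1.56; p < .05",            # semi-colon instead of comma
  effect_size  = "t(48) = 1.56, d = 0.45, p < .05"   # effect size in between
)

# TRUE means the pattern recognizes the result
found <- grepl(t_pattern, examples, perl = TRUE, ignore.case = TRUE)
names(found) <- names(examples)
found
```

Only the first example is recognized. The other three differ from it in a single detail, and that is enough for a pattern like this to skip them.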

5.2 Common sources of inconsistencies

The list below gives some of the most common sources of inconsistencies.

  • Typos (e.g., p = .25 instead of p = .52)
  • Wrong rounding
  • When reporting a correlation, reporting the sample size instead of the degrees of freedom
  • Reporting p < .042 when in fact p = .042
  • Copy-paste errors: copying a previously reported result as a “template”, but forgetting to change one or more of the numbers
  • Using a one-tailed test without explicitly mentioning this in the text
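As an example of the third point in the list above, the sketch below shows what happens when the sample size is reported in place of the degrees of freedom of a correlation. It uses the standard conversion of a correlation to a t-value with df = N - 2. The helper p_from_r and the numbers r = .30 and N = 40 are made up for this illustration.

```r
# Hypothetical helper: two-tailed p-value for a correlation r with given df
p_from_r <- function(r, df) {
  t_value <- r * sqrt(df) / sqrt(1 - r^2)
  2 * pt(abs(t_value), df, lower.tail = FALSE)
}

r <- 0.30  # made-up correlation
n <- 40    # made-up sample size

p_correct <- p_from_r(r, df = n - 2)  # correct degrees of freedom: N - 2
p_wrong   <- p_from_r(r, df = n)      # sample size reported as if it were the df

c(correct = p_correct, wrong = p_wrong)
```

The two p-values differ. If an article reports r(40) together with the p-value that belongs to df = 38, the recomputed p-value will not match the reported one.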

5.3 Rcpp error

While installing statcheck, you can come across the error that the package Rcpp is not available:

> devtools::install_github("MicheleNuijten/statcheck")
Downloading GitHub repo MicheleNuijten/statcheck@master
from URL https://api.github.com/repos/MicheleNuijten/statcheck/zipball/master
Installing statcheck
Installing 1 package: Rcpp
trying URL 'http://cran.r-project.org/bin/windows/contrib/3.2/Rcpp_0.12.6.zip'
Content type 'application/zip' length 3221864 bytes (3.1 MB)
downloaded 3.1 MB

package 'Rcpp' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'Rcpp'

The downloaded binary packages are in
    C:\Users\Mnuijten\AppData\Local\Temp\RtmpgfVUqR\downloaded_packages
"D:/R-3.2.3/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD  \
  INSTALL  \
  "C:/Users/Mnuijten/AppData/Local/Temp/RtmpgfVUqR/devtools1ce82b75e49/MicheleNuijten-statcheck-d20d222"  \
  --library="D:/R-3.2.3/library" --install-tests 

* installing *source* package 'statcheck' ...
** R
** byte-compile and prepare package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) : 
  there is no package called 'Rcpp'
ERROR: lazy loading failed for package 'statcheck'
* removing 'D:/R-3.2.3/library/statcheck'
* restoring previous 'D:/R-3.2.3/library/statcheck'
Error: Command failed (1)

Here the solution is quite straightforward. Just re-install the package Rcpp manually:

install.packages("Rcpp")

5.4 Xcode license on Mac

When installing statcheck on a Mac, you need to install Xcode. If your license has expired, you can get the following warning:

> devtools::install_github("MicheleNuijten/statcheck")
Downloading github repo MicheleNuijten/statcheck@master
Installing statcheck
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL  \
  '/private/var/folders/bh/r4jqk1zj4kzfyd8g9x7bdysc0000gn/T/RtmpW6A5Dc/devtools416756816ec1/MicheleNuijten-statcheck-d20d222'  \
  --library='/Library/Frameworks/R.framework/Versions/3.1/Resources/library'  \
  --install-tests 

* installing *source* package ‘statcheck’ ...
** R
** byte-compile and prepare package for lazy loading


Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.


Warning: running command ''otool' -L '/Library/Frameworks/R.framework/Resources/library/tcltk/libs//tcltk.so'' had status 69

The solution can be found here: http://stackoverflow.com/questions/26197347/agreeing-to-the-xcode-ios-license-requires-admin-privileges-please-re-run-as-r and entails simply entering the following code on the command line:

sudo xcodebuild -license

6 References

Epskamp, S. & Nuijten, M. B. (2016). statcheck: Extract statistics from articles and recompute p values. Retrieved from http://CRAN.R-project.org/package=statcheck. (R package version 1.2.2)

Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods, 48(4), 1205-1226. DOI: 10.3758/s13428-015-0664-2

Nuijten, M. B., Van Assen, M. A. L. M., Hartgerink, C. H. J., Epskamp, S., & Wicherts, J. M. (2017). The validity of the tool “statcheck” in discovering statistical reporting inconsistencies. Preprint retrieved from https://psyarxiv.com/tcxaj/.

Wickham, H. (2009). ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag.