This manual provides detailed instructions for the installation and
use of the free R package “statcheck: Extract statistics and recompute
p-values” (Epskamp & Nuijten, 2018). For a more concise manual check
out the R Help files of statcheck (?statcheck).
Before I get to the technical parts of installing and using statcheck, please consider for a moment what statcheck can and cannot do. The package statcheck is a program that automatically extracts statistics from articles and recomputes their p-values. It works as follows:
1. Convert PDF and HTML files to plain text.
PDF files are converted using the program Xpdf; HTML files are read in by R and automatically converted to raw text files.
2. Scan text for statistical results
statcheck searches for specific patterns and recognizes statistical results from correlations and from t, F, \(\chi^2\), Z, and Q tests. statcheck can only read these results if they are reported exactly according to the APA guidelines, for example "t(48) = 1.02, p < .05" (a simplified regular-expression sketch follows below this list).
All regular expressions take into account that test statistics and p values may be exactly (=) or inexactly (< or >) reported. Different spacing has also been taken into account, and case is ignored.
3. Use test statistics and degrees of freedom to recompute p value
By default the recomputed p value is two-sided (see the computation sketch below this list).
4. Compare reported and recomputed p value
This comparison takes into account how the results were reported, e.g. p < .05 is treated differently than p = .05. Incongruent p values are marked as an “Error”. If the reported result is significant and the recomputed result is not, or vice versa, the result is marked as a “Decision Error”.
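To give an idea of what the pattern matching in step 2 looks like, here is a highly simplified regular expression for APA-style t-tests. statcheck's actual regular expressions are more elaborate; this is only a sketch:

# simplified pattern for results like "t(48) = 1.02, p < .05"
pattern <- "t\\s?\\(\\s?\\d+\\s?\\)\\s?[<>=]\\s?-?\\d*\\.?\\d+\\s?,\\s?p\\s?[<>=]\\s?\\d*\\.\\d+"
txt <- "the effect was significant, t(48) = 1.02, p < .05"
regmatches(txt, regexpr(pattern, txt, ignore.case = TRUE, perl = TRUE))
## [1] "t(48) = 1.02, p < .05"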
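The recomputation in step 3 boils down to standard distribution functions in R. A sketch of the computation (not statcheck's internal code) for some of the supported tests:

# two-sided p for t(48) = 1.02
2 * pt(abs(1.02), df = 48, lower.tail = FALSE)   # 0.313
# p for F(2, 45) = 2.81
pf(2.81, df1 = 2, df2 = 45, lower.tail = FALSE)  # 0.071
# two-sided p for Z = 1.95
2 * pnorm(abs(1.95), lower.tail = FALSE)         # 0.051
# p for a chi-square test with chi2(2) = 6.34
pchisq(6.34, df = 2, lower.tail = FALSE)         # 0.042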
Correct rounding is taken into account. For instance, a reported t value of 2.35 could correspond to an actual value between 2.345 and 2.355, with a corresponding range of p values that can slightly deviate from the recomputed p value. statcheck will not count such cases as errors.
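A sketch of how such a rounding check can be done (I assume df = 100 for the sake of the example; statcheck's internal implementation may differ):

# a reported t of 2.35 could stem from any true value in [2.345, 2.355)
df <- 100
p_max <- 2 * pt(2.345, df, lower.tail = FALSE)  # largest p consistent with rounding
p_min <- 2 * pt(2.355, df, lower.tail = FALSE)  # smallest p consistent with rounding
# a reported p-value within [p_min, p_max] is not counted as an error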
Furthermore, one-sided testing is taken into account: if the words "one-tailed", "one-sided", or "directional" are mentioned somewhere in the article, and the result would have been consistent had it been one-sided, it is counted as a correctly reported one-sided test. However, as Felix Thoemmes pointed out to me, this only works if the one-tailed result is in line with the one-tailed hypothesis. If you expected a negative effect but found a positive one, your p-value would be 1 - p/2. Unfortunately statcheck cannot recognize this and will wrongly flag such a result as an inconsistency.
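A numerical illustration of this point (my own example, not statcheck output):

# suppose a test yields t(30) = 1.70
p_two <- 2 * pt(1.70, df = 30, lower.tail = FALSE)  # two-sided p, ~0.099
p_two / 2      # one-tailed p if the effect is in the expected direction, ~0.050
1 - p_two / 2  # one-tailed p if the effect is opposite to the expected direction, ~0.950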
For a detailed validity study of statcheck, see Nuijten, Hartgerink, van Assen, Epskamp, and Wicherts (2016, 2017).
Note that statcheck assumes that the p-value is the inconsistent value, but it could just as well be the case that the test statistic or degrees of freedom contain a reporting error. statcheck merely detects whether a set of numbers is consistent with each other.
Also note that corrected statistical results can cause statcheck to flag them as inconsistent. For instance, if you perform a Bonferroni correction for multiple testing and multiply your p-value instead of dividing your \(\alpha\), statcheck will flag your result as inconsistent. This also holds for other corrections in which the test statistic, degrees of freedom, or p-value are adjusted (e.g., Greenhouse-Geisser). Note that when using these corrections it is advised to still report the result in a consistent manner (so don't report the uncorrected degrees of freedom with the corrected test statistic and p-value). See Nuijten et al. (2017) for details on this issue.
Finally, as Nick Brown pointed out to me, note that some cases are not flagged as an inconsistency, whereas they might look suspicious to a human reader. For instance, take the following case:
We found a significant difference, t(99) = 1.95, p = .05.
Here, the recalculated p-value is actually .054, which contradicts the claim that this finding is significant. However, statcheck will not flag this result as an inconsistency, since, strictly speaking, it is not one: if you round .054 to two decimals, it is .05. This is an example of a case in which a human might raise their eyebrows, but statcheck strictly adheres to the math: .054 is correctly rounded to .05, and hence not inconsistent.
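You can check this yourself in R:

# recompute the two-tailed p-value for t(99) = 1.95
p <- 2 * pt(1.95, df = 99, lower.tail = FALSE)
p            # 0.054
round(p, 2)  # 0.05, so the reported "p = .05" is not inconsistent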
There are several programs you need to install before you can start using statcheck. First you need to install R, and preferably the R environment RStudio. Furthermore, you will need the program Xpdf to enable statcheck to convert PDF articles to plain text files. In the last step you can load the package statcheck in R. The next sections explain where and how to download and install these programs.
To use statcheck you first need to install R. R is a free programming language and environment for statistical computing and graphics. You can download it from https://cran.r-project.org/. If you want, you can run R via RStudio. RStudio is an interface for R with several features that make programming in R easier (syntax highlighting, more point-and-click options, etc.). You can obtain the latest version of RStudio from http://rstudio.com.
statcheck relies on the program Xpdf that converts PDF files to plain text files. To download and install Xpdf, follow the steps for Windows or Mac below.
Many thanks to Daniel Lakens who created the step by step installation manual for Xpdf.
1. Download Xpdf from http://www.foolabs.com/xpdf/download.html and unzip its contents (in particular the folders bin32 and bin64) to a folder.
2. Close R and RStudio
3. Right click the ‘This PC’ icon in windows explorer:
4. Click ‘properties’
5. Click ‘Advanced system settings’
6. Click ‘Environment Variables’
7. Select ‘Path’ and click ‘Edit’
8. Add the path to the folder containing the Xpdf binaries, specifically the folder bin32 (for an x86 system) or bin64 (for a 64-bit system). Add the path after a ; separator.
If you want to run statcheck on a Mac, follow these steps:
Step 1. Download XQuartz. XQuartz is an open source version of the X.Org X Window System that runs on OS X. You can download it at http://xquartz.macosforge.org/landing/.
Step 2. Download the binaries for Xpdf and unzip them (see step 1 in the instructions above).
Step 3. Add the location of the Xpdf binaries to your path. For a step-by-step instruction how to add something to the path on Mac, see this website.
If these three steps don't work (they seem to create some problems on MacOS 10.11.5), you can try the following alternative steps:
# Install homebrew: open terminal and paste
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
# Install xquartz: open terminal and paste
brew install xquartz --cask
# Install xpdf: open terminal and paste
brew install xpdf
Thanks go to Eiko Fried for figuring this out for me, and Lukas Wallrich for pointing out an update to the syntax.
After you have installed Xpdf, you can install and load the R package statcheck with the following lines of code:
install.packages("statcheck")
library("statcheck")
statcheck is now ready to use!
statcheck is constantly being updated, which means that the version on CRAN is not always the latest version. Instead of downloading statcheck from CRAN, you can also install the latest version directly from GitHub with the following lines of R code (this requires the devtools package):
devtools::install_github("MicheleNuijten/statcheck")
library("statcheck")
Note that you still need to have Xpdf installed before you can download and run statcheck.
The most basic function of statcheck is to extract statistics from a raw string of text, recalculate the p-value, and report back the findings. In the example below I simply feed statcheck a string of text with some statistics in it, and it gives me back a table with the results. I’ll go over the results below.
txt <- "blablabla the effect was very significant (t(48) = 1.02, p < .05)"
statcheck(txt)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 t NA 48 = 1.02 <
## Reported.P.Value Computed Raw Error DecisionError OneTail
## 1 0.05 0.3128421 t(48) = 1.02, p < .05 TRUE TRUE FALSE
## OneTailedInTxt APAfactor
## 1 FALSE 1
Note that when you want to scan a Z test, you need to place a space in front of the Z. This is because of an ugly hack I had to build in: sometimes Z tests were read as \(\chi^2\) tests, and this was the easiest way to avoid that:
statcheck("z = 1.95, p = .05")
# instead, type
statcheck(" z = 1.95, p = .05")
The output table is quite large, and because of the screen size it often gets divided over several lines (as is the case here). This can make the output a bit hard to read. I pasted an unfolded table below to give you an idea of what it looks like.
I’ll go over the different parts of the output below.
The first part of the output (as shown below) shows which statistics are extracted from the input.
txt <- "blablabla the effect was very significant (F(2, 65) = 3.02, p < .05)"
stat <- statcheck(txt)
stat[ ,1:8]
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 F 2 65 = 3.02 <
## Reported.P.Value
## 1 0.05
Source indicates from which “Source” the statistic on
this particular line was extracted. When you feed statcheck raw strings
of text (as I did here), it numbers the objects in the input vector.
Here I only fed statcheck one string, so Source is 1. If
you feed statcheck an article, Source shows the file name
of the article.
The next columns show the full result that was extracted, split up into its different elements. Splitting up the extracted elements can be useful if you, for instance, want to investigate how many F-values an article or set of articles contains, or want to investigate the distribution of extracted p-values.
The second part of the output, as shown below, shows the result of the consistency check.
stat[ ,9:12]
## Computed Raw Error DecisionError
## 1 0.05569781 F(2, 65) = 3.02, p < .05 TRUE TRUE
The column Computed shows the p-value that statcheck calculated based on the reported test statistic and degrees of freedom. It depends on the argument OneTailedTests whether this is a two-sided (default; OneTailedTests=FALSE) or one-sided p-value (OneTailedTests=TRUE).
In the column Raw you find the complete result as
extracted by statcheck. This is the concatenated version of the
information in the previous columns.
The columns Error and DecisionError show
the final results of the consistency check. In this example,
Error is TRUE, which means that the reported
p-value is inconsistent with the reported test statistic and degrees of
freedom. You can also see this in the output, where the reported p-value
is "< .05", whereas the computed p-value is .0557. DecisionError is also TRUE, which indicates that the reported p-value is not only inconsistent, but also changes the statistical conclusion: the reported p-value is significant, whereas the recomputed one is not. Note that a result can only be a DecisionError when it is also an Error.
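You can verify statcheck's computed p-value by hand:

# manually recompute the p-value for F(2, 65) = 3.02
pf(3.02, df1 = 2, df2 = 65, lower.tail = FALSE)
## [1] 0.05569781
# .0557 > .05, so the reported "p < .05" is both an Error and a DecisionError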
The last part of the output shows some extra information about the extracted result and the article it was extracted from.
## OneTail OneTailedInTxt APAfactor
## 1 TRUE FALSE 1
The column OneTail indicates whether a result could have been a one-tailed test, that is, whether the result would have been consistent had it been a one-tailed test.
The column OneTailedInTxt indicates whether the string
“one-tailed”, “one-sided”, or “directional” was found in the full text
that statcheck scanned. If OneTailedInTxt==TRUE, it could
be a good indication that some or all of the tests are one-tailed, which
means that some of the flagged inconsistencies might not be inconsistent
at all.
Finally, the column APAfactor indicates the proportion of all detected p-values that were part of a fully APA-reported NHST result. It gives a rough indication of how many statistical tests statcheck may have missed because of reporting issues.
In some cases, statcheck will evaluate a string of text and on top of the output it will also give back one or two messages:
Extracting statistics...
|==================================================================| 100%
Check the significance level.
Some of the p value incongruencies are decision errors if the significance level is .1 or .01 instead of the conventional .05. It is recommended to check the actual significance level in the paper or text. Check if the reported p values are a decision error at a different significance level by running statcheck again with 'alpha' set to .1 and/or .01.
Check for one tailed tests.
Some of the p value incongruencies might in fact be one tailed tests. It is recommended to check this in the actual paper or text. Check if the p values would also be incongruent if the test is indeed one sided by running statcheck again with 'OneTailedTests' set to TRUE. To see which Sources probably contain a one tailed test, try unique(x$Source[x$OneTail]) (where x is the statcheck output).
In the first message, “Check the significance level”, statcheck warns
you that even though it assumed an \(\alpha\) of .05, there might be
DecisionErrors in the text you scanned if \(\alpha = .1\) or \(\alpha = .01\). It is advised to check the
full text of the article to look up the significance level.
In the second message, “Check for one tailed tests”, statcheck warns you that some of the flagged inconsistencies could be caused by the use of one-tailed tests. It is advised to check the full text of the article to see if a result is actually consistent because a one tailed test was used.
It is of course possible that the text you want to check doesn’t contain any (correctly reported) APA results. In that case, statcheck will give the following notification:
txt <- "All the results were significant, all ps < .05"
statcheck(txt)
## Extracting statistics...
## statcheck did not find any results
One of the main advantages of statcheck is that it can automatically
extract statistical results from articles, so that you don’t have to
enter them manually. Instead of the statcheck function, you
use the checkPDF function, which automatically calls
statcheck.
There are three ways to select an article using
checkPDF.
The first way is to specify the entire path of the article you want to scan:
checkPDF("C:/Dropbox/Science/Papers/my_article.pdf")
The second way is to specify the location of the article in your
working directory and then run checkPDF:
setwd("C:/Dropbox/Science/Papers/")
checkPDF("my_article.pdf")
The third and last way is not specifying anything:
checkPDF()
If you don’t specify a file, you will get a point-and-click pop-up in which you can select the article you want to scan.
Note that there can be typesetting issues in PDF articles. To be able to read PDFs, statcheck uses the program Xpdf to convert the text in a PDF into plain text. However, sometimes mathematical symbols such as "<" or "=" are replaced by images in the text, which Xpdf cannot translate into plain text. If that is the case, statcheck will miss these results. I would therefore advise using HTML articles when possible.
You can use statcheck on an HTML article by using the function
checkHTML. This function works in the same way as
checkPDF (see the section above).
If you have access to both the PDF and the HTML version of an article, I would advise checking the HTML version to avoid typesetting issues.
It is also possible to automatically let statcheck check an entire
folder of articles with the functions checkPDFdir (scans
all the PDF articles in the specified folder), checkHTMLdir
(scans all the HTML files in the specified folder), or simply
checkdir (scans both PDF and HTML files in a folder).
Specifying the folder/directory works the same way as specifying a file in the checkPDF or checkHTML function (see the sections above):
checkdir("C:/Dropbox/Science/Papers/")
# or
checkdir()
All three functions to check an entire directory have an argument, subdir, with which you can specify whether you also want to check all subdirectories within the directory you specified. By default (subdir=TRUE), these functions will check everything in a directory, including the subdirectories. If you don't want to check subdirectories, you can set this argument to FALSE.
# checks subdirectories as well
checkdir("C:/Dropbox/Science/Papers/")
# does not check the subdirectories
checkdir("C:/Dropbox/Science/Papers/", subdir = FALSE)
Lastly, checkHTMLdir has an extra argument, extension = TRUE. This argument controls whether statcheck will only search the specified directories for files with the extension ".htm" or ".html", or will simply check every file. This argument is mainly useful when file extensions are hidden in your Explorer or Finder.
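For example (using the same hypothetical folder as above):

# only check files ending in .htm or .html (default)
checkHTMLdir("C:/Dropbox/Science/Papers/", extension = TRUE)
# check every file in the folder, regardless of extension
checkHTMLdir("C:/Dropbox/Science/Papers/", extension = FALSE)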
The statcheck function takes several arguments, most of which have a default value. The arguments control aspects such as the assumed significance level, how one-tailed tests are treated, and which types of statistics should be checked. These are all of statcheck's arguments with their default values:
statcheck(x, stat = c("t", "F", "cor", "chisq", "Z"),
          OneTailedTests = FALSE, alpha = 0.05, pEqualAlphaSig = TRUE,
          pZeroError = TRUE, OneTailedTxt = FALSE, AllPValues = FALSE)
All these arguments can also be specified in the functions that check entire articles or folders of articles. E.g., when using checkPDF, you can specify these arguments as follows:
checkPDF("my_article.pdf", OneTailedTxt = TRUE, alpha = .01)
I will go over all arguments below.
The argument x defines the main input variable: a string
of text from which you’d like to extract and check the statistics. This
is the only argument without a default value.
txt <- "blablabla the effect was very significant (t(100)=1, p < 0.001)"
statcheck(txt)
Or directly:
statcheck("blablabla the effect was very significant (t(100)=1, p < 0.001)")
It is also possible to evaluate a vector of strings:
txt1 <- "t(100) = 1, p < 0.001"
txt2 <- "F(2,45) = 2.81, p = .45"
statcheck(c(txt1, txt2))
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 t NA 100 = 1.00 <
## 2 2 F 2 45 = 2.81 =
## Reported.P.Value Computed Raw Error DecisionError
## 1 0.001 0.31972416 t(100) = 1, p < 0.001 TRUE TRUE
## 2 0.450 0.07080002 F(2,45) = 2.81, p = .45 TRUE FALSE
## OneTail OneTailedInTxt APAfactor
## 1 FALSE FALSE 1
## 2 FALSE FALSE 1
The argument stat defines which type of statistics will
be extracted. By default statcheck extracts all the statistics it can
read: t, F, correlations, \(\chi^2\),
and Z. With stat you can tell statcheck to only extract a
selection of those statistics or even just one type, e.g.:
txt <- "blablabla result 1 is t(100) = 1, p < 0.001, and result 2 is F(2,45) = 2.81, p = .45"
statcheck(txt, stat = "t")
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 t NA 100 = 1 <
## Reported.P.Value Computed Raw Error DecisionError OneTail
## 1 0.001 0.3197242 t(100) = 1, p < 0.001 TRUE TRUE FALSE
## OneTailedInTxt APAfactor
## 1 FALSE 0.5
Or when you’re specifically looking for only t statistics and correlations:
txt <- "blablabla result 1 is t(100) = 1, p < 0.001, and result 2 is F(2,45) = 2.81, p = .45"
statcheck(txt, stat = c("t", "cor"))
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 t NA 100 = 1 <
## Reported.P.Value Computed Raw Error DecisionError OneTail
## 1 0.001 0.3197242 t(100) = 1, p < 0.001 TRUE TRUE FALSE
## OneTailedInTxt APAfactor
## 1 FALSE 0.5
By default, statcheck extracts all statistics it can read.
The argument OneTailedTests defines whether we treat all
extracted tests as two-tailed (OneTailedTests=FALSE;
default) or as one-tailed (OneTailedTests=TRUE):
txt <- "F(2,45) = 2.81, p = .45"
statcheck(txt, OneTailedTests = TRUE)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 F 2 45 = 2.81 =
## Reported.P.Value Computed Raw Error DecisionError
## 1 0.45 0.03540001 F(2,45) = 2.81, p = .45 TRUE TRUE
## OneTail OneTailedInTxt APAfactor
## 1 FALSE FALSE 1
In the chunk above, the extracted test is treated as a one-tailed test. Note that the computed p-value in the output is now divided by two.
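You can verify the halving by hand:

# two-tailed p-value for F(2, 45) = 2.81
pf(2.81, df1 = 2, df2 = 45, lower.tail = FALSE)      # 0.0708
# the p-value statcheck uses when OneTailedTests = TRUE
pf(2.81, df1 = 2, df2 = 45, lower.tail = FALSE) / 2  # 0.0354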
This argument is not very "smart": it will treat ALL extracted tests as one-tailed, even when that is impossible (e.g., when an F test has more than one numerator degree of freedom).
The argument alpha defines which level of significance statcheck will assume. This argument defaults to \(\alpha\) = .05, the widely used criterion in psychology.
If a result isn’t a DecisionError with \(\alpha\) = .05, but it would be when \(\alpha\) = .01 or \(\alpha\) = .10, statcheck will print a message advising you to look up the actual level of significance that is used in the article. If it turns out that an \(\alpha\) of .01 is used, you can adapt that in statcheck:
txt <- "F(2,45) = 5.81, p = .03"
# default level of significance = .05
statcheck(txt)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 F 2 45 = 5.81 =
## Reported.P.Value Computed Raw Error DecisionError
## 1 0.03 0.005694552 F(2,45) = 5.81, p = .03 TRUE FALSE
## OneTail OneTailedInTxt APAfactor
## 1 FALSE FALSE 1
# change level of significance
statcheck(txt, alpha = .01)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 F 2 45 = 5.81 =
## Reported.P.Value Computed Raw Error DecisionError
## 1 0.03 0.005694552 F(2,45) = 5.81, p = .03 TRUE TRUE
## OneTail OneTailedInTxt APAfactor
## 1 FALSE FALSE 1
Note that when \(\alpha\)=.01 this
result is counted as a DecisionError.
The argument pEqualAlphaSig defines whether a p-value
equal to \(\alpha\) is treated as
significant (pEqualAlphaSig=TRUE; default) or not
(pEqualAlphaSig=FALSE).
By convention, a result is deemed significant when p < \(\alpha\), which would correspond to the setting pEqualAlphaSig=FALSE. However, when we inspected all instances of published papers in which statcheck found "p = .05", we found that in almost 95% of the cases the authors had interpreted the result as significant (Nuijten et al., in press). Because of this finding, we decided to count p \(\leq \alpha\) as significant by default.
This example illustrates the difference between the two options:
txt <- "F(2,45) = 4.10, p = .05"
# default pEqualAlphaSig=TRUE
statcheck(txt)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 F 2 45 = 4.1 =
## Reported.P.Value Computed Raw Error DecisionError
## 1 0.05 0.02313502 F(2,45) = 4.10, p = .05 TRUE FALSE
## OneTail OneTailedInTxt APAfactor
## 1 FALSE FALSE 1
# do not count p = alpha as significant
statcheck(txt, pEqualAlphaSig = FALSE)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 F 2 45 = 4.1 =
## Reported.P.Value Computed Raw Error DecisionError
## 1 0.05 0.02313502 F(2,45) = 4.10, p = .05 TRUE TRUE
## OneTail OneTailedInTxt APAfactor
## 1 FALSE FALSE 1
Note that in the default setting this result was not counted as a
DecisionError. After all, the reported “p = .05” is counted
as significant, and even though the computed p-value (p = .02) is not
consistent with p = .05, it is still significant.
In the version where pEqualAlphaSig=FALSE, this result
is counted as a DecisionError, because here the reported
result is not significant (only p < .05 is counted as significant,
not p = .05), whereas the computed result is.
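The significance comparison amounts to something like the following sketch (my own illustration, not statcheck's internal code):

# sketch of the significance rule statcheck applies
is_significant <- function(p, alpha = 0.05, pEqualAlphaSig = TRUE) {
  if (pEqualAlphaSig) p <= alpha else p < alpha
}
is_significant(0.05)                          # TRUE: p = alpha counts as significant
is_significant(0.05, pEqualAlphaSig = FALSE)  # FALSE: only p < alpha counts
# a DecisionError arises when the reported and computed p-values
# fall on different sides of this significance boundary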
The argument pZeroError controls whether statcheck will
count a p-value reported as “p=.000” as an error (default,
pZeroError=TRUE) or not
(pZeroError=FALSE).
We decided to count these cases as an error, because according to the APA reporting guidelines, a p-value smaller than .001 should be reported as “p<.001”, and never as “p=.000”. Furthermore, a p-value can never be exactly zero, so it makes no sense to report it as such.
If you find these criteria too stringent, you can set
pZeroError to FALSE.
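For example (a made-up result in which only the "p = .000" is problematic):

# by default, "p = .000" is flagged as an Error
statcheck("t(48) = 4.43, p = .000")
# do not count "p = .000" as an Error
statcheck("t(48) = 4.43, p = .000", pZeroError = FALSE)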
The argument OneTailedTxt controls whether statcheck
will try to identify and correct for one-tailed tests. This is one of
the most important arguments in statcheck. Our validity study showed
that statcheck’s accuracy improved when OneTailedTxt = TRUE
(Nuijten et al., in press).
By default, OneTailedTxt = FALSE and statcheck will
treat all tests it finds the same, depending on your settings for
OneTailedTests. However, you can tell statcheck to actively
look for tests that might be one-tailed by setting OneTailedTxt =
TRUE.
When OneTailedTxt = TRUE, statcheck will search the
entire text for the keywords “one-tailed”, “one-sided”, and
“directional” (taking spacing issues etc. into account). When statcheck
finds at least one of those keywords AND an initially inconsistent
result would be consistent if it was a one-tailed test, then statcheck
treats this case as a one-tailed test and counts it as consistent.
txt <- "... t(48) = 1.82, p < .05, all tests were one-tailed."
# default OneTailedTxt=FALSE
statcheck(txt)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 t NA 48 = 1.82 <
## Reported.P.Value Computed Raw Error DecisionError OneTail
## 1 0.05 0.07499768 t(48) = 1.82, p < .05 TRUE TRUE TRUE
## OneTailedInTxt APAfactor
## 1 TRUE 1
# actively look for instances of one-tailed tests
statcheck(txt, OneTailedTxt = TRUE)
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 t NA 48 = 1.82 <
## Reported.P.Value Computed Raw Error DecisionError OneTail
## 1 0.05 0.07499768 t(48) = 1.82, p < .05 FALSE FALSE TRUE
## OneTailedInTxt APAfactor
## 1 TRUE 1
Note that in the first, default scenario the result is treated both as an Error and a DecisionError. However, if OneTailedTxt is set to TRUE, statcheck recognizes this test as a one-tailed test and doesn't flag it as an inconsistency. Note, however, that the computed p-value in the output still corresponds to a two-tailed test.
By default, statcheck searches for fully reported NHST results (test
statistic, degrees of freedom if necessary, and a p-value). However, if
you set the argument AllPValues to TRUE,
statcheck will only search for p-values:
txt <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."
statcheck(txt, AllPValues = TRUE)
## Source Statistic Reported.Comparison Reported.P.Value Raw
## 1 1 p = 0.020 p = .02
## 2 1 p < 0.050 p<.05
## 3 1 p = 0.740 p = .74
## 4 1 p = 0.763 p = .763
This option merely extracts the p-values; it does not extract enough information to recalculate anything.
To get a rough overview of the inconsistencies in a paper, you can use statcheck’s plot function:
txt <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."
stat <- statcheck(txt)
stat
## Source Statistic df1 df2 Test.Comparison Value Reported.Comparison
## 1 1 F 3 147 = 3.45 =
## 2 1 t NA 48 = 1.56 <
## 3 1 t NA 48 = 0.34 =
## 4 1 t NA 148 = 0.73 =
## Reported.P.Value Computed Raw Error DecisionError
## 1 0.020 0.0182658 F(3, 147) = 3.45, p = .02 FALSE FALSE
## 2 0.050 0.1253296 t(48) = 1.56, p<.05 TRUE TRUE
## 3 0.740 0.7353399 t(48) = .34, p = .74 FALSE FALSE
## 4 0.763 0.4665441 t(148) = .73, p = .763 TRUE FALSE
## OneTail OneTailedInTxt APAfactor
## 1 FALSE FALSE 1
## 2 FALSE FALSE 1
## 3 FALSE FALSE 1
## 4 FALSE FALSE 1
plot(stat)
The plot shows the reported p-values (x-axis) against the computed p-values (y-axis). Exactly reported p-values (e.g., p = .74) should lie on the diagonal. The dotted lines indicate the region of significance (\(\alpha\) = .05 by default). The colors of the dots indicate whether this particular p-value was an inconsistency or even a decision error.
By default, the plot is created with ggplot2 (Wickham, 2009) in APA style. Many thanks to John Sakaluk for writing the APA style plot function. It is also possible to plot the statcheck results without using ggplot2:
plot(stat, APA = FALSE)
When APA=FALSE, it is possible to modify the plot's graphical parameters:
plot(stat, APA = FALSE, cex = 2, axes = FALSE)
The non-APA layout is also the layout that statcheck uses in the
identify function that is explained in the next
section.
If you plotted many statcheck results and there is one point that
stands out in the graph, but it’s hard to find in the statcheck output,
the identify function might be useful.
identify allows you to click on points in the plot to
literally identify them in the data frame with statcheck results. Note:
this function is only available if you run statcheck in RStudio.
For instance, if we go back to the example in the section above, we might be interested in the two p-values that were flagged as inconsistencies.
txt <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."
stat <- statcheck(txt)
identify(stat)
If you run this code, you will see the non-APA plot:
In the plot you can see two things. First, there is a cross next to one of the points: this is a cursor/locator with which you can click on the points in the plot that you are interested in. Second, the top left corner of the plot says "Locator active (Esc to finish)", which means that when you have finished clicking the points you are interested in, you press Esc to exit the plot. When you do, the points you clicked will be numbered in the plot, and statcheck will give you the results for these points in a data frame:
As you can see in the data frame and the plot, the second and fourth rows of the output were selected with the identify function for further inspection.
The function summary uses the full statcheck data frame
to calculate summarized results.
txt1 <- "We ran a manipulation check which showed that the stimulus achieved the desired effect, Z = 1.99, p < .05"
txt2 <- "We found a significant difference between groups, F(3, 147) = 3.45, p = .02. Group 1 had a higher mean than group 2, t(48) = 1.56, p<.05, but did not differ from group 3, t(48) = .34, p = .74. We found no difference between men and women, t(148) = .73, p = .763."
stat <- statcheck(c(txt1, txt2))
summary(stat)
## Source pValues Errors DecisionErrors
## 1 1 1 0 0
## 2 2 4 2 1
## 3 Total 5 2 1
The output shows that statistics were scanned from two sources. The summary indicates how many results were extracted per source (pValues), and how many of those were an Error or a DecisionError.
A quick and easy way to scan an article with statcheck is by using the web app at http://statcheck.io.
To upload an article to scan, simply click on the “Browse.” button (see the screenshot below) and select an article in PDF, HTML, or DOCX format.
You have the option to count one-tailed tests as consistent if they're explicitly identified as one-tailed tests. To do so, simply check the box "Try to identify and correct for one-tailed tests?" (see Figure 1). If you check this box, statcheck will consider a result correct if (1) the p-value would have been consistent had the test been one-tailed, and (2) the words "one-tailed", "one-sided", or "directional" appear somewhere in the full text.
When you have selected an article, statcheck will automatically search it for APA-reported NHST results. The program extracts the results and recalculates the p-values. The results will be displayed on the web page (see the screenshot below).
You can either inspect the results online or download them to your computer as a CSV file by clicking the button "Download Results (csv)" (see the screenshots above and below). The CSV output is equivalent to the output of the R package and is more extensive than the output that is printed directly to the screen.
Note that once all files have been analyzed, the source files are deleted. Outside of simple server and activity logs, no record of results is maintained.
The output of the web app is more concise than that of the R package, to facilitate quick detection of possible inconsistencies (see the screenshot below). Each row in the data frame represents one extracted NHST result.
The column “Source” indicates from which article the statistic on this particular line was extracted. Specifically, “Source” shows the filename of the uploaded article (in this case, “Paper1”).
The next column, “Statistical Reference”, shows the full NHST result as it was found in the full text.
The column "Computed p Value" shows the p-value that statcheck calculated based on the reported test statistic and degrees of freedom. By default, statcheck assumes two-sided tests, but if you checked the box "Try to identify and correct for one-tailed tests?", it will try to do so (equivalent to the argument OneTailedTxt = TRUE in the R function; see Section 3.5.7).
The final column, "Consistency", shows the final result of the consistency check. This variable can take on three values: Consistent, Inconsistency, and Decision Inconsistency. If a result is Consistent, the reported p-value is consistent with the reported test statistic and degrees of freedom; it matches the computed p-value. Conversely, if a result is an Inconsistency, the reported p-value does not match the p-value computed from the reported test statistic and degrees of freedom. This is equivalent to an Error in the R package. Finally, if a result is a Decision Inconsistency, not only does the reported p-value not match the computed p-value, but the "statistical decision" also changes: either the reported p-value is statistically significant (p <= .05) and the computed p-value is not (p > .05), or vice versa. This is equivalent to a Decision Error in the R package. Note that by default, statcheck assumes that \(\alpha\) = .05 and that p = .05 is interpreted as statistically significant (see Nuijten et al., 2016).
In this example, the first result in the screenshot above shows that the result is Consistent, which means that the reported p-value is consistent with the reported test statistic and degrees of freedom.
The second row in the screenshot above illustrates an Inconsistency: the reported p-value is p = .14, whereas the computed p-value based on the reported test statistic and degrees of freedom is p = .12.
The third and last row in the screenshot illustrates a Decision Inconsistency. In this case the reported p-value is p < .001, which is clearly significant at \(\alpha\) = .05. However, the computed p-value is p = .31, which is clearly not significant.
Finally, it is possible that statcheck will not detect any APA reported NHST results at all. In that case, you will get the following message.
One of the nice features of the statcheck web app is that you can easily sort and/or select results in the output. This is especially useful when you check documents with a large number of NHST results in them. For instance, if I run my own dissertation through statcheck, it finds 72 results.
To quickly check which of my results are inconsistent, I can do several things. First, I can use the "Search" box to search for all results containing "inconsistency". This selects both inconsistencies and decision inconsistencies (see the screenshot below). Note that I could search for any text string; this enables me, for instance, to select only one type of test (search for "t(") or p-values around .05 (search for ".05"), etc.
When I do this, statcheck finds 15 cases that are not consistent (see the screenshot below). That is still too many results to fit on one page, so they are divided over two.
If I want all my results on one page, I can change how many entries to display (see below).
Finally, I can sort the data frame by clicking on the column headers. In this case, it would be insightful to see which of the results are Inconsistencies and which are Decision Inconsistencies, so I sort the rows by the column "Consistency" (see below).
There are several errors that can arise when installing or running statcheck. I am currently logging the errors I encounter, along with their solutions. This list is far from complete, so if you have an error you can't solve, or come across something that you think could be a bug, please contact me at m.b.nuijten@uvt.nl.
There are several reasons why statcheck sometimes does not pick up a result. Some of the most common reasons are listed below.
The list below gives some of the most common sources of inconsistencies.
While installing statcheck, you may come across an error saying that the package Rcpp is not available:
> devtools::install_github("MicheleNuijten/statcheck")
Downloading GitHub repo MicheleNuijten/statcheck@master
from URL https://api.github.com/repos/MicheleNuijten/statcheck/zipball/master
Installing statcheck
Installing 1 package: Rcpp
trying URL 'http://cran.r-project.org/bin/windows/contrib/3.2/Rcpp_0.12.6.zip'
Content type 'application/zip' length 3221864 bytes (3.1 MB)
downloaded 3.1 MB
package 'Rcpp' successfully unpacked and MD5 sums checked
Warning: cannot remove prior installation of package 'Rcpp'
The downloaded binary packages are in
C:\Users\Mnuijten\AppData\Local\Temp\RtmpgfVUqR\downloaded_packages
"D:/R-3.2.3/bin/x64/R" --no-site-file --no-environ --no-save --no-restore --quiet CMD \
INSTALL \
"C:/Users/Mnuijten/AppData/Local/Temp/RtmpgfVUqR/devtools1ce82b75e49/MicheleNuijten-statcheck-d20d222" \
--library="D:/R-3.2.3/library" --install-tests
* installing *source* package 'statcheck' ...
** R
** byte-compile and prepare package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called 'Rcpp'
ERROR: lazy loading failed for package 'statcheck'
* removing 'D:/R-3.2.3/library/statcheck'
* restoring previous 'D:/R-3.2.3/library/statcheck'
Error: Command failed (1)
Here the solution is quite straightforward: just re-install the package Rcpp manually:
install.packages("Rcpp")
When installing statcheck on a Mac, you need to have Xcode installed. If your Xcode license has expired, you can get the following warning:
> devtools::install_github("MicheleNuijten/statcheck")
Downloading github repo MicheleNuijten/statcheck@master
Installing statcheck
'/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL \
'/private/var/folders/bh/r4jqk1zj4kzfyd8g9x7bdysc0000gn/T/RtmpW6A5Dc/devtools416756816ec1/MicheleNuijten-statcheck-d20d222' \
--library='/Library/Frameworks/R.framework/Versions/3.1/Resources/library' \
--install-tests
* installing *source* package ‘statcheck’ ...
** R
** byte-compile and prepare package for lazy loading
Agreeing to the Xcode/iOS license requires admin privileges, please re-run as root via sudo.
Warning: running command ''otool' -L '/Library/Frameworks/R.framework/Resources/library/tcltk/libs//tcltk.so'' had status 69
The solution can be found at http://stackoverflow.com/questions/26197347/agreeing-to-the-xcode-ios-license-requires-admin-privileges-please-re-run-as-r and entails simply entering the following command in the terminal:
sudo xcodebuild -license
Epskamp, S., & Nuijten, M. B. (2016). statcheck: Extract statistics from articles and recompute p values. R package version 1.2.2. Retrieved from http://CRAN.R-project.org/package=statcheck
Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods, 48(4), 1205-1226. DOI: 10.3758/s13428-015-0664-2
Nuijten, M. B., van Assen, M. A. L. M., Hartgerink, C. H. J., Epskamp, S., & Wicherts, J. M. (2017). The validity of the tool "statcheck" in discovering statistical reporting inconsistencies. Preprint retrieved from https://psyarxiv.com/tcxaj/
Wickham, H. (2009). ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag.