In their widely cited paper “Power-Law Distributions in Empirical Data” (more than 12,000 citations at the time of writing; first published on arXiv, Clauset et al. 2007), the authors “present a principled statistical framework for discerning and quantifying power-law behavior in empirical data.” An R package implementing this framework, poweRlaw, is available on CRAN.
Here, Clauset’s methodology and the poweRlaw R package are used to test the null hypothesis that a sample drawn from a randomly parameterized (within limits) distribution is indistinguishable from the Power Law distribution that best fits it. In other words, the framework is used as a binary classifier: is the data plausibly from a Power Law distribution or not?
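To make the fitting step concrete, here is a minimal Python sketch of two ingredients of the framework: the maximum-likelihood estimate of the exponent for a continuous power law, and the Kolmogorov-Smirnov (KS) distance between the data and the fitted model. It is a simplified illustration, not the poweRlaw implementation. The function names `fit_alpha` and `ks_distance` are hypothetical, and `xmin` is assumed to be known, whereas the full framework also searches for the `xmin` that minimizes the KS distance.

```python
import math
import random


def fit_alpha(data, xmin):
    """MLE of the continuous power-law exponent for a fixed, assumed-known xmin."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted CDF on the tail x >= xmin."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


# Illustrative data only: inverse-transform draws from a power law with alpha = 2.5, xmin = 1.
rng = random.Random(42)
sample = [1.0 * (1.0 - rng.random()) ** (-1.0 / (2.5 - 1.0)) for _ in range(5000)]

alpha_hat = fit_alpha(sample, xmin=1.0)
ks = ks_distance(sample, xmin=1.0, alpha=alpha_hat)
```

On data that really follows a power law, `alpha_hat` should land close to the generating exponent and `ks` should be small. The KS distance is the statistic that the goodness-of-fit test later compares against synthetic data sets.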
The experimental test set consists of thirty continuous Power Law distributions (the positive cases) and thirty continuous Log-Normal distributions (the negative cases). Log-Normal distributions were chosen for the negative cases because they can closely resemble Power Law distributions (as can Exponential and Poisson distributions, which are considered in more detail in the original Clauset paper).
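A test set of this shape could be generated along the following lines. This is a sketch under stated assumptions rather than the code used for the experiment: the parameter ranges, the sample size `n`, and the function names are all hypothetical choices made for illustration. Power Law samples are drawn by inverse-transform sampling, and Log-Normal samples come from Python's standard `random.lognormvariate`.

```python
import random


def sample_power_law(rng, n, xmin, alpha):
    """Inverse-transform sampling from a continuous power law on [xmin, inf)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def sample_log_normal(rng, n, mu, sigma):
    """Log-normal draws using the standard library generator."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def build_test_set(seed=0, cases_per_class=30, n=1000):
    """Return (is_power_law, sample) pairs; all parameter ranges are assumed."""
    rng = random.Random(seed)
    test_set = []
    for _ in range(cases_per_class):
        alpha = rng.uniform(1.5, 3.5)   # assumed exponent range
        xmin = rng.uniform(1.0, 10.0)   # assumed lower cutoff range
        test_set.append((True, sample_power_law(rng, n, xmin, alpha)))
    for _ in range(cases_per_class):
        mu = rng.uniform(0.0, 2.0)      # assumed log-mean range
        sigma = rng.uniform(0.5, 1.5)   # assumed log-standard-deviation range
        test_set.append((False, sample_log_normal(rng, n, mu, sigma)))
    return test_set


test_set = build_test_set()
```

Keeping the true label next to each sample makes it straightforward to score the classifier later. Fixing the seed keeps the randomly parameterized cases reproducible.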
Hypothesis testing is performed at the 0.05 level of significance (95% confidence): the power-law hypothesis is rejected when p \(<\) 0.05 and treated as plausible when p \(\geq\) 0.05.
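The direction of this test is easy to get backwards. In Clauset's framework, the p-value is the fraction of synthetic power-law data sets whose distance from their own fit is at least as large as the distance observed for the empirical data. A large p therefore supports the power law. The sketch below illustrates that rule; the function names are hypothetical and the distances are invented numbers, not results from the experiment.

```python
def bootstrap_p_value(observed_ks, simulated_ks):
    """Fraction of synthetic data sets whose KS distance is at least the observed one."""
    exceed = sum(1 for d in simulated_ks if d >= observed_ks)
    return exceed / len(simulated_ks)


def plausibly_power_law(p_value, threshold=0.05):
    """The power-law hypothesis is kept when p >= threshold and rejected otherwise."""
    return p_value >= threshold


# Invented KS distances standing in for fits to synthetic power-law samples.
simulated = [0.010, 0.020, 0.030, 0.040, 0.050, 0.060, 0.070, 0.080, 0.090, 0.100]

p_good = bootstrap_p_value(0.045, simulated)  # 6 of the 10 simulated distances are >= 0.045
p_bad = bootstrap_p_value(0.200, simulated)   # no simulated distance reaches 0.200
```

With these numbers, `plausibly_power_law(p_good)` is true and `plausibly_power_law(p_bad)` is false. A real run would use many more synthetic data sets than ten.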
Finally, the performance of the Clauset methodology as a binary classifier on this particular test set is presented, demonstrating its effectiveness.
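For reference, binary-classifier performance of this kind is usually summarized from a confusion matrix. The sketch below shows one way to compute accuracy, sensitivity, and specificity from true labels and predictions. The function names are hypothetical, and the six labels are a toy example, not the results of this experiment.

```python
def confusion_counts(actual, predicted):
    """Count true positives, true negatives, false positives, and false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


def summarize(actual, predicted):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels only: True means "Power Law", False means "Log-Normal".
actual = [True, True, True, False, False, False]
predicted = [True, True, False, False, False, True]

counts = confusion_counts(actual, predicted)
metrics = summarize(actual, predicted)
```

Here "positive" means the sample was generated from a Power Law distribution, matching the labeling of the test set.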