2026-04-09

Experimental Power Law Identification

In their highly-recognized paper (over 12k+ citations at the time this was written), “Power-Law Distributions in Empirical Data” – available here at arXiv, where it was first published, Clauset et al 2007) – the authors, “[P]resent a principled statistical framework for discerning and quantifying power-law behavior in empirical data.” An R package implementing this framework is available on CRAN – poweRlaw.

Here, Clauset’s methodology and the poweRlaw R package will be used to test the null hypothesis that there is no difference between a randomly parameterized (within limits) distribution and the Power Law distribution that best fits the empirical data. In other words, it is using the framework to create a binary classifier: is the data plausibly from a Power Law distribution or not?

The experimental test set is comprised of thirty continuous Power Law distributions (the positive cases) and thirty continuous Log-Normal distributions (the negative cases). Log-Normal distributions were chosen for the negative case as they can closely resemble Power Law distributions (as can Exponential and Poisson distributions, which are considered in more detail in the original Clauset paper).

Hypthesis testing is performed at the p \(\geq\) 0.05 level of significance, or 95% confidence.

Finally, the performance of the Clauset methodology as a binary classifier on this particular test set is presented, demonstrating its effectiveness.

Caveats

The code for this experiment does include CRAN packages that may not be installed in your local environment and that are not a part of base R. No installation of these packages is performed automatically, they are loaded using the library(...) command only. All are called out at the very beginning of the R setup section, which includes a description of their purpose.

By default, a pre-calculated set of distributions is loaded from the local workspace file distribution_list.rds. If this file is not present, a new set of distributions will be created and saved. The provided file includes the p-values from null hypothesis testing. Be aware: even as written and including optimizations like parallel processing and careful balancing of the sample size versus bootstrap iterations, the Clauset framework and poweRlaw package can be very processor intensive and slow – even prohibitively so.

If you generate a new sample of distributions, changing the following can further degrade performance or even introduce instability as extreme values are produced by the Power Law distributions.

  • The number of data points n generated for each distribution
  • The sample space for \(x_{min}\) at distribution generation
  • The distribution parameter sample spaces for \(\alpha\), \(\mu\), and $$
  • The Kolmogorov-Smirnov statistic sample space for estimating \(x_{min}\)
  • The number of iterations performed in the bootstrap_p function call
  • The number of threads used in bootstrap_p
  • The use of discrete distributions (continuous ones used are faster)

Importance of Power Law Identification

Knowing when a data set plausibly represents a Power Law distribution is important for several reasons.

  • Power Laws are common in the natural and human-made worlds, as well as in natural language.
  • “Extreme” events are not only possible, they’re expected in Power Laws. This is very different than other distribution types – notably the Gaussian Normal distribution, where extreme values are so rare they are practically “impossible.”
  • While a sample from a Power Law distribution will always have a mean and standard deviation, the population may not. When rate parameter \(\alpha \leq 2\), the population mean is infinite. Similarly, the population standard deviation is infinite when \(\alpha \leq 3\). The Central Limit Theorem and Law of Large Numbers both assume a population mean is finite. When these or anything based on them – which is much of foundational statistics, including t-tests, confidence interval calculation, linear regression, ANOVA, and more – are naively applied to a divergent Power Law distribution, the results are by definition inaccurate: a core pre-condition is not being met.

Distribution Parameters

The point density function (PDF) of a Power Law distribution with rate parameter \(\alpha\) is:

\[f(x) \propto \alpha x^{-\alpha}\]

The PDF for a Log-Normal distribution with log mean parameter \(\mu\) and log standard deviation parameter \(\sigma\) is:

\[f(x) \propto \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\] As noted by Clauset et al, “In practice, few empirical phenomena obey power laws for all values of x. More often the power law applies only for values greater than some minimum \(x_{min}\). In such cases we say that the tail of the distribution follows a power law.” In other words, for the methodology to work, the distribution only has to eventually behave like a Power Law, at x values greater than \(x_{min}\). The distributions created here do use the \(x_{min}\) parameter.

For each distribution, concrete parameter values were randomly selected randomly from constrained intervals as follows:

  • \(x_{min}\): [10,100]
  • \(\alpha\): [2.1, 3.5], which does not include Power Law distributions where the population mean is infinite but does include ones where the standard deviation is infinite
  • \(\mu\): [log( \(x_{min}\))-1, log( \(x_{min}\))+2], which is equivalent to “near” \(x_{min}\) at log scale
  • \(\sigma\): [0.5, 1.5]

Power Law Distribution Generation

The following illustrates how a Power Law distribution is being generated using the poweRlaw package in this experiment: generate a \(x_{min}\) value, generate \(\alpha\), create a new continuous Power Law (conpl) object, set its parameters using the values just created, and then randomly sample it with n = 10,000 for all distributions. See the Clauset paper for experimental results with synthetic data set size of one million.

# Generate parameter values
rand_xmin <- sample(10:100, 1)
rand_alpha <- runif(n=1, min=2.1, max=3.5)   

# Set model parameters
obj <- conpl$new()
obj$setXmin(rand_xmin)
obj$setPars(rand_alpha)

# Generate a sample from the model
sample = dist_rand(obj, n_samples)
obj$setDat(sample)

Power Law and Log-Normal Visualization

The chart below illustrates how Log-Normal and Power Law distributions can resemble one another due to the Log-Normal’s skew to the right of the mode (peak density), especially when the sample size is low. In particular, note that the Log-Normal (L-N) distribution with parameters \(\mu\) = 1 and $$ = 0.5 closely resembles the arbitrarily selected Power Law one for low values of x. Left click to remove a group of Log-Normal distributions sharing the same mean.

Power Law v. Log-Normal PDF Behavior

In the slide immediately following, the linearly-binned PDFs of two similarly ranged distributions from the provided distribution set are compared – the first a Power Law distribution and the second a Log-Normal one.

At raw data scale, both distributions are similarly shaped: they both have an extreme peak at the left-hand side of the scale and a long tail at the right.

The difference between the two can be seen at log scale. For the Power Law distribution, the best estimator for the PDF is linear while for the Log-Normal one it is quadratic (a downward-facing parabola). Note that these best estimator assessments are a consequence of the generalized definitions for the two distribution types, not on the specifics of the data being visualized.

A “jagged” right hand tail is in evidence for both at log scale as well, where extreme values in both random samples appear. Binning is being performed to visualize PDFs but the binning itself is linear : this is what produces the “jagged” appearance at lot scale. The subsequent slide again examines the Power Law distribution’s log scale PDF but this time using both linear and logarithmic binning for comparison of the two methods. Here the Power Law’s fundamentally linear PDF behavior at log scale becomes more evident when (the more appropriate) logarithmic binning is employed.

The parameter i is the index of the distribution being visualized in the supplied distribution set.

Empirical Power Law and Log-Normal PDFs

Power Law Linear v Logarithmic Binning

Hypothesis Testing and Best Fitting Power Law Creation

During hypothesis testing, Clauset’s methodology does not assume \(x_{min}\) is known a priori. It sets the initial value of \(x_{min}\) using the Kolmogorov-Smirnov (K-S) statistic – the supremum (maximum) of the absolute difference between empirical and theoretical cumulative distribution function (CDF):

\[D = \sup_{x \geq x_{min}} |F_n(x) - F(x)|\]

Subsequently this initial value for \(x_{min}\) is refined using bootstrap resampling. Once \(x_{min}\) is estimated, a maximum likelihood estimator is used to estimate the rate parameter \(\alpha\). Together these values of \(x_{min}\) and \(\alpha\) define the Power Law distribution which best fits the empirical data – regardless of how the empircal data was generated or what distribution it actually came from or that distributions parameter values.

Finally a p-value is calculated: is there a difference between the empirical data and this best fitting Power Law distribution?

It is this phase of analysis that is the most processor intensive. By default, the poweRlaw packaged calculates the K-S statistic – meaning the CDF of both the empirical and theoretical distributions – for every point in the empirical data set. This is to produce the initial value of \(x_{min}\), which is essential to both further refinement through bootstrap resampling and estimation of the rate parameter \(\alpha\).

In this experiment, the search space for the initial \(x_{min}\) estimate was limited a vector of one hundred logarithmically-spaced candidate values, all between log(min(x)) and log(median(x)) – and not for every empirical data point. Even with this constrained search space and parallel processing, generation of the p-values for for hypothesis testing took in excess of of an hour to perform with twenty-three thread parallel processing on an Intel i9 CPU.

Binary Classification Confusion Matrix

Below is the performance of the Clauset methodology when used as a binary classifier on the experimental distribution test set.

Reference Power Law Reference Log-Normal
Prediction Power Law 29 0
Prediction Log-Normal 1 30
           Accuracy : 0.9833          
             95% CI : (0.9106, 0.9996)
No Information Rate : 0.5             
P-Value [Acc > NIR] : <2e-16
              Kappa : 0.9667
        Sensitivity : 0.9667          
        Specificity : 1.0000          
     Pos Pred Value : 1.0000          
     Neg Pred Value : 0.9677  
 
        

Conclusion

In this experiment, the Clauset framework implemented in the R poweRlaw package proved an effective (though imperfect) binary classifier in the identification of Power Law distributions versus similar Log-Normal ones, with an accuracy of 0.9833 and a p-value that the accuracy is greater than the no information rate of less that 2e-16. There was one false negative where the null hypothesis was rejected despite the fact the distribution was in fact generated by a Power Law. This was the only error seen in this particular test set, though.

Such results cannot be generalized: they apply to this particular set of generated distributions. Not only do the random parameter values used affect the results, so do the constraints on those parameters as well as the subsequent random sampling of data points X. However it is a positive indicator that at least in this experiment, the Clauset methodology performed well and as intended.