Statistics and High Performance Computing

Ronald Wesonga (PhD)

10/6/2022

Why does statistics require high performance computing?

There are numerous reasons, such as the efficiency of statistical algorithms, accuracy of results, replicability of studies and reproducibility of results (Wesonga et al. 2019). Below I briefly describe three:

  1. An almost parallel development of statistics theory and computing tools
  2. Definitional reasons
  3. Sampling and the Central Limit Theorem

Development of Computing and Statistics

The developers of mechanical calculating machines have, in one way or another, been closely associated with statistics (Sint 1984; Grossmann, Schimek, and Sint 2004). For a brief outline, see Wilson (2015, 2001); Rojas-Sola et al. (2021); Kistermann (1998); Coale, Demeny, and Vaughan (2013); Coale and Trussell (1996).

Result 1

If \(X_1, X_2, \cdots,X_n\) are independent identically distributed random variables (Data), and \[Statistics \propto f(Data, Computing)\] then computing appears to track statistics with a lag. That is, the growth of computing power is highly correlated with the growth of statistics.

Definition of Statistics and Computing

The word statistics is derived from Italian and Latin terms for the knowledge of statesmanship or state affairs. “Statist” was used in that sense in Shakespeare’s Hamlet and Cymbeline. However, Gabriel Naudé (1639) was the first to use the adjective statisticus in print (Pearson 1978, p. 4). Subsequently, Leibniz and Hermann Conring (1606-1681) founded “statistics” as the art of collecting relevant data for decision making by government. Gottfried Achenwall (1719-1772), though widely recognised as the founder of statistics, knew that he had only extended Conring’s ideas, but he appreciated mathematical methods and the information derived from the collected data. Plainly speaking, therefore, statistics is the process of planning, collecting, processing, analysing and interpreting data for knowledge discovery. Similarly, computing is the art of processing input data to generate output information (Sint 1984; Grossmann, Schimek, and Sint 2004).

Result 2

If \(X_1, X_2, \cdots,X_n\) are independent identically distributed random variables, and \[Statistics = Theory + Data + Computing\] then computing is one of the attributes of statistics.

Sampling Theory and Central Limit Theorem

The Central Limit Theorem (CLT) is a widely used statistical concept (Le Cam 1986; Johnson 2004; Fischer 2011).

Definition of CLT

Let \(X_1, X_2, \cdots,X_n\) be independent identically distributed random variables with mean \(\mu\) and variance \(\sigma^2\), both finite. Then, for any constant \(z\), \[\lim_{n\rightarrow\infty}P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\le z\right)=\Phi(z)\] where \(\Phi\) is the cdf of the standard normal distribution.

Demonstrating the large sample

Result 3

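For random variables \(X_1, X_2, \cdots,X_p \sim D(\mu, \Sigma)\) belonging to some distribution with given parameters and observed on a sample of size \(n\), \[D(\mu, \Sigma)_{n\rightarrow \infty} \longrightarrow N(\mu, \Sigma)\] That is, the sampling distribution approaches normality as the sample size grows. Implication: computing power is necessary to achieve normality.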
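The large-sample behaviour can be illustrated with a small simulation. The sketch below is not part of the original slides: it assumes Exponential(1) data (so \(\mu = 1\), \(\sigma = 1\)), and the function names, sample sizes, replication count and seed are all illustrative choices. For each \(n\) it standardises many simulated sample means and reports the largest gap between their empirical cdf and \(\Phi(z)\) over a grid of \(z\) values.

```python
import math
import random
from statistics import NormalDist


def standardized_means(n, reps, rng):
    """Draw `reps` standardized sample means of n Exponential(1) variables."""
    mu, sigma = 1.0, 1.0  # mean and sd of Exponential(rate=1); an assumed example
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out


def max_cdf_gap(sample, grid):
    """Largest |empirical cdf - Phi| over a grid of z values."""
    phi = NormalDist().cdf
    m = len(sample)
    return max(abs(sum(v <= z for v in sample) / m - phi(z)) for z in grid)


def clt_gaps(sizes=(1, 5, 30, 200), reps=5000, seed=2022):
    """Map each sample size n to its maximum cdf gap (hypothetical helper)."""
    rng = random.Random(seed)
    grid = [i / 4 for i in range(-12, 13)]  # z from -3 to 3 in steps of 0.25
    return {n: max_cdf_gap(standardized_means(n, reps, rng), grid) for n in sizes}


if __name__ == "__main__":
    for n, gap in clt_gaps().items():
        print(f"n={n:4d}  max |F_n(z) - Phi(z)| = {gap:.4f}")
```

The gap should shrink as \(n\) increases, while the number of random draws grows as \(n \times\) replications, which is one concrete way in which the normal approximation depends on available computing power.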

References

Coale, Ansley J, Paul Demeny, and Barbara Vaughan. 2013. Regional Model Life Tables and Stable Populations: Studies in Population. Elsevier.

Coale, Ansley, and James Trussell. 1996. “The Development and Use of Demographic Models.” Population Studies 50 (3): 469–84.

Fischer, Hans. 2011. A History of the Central Limit Theorem: From Classical to Modern Probability Theory. Springer.

Grossmann, Wilfried, Michael G Schimek, and Peter Paul Sint. 2004. “The History of Compstat and Key-Steps of Statistical Computing During the Last 30 Years.” In COMPSTAT 2004—Proceedings in Computational Statistics, 1–35. Springer.

Johnson, Oliver. 2004. Information Theory and the Central Limit Theorem. World Scientific.

Kistermann, Friedrich W. 1998. “Blaise Pascal’s Adding Machine: New Findings and Conclusions.” IEEE Annals of the History of Computing 20 (1): 69–76.

Le Cam, Lucien. 1986. “The Central Limit Theorem Around 1935.” Statistical Science, 78–91.

Rojas-Sola, José Ignacio, Gloria del Río-Cidoncha, Arturo Fernández-de la Puente Sarriá, and Verónica Galiano-Delgado. 2021. “Blaise Pascal’s Mechanical Calculator: Geometric Modelling and Virtual Reconstruction.” Machines 9 (7): 136.

Sint, P. 1984. “Roots of Computational Statistics.” In Compstat 1984, 9–20. Springer.

Wesonga, Ronald, Fabian Nabugoomu, Faisal Ababneh, and Abraham Owino. 2019. “Simulation of Time Series Wind Speed at an International Airport.” Simulation 95 (2): 171–84.

Wilson, Robin. 2015. “Calculation.” Springer.

Wilson, Robin J. 2001. “The Birth of Computing.” Stamping Through Mathematics, 92–93.