1 Introduction

We have discussed the HPC race in previous posts. See: https://rpubs.com/alex-lev/693131, https://rpubs.com/alex-lev/553777

2 TOP500 data

Data for November 2020 can be downloaded here: https://www.top500.org/lists/top500/2020/11/

3 Preparing and Exploring Data

Note: the original variable names were changed (truncated, concatenated, and simplified) to make data exploration and visualization easier.
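A minimal sketch of how this kind of renaming could be automated. The raw header strings below are illustrative assumptions about what spreadsheet column names might look like, not a confirmed copy of the file, and `simplify_names` is a hypothetical helper rather than the method actually used here.

```r
# Hypothetical helper: drop a trailing unit suffix such as " [TFlop/s]" or " (kW)",
# then remove every character that is not a letter or a digit.
simplify_names <- function(x) {
  x <- gsub("[[:space:]]*[[(].*$", "", x)  # strip bracketed or parenthesised units
  gsub("[^A-Za-z0-9]", "", x)              # remove spaces, slashes, hyphens
}

# Assumed examples of raw headers (illustrative only)
raw <- c("Previous Rank", "Accelerator/Co-Processor Cores",
         "Rmax [TFlop/s]", "Power (kW)", "Cores per Socket")
clean <- simplify_names(raw)
print(clean)
```

Under these assumptions the cleaned names would match the style of the variable list shown below, for example `PreviousRank` and `CoresperSocket`.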

library(readxl)
library(tidyverse)
library(qcc)
library(DT)



TOP500_202011 <- read_excel("top500/TOP500_202011.xlsx")
names(TOP500_202011)
##  [1] "Rank"                        "PreviousRank"               
##  [3] "FirstAppearance"             "FirstRank"                  
##  [5] "Name"                        "Computer"                   
##  [7] "Site"                        "Manufacturer"               
##  [9] "Country"                     "Year"                       
## [11] "Segment"                     "TotalCores"                 
## [13] "AcceleratorCoProcessorCores" "Rmax"                       
## [15] "Rpeak"                       "Nmax"                       
## [17] "Nhalf"                       "HPCG"                       
## [19] "Power"                       "PowerSource"                
## [21] "PowerEfficiency"             "Architecture"               
## [23] "Processor"                   "ProcessorTechnology"        
## [25] "ProcessorSpeed"              "OperatingSystem"            
## [27] "OSFamily"                    "AcceleratorCoProcessor"     
## [29] "CoresperSocket"              "ProcessorGeneration"        
## [31] "SystemModel"                 "SystemFamily"               
## [33] "InterconnectFamily"          "Interconnect"               
## [35] "Continent"                   "SiteID"                     
## [37] "SystemID"
TOP500_202011.tbl <- as_tibble(TOP500_202011)

4 Pareto principle and chart

The Pareto principle states that for many outcomes roughly 80% of consequences come from 20% of the causes (the vital few). For more see: https://en.wikipedia.org/wiki/Pareto_principle
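To make the principle concrete, here is a small base R sketch that finds how many of the largest categories are needed to reach a given share of the total. The helper name `pareto_cutoff` and the counts in `toy` are made up for illustration and are not taken from the TOP500 data.

```r
# Hypothetical helper: number of top categories needed to reach a share threshold
pareto_cutoff <- function(counts, threshold = 0.8) {
  sorted <- sort(counts, decreasing = TRUE)
  cum_share <- cumsum(sorted) / sum(sorted)   # cumulative share of the total
  k <- which(cum_share >= threshold)[1]       # first position reaching the threshold
  list(k = k,
       share_of_causes = k / length(sorted),
       share_of_outcomes = unname(cum_share[k]))
}

# Made-up counts for ten causes (illustrative only, they sum to 100)
toy <- c(a = 50, b = 32, c = 4, d = 4, e = 3, f = 2, g = 2, h = 1, i = 1, j = 1)
res <- pareto_cutoff(toy)
print(res)
```

In this toy case two of the ten causes (20%) cover 82% of the outcomes, which is the kind of imbalance the principle describes.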

TOP500 <- TOP500_202011 %>%
  count(Country) %>%
  mutate(Mainframes = n) %>%
  na.omit() %>%
  arrange(desc(Mainframes))
pareto.chart(as.vector(TOP500$Mainframes), names = TOP500$Country,
             main = "Pareto chart for TOP500 countries by HPC mainframes")

##     
## Pareto chart analysis for as.vector(TOP500$Mainframes)
##      Frequency Cum.Freq. Percentage Cum.Percent.
##   A      213.0     213.0       42.6         42.6
##   B      113.0     326.0       22.6         65.2
##   C       34.0     360.0        6.8         72.0
##   D       18.0     378.0        3.6         75.6
##   E       18.0     396.0        3.6         79.2
##   F       15.0     411.0        3.0         82.2
##   G       14.0     425.0        2.8         85.0
##   H       12.0     437.0        2.4         87.4
##   I       12.0     449.0        2.4         89.8
##   J        6.0     455.0        1.2         91.0
##   K        5.0     460.0        1.0         92.0
##   L        4.0     464.0        0.8         92.8
##   M        4.0     468.0        0.8         93.6
##   N        3.0     471.0        0.6         94.2
##   O        3.0     474.0        0.6         94.8
##   P        3.0     477.0        0.6         95.4
##   Q        3.0     480.0        0.6         96.0
##   R        3.0     483.0        0.6         96.6
##   S        2.0     485.0        0.4         97.0
##   T        2.0     487.0        0.4         97.4
##   U        2.0     489.0        0.4         97.8
##   V        2.0     491.0        0.4         98.2
##   W        2.0     493.0        0.4         98.6
##   X        2.0     495.0        0.4         99.0
##   Y        1.0     496.0        0.2         99.2
##   Z        1.0     497.0        0.2         99.4
##   A1       1.0     498.0        0.2         99.6
##   B1       1.0     499.0        0.2         99.8
##   C1       1.0     500.0        0.2        100.0
TOP500_202011.tbl %>%
  count(Country) %>%
  mutate(Mainframes = n, Percent = (n / 500) * 100) %>%
  select(Country, Mainframes, Percent) %>%
  arrange(desc(Mainframes), Percent) %>%
  top_n(., 6) %>%
  datatable()
## Selecting by Percent
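
As a cross-check, the share held by the six leading countries can be recomputed in base R from the Frequency column of the Pareto chart analysis printed above. This is only a sketch: the vector `freq` is typed in by hand from that output, and the variable names are chosen for illustration.

```r
# Frequency column copied from the Pareto chart analysis output (29 countries)
freq <- c(213, 113, 34, 18, 18, 15, 14, 12, 12, 6, 5, 4, 4,
          3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1)

n_countries <- length(freq)                 # number of countries on the list
top6_share  <- sum(freq[1:6]) / sum(freq)   # share of systems in the top six
top6_frac   <- 6 / n_countries              # share of countries they represent

cat(sprintf("Top 6 of %d countries (%.1f%%) hold %.1f%% of the systems\n",
            n_countries, 100 * top6_frac, 100 * top6_share))
```

The six countries are roughly a fifth of the 29 listed and hold 411 of the 500 systems, which agrees with the cumulative percentage in row F of the output.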

5 Conclusion

1. The TOP500 data are consistent with the Pareto principle (the 80/20 rule): 6 of the 29 countries on the list (about 21%) account for 82.2% of the systems.
2. The six leading countries (China, United States, Japan, France, Germany, Netherlands) together host 411 of the 500 TOP500 HPC mainframes.
3. The race is not over!