We discussed the HPC race in earlier posts; see https://rpubs.com/alex-lev/693131 and https://rpubs.com/alex-lev/553777
Data for November 2020 can be downloaded here: https://www.top500.org/lists/top500/2020/11/
Note: the original variable names were shortened (truncated, concatenated and simplified) to make data exploration and visualization easier.
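library(readxl)     # read_excel() for the .xlsx file
library(tidyverse)  # dplyr verbs and the %>% pipe
library(qcc)        # pareto.chart()
library(DT)         # datatable() for interactive tables
# Read the November 2020 list; the path is relative to the working directory
TOP500_202011 <- read_excel("top500/TOP500_202011.xlsx")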
names(TOP500_202011)
## [1] "Rank" "PreviousRank"
## [3] "FirstAppearance" "FirstRank"
## [5] "Name" "Computer"
## [7] "Site" "Manufacturer"
## [9] "Country" "Year"
## [11] "Segment" "TotalCores"
## [13] "AcceleratorCoProcessorCores" "Rmax"
## [15] "Rpeak" "Nmax"
## [17] "Nhalf" "HPCG"
## [19] "Power" "PowerSource"
## [21] "PowerEfficiency" "Architecture"
## [23] "Processor" "ProcessorTechnology"
## [25] "ProcessorSpeed" "OperatingSystem"
## [27] "OSFamily" "AcceleratorCoProcessor"
## [29] "CoresperSocket" "ProcessorGeneration"
## [31] "SystemModel" "SystemFamily"
## [33] "InterconnectFamily" "Interconnect"
## [35] "Continent" "SiteID"
## [37] "SystemID"
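The note on simplified variable names can be illustrated with a minimal sketch. The raw headers below are assumptions about how the spreadsheet might spell them, and `simplify_name()` is a hypothetical helper, not the procedure actually used for this post.

```r
# Hypothetical raw headers; their exact spelling in the spreadsheet is an assumption.
raw_names <- c("Previous Rank", "Accelerator/Co-Processor Cores",
               "Cores per Socket", "Rmax [TFlop/s]")

# Hypothetical helper: drop a trailing unit, then all non-alphanumeric characters.
simplify_name <- function(x) {
  x <- sub(" *\\[.*$", "", x)     # remove a trailing unit such as " [TFlop/s]"
  gsub("[^A-Za-z0-9]", "", x)     # remove spaces, slashes and hyphens
}

clean_names <- simplify_name(raw_names)
print(clean_names)
```

Applied to a data frame, the same helper could be used as `names(df) <- simplify_name(names(df))`.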
# read_excel() already returns a tibble, so as_tibble() only makes the type explicit
TOP500_202011.tbl <- as_tibble(TOP500_202011)
The Pareto principle states that for many outcomes roughly 80% of consequences come from 20% of the causes (the vital few). For more see: https://en.wikipedia.org/wiki/Pareto_principle
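One way to read the principle numerically is to ask what share of the total is held by the top 20% of causes. The sketch below does this in base R, using the per-country counts from the "Frequency" column of the Pareto table in this post; `top_share()` is a hypothetical helper, and rounding the 20% cut-off up to a whole number of countries is an assumption.

```r
# Per-country system counts, copied from the "Frequency" column of the
# Pareto table in this post (29 countries, 500 systems in total).
counts <- c(213, 113, 34, 18, 18, 15, 14, 12, 12, 6, 5, 4, 4,
            3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1)

# Hypothetical helper: share of the total held by the top `frac` of causes.
top_share <- function(x, frac = 0.2) {
  x <- sort(x, decreasing = TRUE)
  k <- ceiling(frac * length(x))   # size of the "vital few", rounded up
  sum(x[seq_len(k)]) / sum(x)
}

share <- top_share(counts)
cat(sprintf("Top 20%% of countries hold %.1f%% of systems\n", 100 * share))
```

With 29 countries the cut-off is 6, so the result is the combined share of the six largest counts.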
# Count systems per country, largest first
TOP500 <- TOP500_202011 %>% count(Country) %>% mutate(Mainframes = n) %>%
  na.omit() %>% arrange(desc(Mainframes))
pareto.chart(as.vector(TOP500$Mainframes), names = TOP500$Country, main = "Pareto chart for TOP500 countries by HPC mainframes")
##
## Pareto chart analysis for as.vector(TOP500$Mainframes)
## Frequency Cum.Freq. Percentage Cum.Percent.
## A 213.0 213.0 42.6 42.6
## B 113.0 326.0 22.6 65.2
## C 34.0 360.0 6.8 72.0
## D 18.0 378.0 3.6 75.6
## E 18.0 396.0 3.6 79.2
## F 15.0 411.0 3.0 82.2
## G 14.0 425.0 2.8 85.0
## H 12.0 437.0 2.4 87.4
## I 12.0 449.0 2.4 89.8
## J 6.0 455.0 1.2 91.0
## K 5.0 460.0 1.0 92.0
## L 4.0 464.0 0.8 92.8
## M 4.0 468.0 0.8 93.6
## N 3.0 471.0 0.6 94.2
## O 3.0 474.0 0.6 94.8
## P 3.0 477.0 0.6 95.4
## Q 3.0 480.0 0.6 96.0
## R 3.0 483.0 0.6 96.6
## S 2.0 485.0 0.4 97.0
## T 2.0 487.0 0.4 97.4
## U 2.0 489.0 0.4 97.8
## V 2.0 491.0 0.4 98.2
## W 2.0 493.0 0.4 98.6
## X 2.0 495.0 0.4 99.0
## Y 1.0 496.0 0.2 99.2
## Z 1.0 497.0 0.2 99.4
## A1 1.0 498.0 0.2 99.6
## B1 1.0 499.0 0.2 99.8
## C1 1.0 500.0 0.2 100.0
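The rows above are labelled A to C1 rather than by country. A likely reason is that the vector passed as data carries no names, so default letter labels are generated when it is converted to a table. The sketch below shows that mechanism in base R with the first three counts; pairing them with China, United States and Japan follows the order given in the conclusion and is an assumption.

```r
# First three counts from the table above; the country pairing is an assumption.
counts <- c(213, 113, 34)
countries <- c("China", "United States", "Japan")

# An unnamed vector receives default letter labels when converted to a table.
default_labels <- names(as.table(counts))
print(default_labels)

# A named vector keeps its own labels.
named_counts <- setNames(counts, countries)
print(names(as.table(named_counts)))

# With qcc installed, the named vector can be passed to pareto.chart() directly.
if (requireNamespace("qcc", quietly = TRUE)) {
  qcc::pareto.chart(named_counts, main = "Pareto chart (three countries only)")
}
```

For the full data set the equivalent would be `setNames(TOP500$Mainframes, TOP500$Country)` as the first argument.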
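# Share of the 500 listed systems per country; keep the six largest
TOP500_202011.tbl %>%
  count(Country) %>%
  mutate(Mainframes = n, Percent = (n / 500) * 100) %>%
  select(Country, Mainframes, Percent) %>%
  arrange(desc(Mainframes), Percent) %>%
  top_n(6) %>%   # no weight column given, so top_n() ranks by the last column (Percent)
  datatable()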
## Selecting by Percent
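The "Selecting by Percent" message appears because `top_n()` was not told which column to rank by. A possible alternative, sketched here on a small hypothetical data set (the country labels and counts are made up for illustration), names the ranking column with `slice_max()` and derives the percentage from the actual total instead of a hard-coded 500. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for TOP500_202011.tbl: one row per system.
systems <- tibble(Country = rep(c("A", "B", "C", "D"), times = c(5, 3, 1, 1)))

top2 <- systems %>%
  count(Country, name = "Mainframes") %>%
  mutate(Percent = 100 * Mainframes / sum(Mainframes)) %>%  # total taken from the data
  slice_max(Mainframes, n = 2)                              # explicit ranking column

print(top2)
```

`slice_max()` returns the rows in descending order of the ranking column, so a separate `arrange()` is not needed in this sketch.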
Conclusion
1. The TOP500 data are consistent with the Pareto principle (the 80/20 rule): about 21% of the countries (6 of 29) hold about 82% of the systems.
2. The six leading countries (China, United States, Japan, France, Germany, Netherlands) together account for 82.2% of the TOP500 HPC systems in the world.
3. The race is not over!