“The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic.” (R. A. Fisher)

1 Introduction

Supercomputers have made a great leap in speed over the last 20 years (http://en.wikipedia.org/wiki/Supercomputer). But is there anything interesting about these monsters besides the speed of their calculations? Let’s see!

2 Data

First we get the file with the TOP500 data for June 2014: http://www.top500.org/lists/2014/06/.

2.1 Top 500 structure

Then we put the file into the working directory, read it and view its structure.

if (Sys.getenv("JAVA_HOME")!="")
    Sys.setenv(JAVA_HOME="")

## [1] 1

library(xlsx)

## Loading required package: rJava
## Loading required package: xlsxjars

library(psych)
library(ggplot2)

## 
## Attaching package: 'ggplot2'
## 
## The following object is masked from 'package:psych':
## 
##     %+%

library(rpart)
library(party)

## Loading required package: grid
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: sandwich
## Loading required package: strucchange
## Loading required package: modeltools
## Loading required package: stats4
## 
## Attaching package: 'modeltools'
## 
## The following object is masked from 'package:rJava':
## 
##     clone

library(randomForest)

## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## 
## The following object is masked from 'package:psych':
## 
##     outlier

library(MASS)
library(FactoMineR)
library(dplyr)

## Warning: package 'dplyr' was built under R version 3.1.3

## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:MASS':
## 
##     select
## 
## The following object is masked from 'package:randomForest':
## 
##     combine
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(arules)

## Warning: package 'arules' was built under R version 3.1.3

## Loading required package: Matrix
## 
## Attaching package: 'arules'
## 
## The following object is masked from 'package:modeltools':
## 
##     info
## 
## The following objects are masked from 'package:base':
## 
##     %in%, write

library(arulesViz)

## Warning: package 'arulesViz' was built under R version 3.1.3

## 
## Attaching package: 'arulesViz'
## 
## The following object is masked from 'package:base':
## 
##     abbreviate

top500<-read.xlsx("./top500/TOP500_201406.xls",1)

attach(top500)
str(top500)

## 'data.frame':    500 obs. of  34 variables:
##  $ Rank                          : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Previous.Rank                 : num  1 2 3 4 5 6 7 8 9 NA ...
##  $ First.Appearance              : num  41 40 38 37 39 40 40 39 39 43 ...
##  $ First.Rank                    : num  1 1 17 1 3 114 7 8 48 10 ...
##  $ Name                          : Factor w/ 175 levels "Abel","Ada","airain",..: 157 159 136 NA 105 122 145 88 167 NA ...
##  $ Computer                      : Factor w/ 258 levels "Acer AR585 F1 Cluster, Opteron 12C 2.2GHz, QDR infiniband",..: 231 138 11 190 13 124 205 12 12 121 ...
##  $ Site                          : Factor w/ 220 levels "Academic Center for Computing and Media Studies (ACCMS),  Kyoto University",..: 151 54 51 170 52 188 194 77 51 82 ...
##  $ Manufacturer                  : Factor w/ 32 levels "Acer Group","Adtech",..: 26 7 16 11 16 7 9 16 16 7 ...
##  $ Country                       : Factor w/ 29 levels "Australia","Austria",..: 6 29 29 16 29 26 29 10 29 29 ...
##  $ Year                          : num  2013 2012 2011 2011 2012 ...
##  $ Segment                       : Factor w/ 6 levels "Academic","Government",..: 5 5 5 5 5 5 1 5 5 2 ...
##  $ Total.Cores                   : num  3120000 560640 1572864 705024 786432 ...
##  $ Accelerator.Co.Processor.Cores: num  2736000 261632 0 0 0 ...
##  $ Rmax                          : num  33862700 17590000 17173224 10510000 8586612 ...
##  $ Rpeak                         : num  54902400 27112550 20132659 11280384 10066330 ...
##  $ Nmax                          : num  9960000 0 0 11870208 0 ...
##  $ Nhalf                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Power                         : num  17808 8209 7890 12660 3945 ...
##  $ Mflops.Watt                   : num  1902 2143 2177 830 2177 ...
##  $ Architecture                  : Factor w/ 2 levels "Cluster","MPP": 1 2 2 1 2 2 1 2 2 2 ...
##  $ Processor                     : Factor w/ 65 levels "Intel Xeon E5-2450v2 8C 2.5GHz",..: 12 23 27 33 27 46 47 27 27 14 ...
##  $ Processor.Technology          : Factor w/ 10 levels "AMD x86_64","Intel Core",..: 4 1 9 10 9 6 6 9 9 4 ...
##  $ Processor.Speed..MHz.         : num  2200 2200 1600 2000 1600 2600 2700 1600 1600 2700 ...
##  $ Operating.System              : Factor w/ 22 levels "AIX","Bullx Linux",..: 9 8 10 10 10 8 10 10 10 8 ...
##  $ OS.Family                     : Factor w/ 4 levels "Linux","Mixed",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Accelerator.Co.Processor      : Factor w/ 23 levels "AMD FirePro S10000",..: 5 22 14 14 14 22 11 14 14 14 ...
##  $ Cores.per.Socket              : num  12 16 16 8 16 8 8 16 16 12 ...
##  $ Processor.Generation          : Factor w/ 17 levels "Intel Xeon E3 (Haswell)",..: 2 6 8 13 8 3 3 8 8 2 ...
##  $ System.Model                  : Factor w/ 113 levels "Acer AR585 F1 Cluster",..: 103 48 9 72 9 45 84 9 9 45 ...
##  $ System.Family                 : Factor w/ 52 levels "Acer Group Cluster",..: 50 11 25 17 25 9 14 25 25 9 ...
##  $ Interconnect.Family           : Factor w/ 7 levels "10G","Cray Interconnect",..: 3 2 3 3 3 3 5 3 3 3 ...
##  $ Interconnect                  : Factor w/ 19 levels "10G Ethernet",..: 17 3 4 4 4 2 10 4 4 2 ...
##  $ Region                        : Factor w/ 11 levels "Australia and New Zealand",..: 2 4 4 2 4 11 4 11 4 4 ...
##  $ Continent                     : Factor w/ 4 levels "Americas","Asia",..: 2 1 1 2 1 3 1 3 1 1 ...

head(top500,3)

##   Rank Previous.Rank First.Appearance First.Rank                  Name
## 1    1             1               41          1 Tianhe-2 (MilkyWay-2)
## 2    2             2               40          1                 Titan
## 3    3             3               38         17               Sequoia
##                                                                                  Computer
## 1 TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P
## 2             Cray XK7 , Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x
## 3                                              BlueGene/Q, Power BQC 16C 1.60 GHz, Custom
##                                          Site Manufacturer       Country
## 1 National Super Computer Center in Guangzhou         NUDT         China
## 2        DOE/SC/Oak Ridge National Laboratory    Cray Inc. United States
## 3                               DOE/NNSA/LLNL          IBM United States
##   Year  Segment Total.Cores Accelerator.Co.Processor.Cores     Rmax
## 1 2013 Research     3120000                        2736000 33862700
## 2 2012 Research      560640                         261632 17590000
## 3 2011 Research     1572864                              0 17173224
##      Rpeak    Nmax Nhalf Power Mflops.Watt Architecture
## 1 54902400 9960000     0 17808     1901.54      Cluster
## 2 27112550       0     0  8209     2142.77          MPP
## 3 20132659       0     0  7890     2176.58          MPP
##                         Processor Processor.Technology
## 1 Intel Xeon E5-2692v2 12C 2.2GHz      Intel IvyBridge
## 2         Opteron 6274 16C 2.2GHz           AMD x86_64
## 3            Power BQC 16C 1.6GHz              PowerPC
##   Processor.Speed..MHz.        Operating.System OS.Family
## 1                  2200             Kylin Linux     Linux
## 2                  2200 Cray Linux Environment      Linux
## 3                  1600                   Linux     Linux
##   Accelerator.Co.Processor Cores.per.Socket
## 1     Intel Xeon Phi 31S1P               12
## 2              NVIDIA K20x               16
## 3                     None               16
##               Processor.Generation       System.Model  System.Family
## 1        Intel Xeon E5 (IvyBridge) TH-IVB-FEP Cluster TH-IVB Cluster
## 2 Opteron 6200 Series "Interlagos"          Cray XK7         Cray XK
## 3                        Power BQC         BlueGene/Q   IBM BlueGene
##   Interconnect.Family             Interconnect        Region Continent
## 1 Custom Interconnect             TH Express-2  Eastern Asia      Asia
## 2   Cray Interconnect Cray Gemini interconnect North America  Americas
## 3 Custom Interconnect      Custom Interconnect North America  Americas

2.2 Preparing data for analysis

We need one new variable for our new data set top500.1: effectiveness, \(Eff=Rmax/Rpeak\). The histogram of Eff is negatively skewed: most systems sit toward the high end, with a longer tail on the left. That is good news for effectiveness.

Eff<-c(Rmax/Rpeak)
top500.1<-cbind(top500,Eff)
par(mfrow=c(1,1))

boxplot(Eff,horizontal = T,col="grey")

hist(Eff,col="grey")

plot(density(Eff))
polygon(density(Eff),col = "grey")

       
grid()

summary(Eff)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2516  0.5097  0.7160  0.6844  0.8530  0.9980

describe(Eff)

##   vars   n mean   sd median trimmed  mad  min max range skew kurtosis   se
## 1    1 500 0.68 0.21   0.72    0.71 0.23 0.25   1  0.75 -0.7    -0.65 0.01
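
As a quick sanity check on the definition, the sketch below recomputes Eff for the three systems printed by head(top500, 3), using with() instead of attach(). The skew() helper and the left_tailed vector are hypothetical, added only to illustrate that a negative skew goes with a longer left tail. skew() uses one common moment-based definition, which may differ slightly from the estimator that describe() applies.

```r
# Rmax and Rpeak for the first three systems, copied from head(top500, 3)
top3 <- data.frame(
  Name  = c("Tianhe-2 (MilkyWay-2)", "Titan", "Sequoia"),
  Rmax  = c(33862700, 17590000, 17173224),
  Rpeak = c(54902400, 27112550, 20132659)
)

# Effectiveness: achieved performance as a fraction of theoretical peak
top3$Eff <- with(top3, Rmax / Rpeak)
print(top3[, c("Name", "Eff")], digits = 3)

# Hypothetical helper: moment-based sample skewness
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Made-up values with one low outlier, i.e. a long left tail
left_tailed <- c(0.20, 0.70, 0.80, 0.85, 0.90)
skew(left_tailed)  # negative
```

Working through with() or a data argument keeps the column lookup explicit, which is easier to follow than relying on attach() once several data frames are in play.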

3 Preliminary analysis

Is there any difference in Eff by Segment, Manufacturer, Architecture, Continent, Region and Country?

3.1 Boxplot and Densityplot

We use box plots and density plots to answer this question. On the whole, the Industry segment has low Eff.

3.1.1 Segment

ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_boxplot()+
theme(axis.text.x = element_text(size=10, angle=90))

ggplot(top500.1, aes(Eff,fill=Segment)) +  geom_density(alpha = 0.2)
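
The box plots can be backed by a quick numeric summary per group. Here is a minimal sketch in base R on made-up values; the toy data frame is hypothetical and only mimics the shape of top500.1.

```r
# Toy data, invented for illustration (not the TOP500 values)
toy <- data.frame(
  Segment = factor(c("Academic", "Academic",
                     "Industry", "Industry", "Industry",
                     "Research", "Research")),
  Eff     = c(0.82, 0.78, 0.35, 0.41, 0.52, 0.85, 0.80)
)

# Split Eff by group, then compute the numbers a box plot draws
by_seg <- split(toy$Eff, toy$Segment)
seg_summary <- data.frame(
  n      = sapply(by_seg, length),
  median = sapply(by_seg, median),
  IQR    = sapply(by_seg, IQR)
)

# Groups ordered from lowest to highest median
seg_summary[order(seg_summary$median), ]
```

With the real data, replacing toy by top500.1 should give the medians behind the plot above.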

3.1.2 Manufacturer

newdata <- subset(top500.1,  Manufacturer=="SGI" | Manufacturer=="IBM" | Manufacturer=="Cray Inc." | Manufacturer=="Hewlett-Packard" | Manufacturer=="Dell" | Manufacturer=="Bull SA")

ggplot(newdata,aes(x=Manufacturer,y=Eff,fill=Manufacturer))+geom_jitter()+geom_boxplot()+ theme(axis.text.x = element_text(size=10, angle=90))

ggplot(newdata, aes(Eff,fill=Manufacturer)) +  geom_density(alpha = 0.2)
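
The chain of == comparisons used to build newdata can be written more compactly with %in%. The sketch below shows the idea on a made-up data frame (the toy rows and their Eff values are hypothetical); droplevels() is added so that manufacturers filtered out do not linger as empty factor levels.

```r
# Toy data, invented for illustration
toy <- data.frame(
  Manufacturer = factor(c("IBM", "Cray Inc.", "Dell", "NUDT", "Fujitsu", "IBM")),
  Eff          = c(0.85, 0.65, 0.70, 0.62, 0.93, 0.80)
)

# The six manufacturers selected in the text
keep <- c("SGI", "IBM", "Cray Inc.", "Hewlett-Packard", "Dell", "Bull SA")

# Keep matching rows, then drop factor levels that no longer occur
toy_sub <- droplevels(subset(toy, Manufacturer %in% keep))

nrow(toy_sub)
levels(toy_sub$Manufacturer)
```

The same pattern would apply to the Country subset in section 3.1.6.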

3.1.3 Architecture

ggplot(top500.1,aes(x=Architecture,y=Eff,fill=Architecture))+geom_jitter()+geom_boxplot()+ theme(axis.text.x = element_text(size=10, angle=90))

ggplot(top500.1, aes(Eff,fill=Architecture)) +  geom_density(alpha = 0.2)

3.1.4 Continent

ggplot(top500.1,aes(x=Continent,y=Eff,fill=Continent))+geom_jitter()+geom_boxplot()+
theme(axis.text.x = element_text(size=10, angle=90))

ggplot(top500.1, aes(Eff,fill=Continent)) +  geom_density(alpha = 0.2)

3.1.5 Region

ggplot(top500.1,aes(x=Region,y=Eff,fill=Region))+geom_jitter()+geom_boxplot()+
theme(axis.text.x = element_text(size=10, angle=90))

ggplot(top500.1, aes(Eff,fill=Region)) +  geom_density(alpha = 0.2)

3.1.6 Country

newdata2 <- subset(top500.1,  Country=="United States" | Country=="Germany" | Country=="Japan" | Country=="United Kingdom" | Country=="China" | Country=="France" | Country=="Russia")

ggplot(newdata2,aes(x=Country,y=Eff,fill=Country))+geom_jitter()+geom_boxplot()+ theme(axis.text.x = element_text(size=10, angle=90))

ggplot(newdata2, aes(Eff,fill=Country)) +  geom_density(alpha = 0.2)

3.2 Complex distributions

And what if we want to view the distribution of Eff by Segment as a density using a “violin plot”?

ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin()

ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin() +facet_wrap(~Region)+
theme(axis.text.x = element_text(size=10, angle=90))

ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin() +facet_wrap(~Architecture) + theme(axis.text.x = element_text(size=14, angle=90))

ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin() +facet_wrap(~Processor.Technology) + theme(axis.text.x = element_text(size=10, angle=90))

3.2.1 Note

We get very interesting results, especially for Region, Architecture and Processor.Technology.

4 Fitting models

4.1 ANOVA

Is this difference statistically significant, and not only for Segment? Does ANOVA know the answer?

aov.out=aov(Eff~Country+Segment+Processor.Technology+OS.Family*Architecture,data = top500.1)
summary(aov.out)

##                         Df Sum Sq Mean Sq F value   Pr(>F)    
## Country                 28  9.852  0.3519  19.186  < 2e-16 ***
## Segment                  5  2.732  0.5464  29.794  < 2e-16 ***
## Processor.Technology     9  0.527  0.0586   3.193 0.000926 ***
## OS.Family                3  0.188  0.0627   3.418 0.017338 *  
## Architecture             1  0.001  0.0006   0.034 0.853158    
## OS.Family:Architecture   1  0.184  0.1843  10.052 0.001625 ** 
## Residuals              452  8.289  0.0183                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
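
In the spirit of the opening quote, the F column is just arithmetic on the other columns: Mean Sq = Sum Sq / Df, and F = Mean Sq / residual Mean Sq. The sketch below redoes that arithmetic from the rounded values printed above (variable names are illustrative), so small differences in the last digit are expected.

```r
# Values copied from the summary(aov.out) table (rounded as printed)
tab <- data.frame(
  term   = c("Country", "Segment", "Processor.Technology", "OS.Family"),
  Df     = c(28, 5, 9, 3),
  Sum.Sq = c(9.852, 2.732, 0.527, 0.188)
)
res_df <- 452    # residual degrees of freedom
res_ss <- 8.289  # residual sum of squares

# Mean squares and F statistics
tab$Mean.Sq <- tab$Sum.Sq / tab$Df
res_ms      <- res_ss / res_df
tab$F       <- tab$Mean.Sq / res_ms

# Upper-tail probability of the F distribution gives Pr(>F)
tab$p <- pf(tab$F, tab$Df, res_df, lower.tail = FALSE)

print(tab, digits = 4)
```

Note that aov() reports sequential sums of squares, so in an unbalanced design such as this one the rows can change if the terms in the formula are reordered.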

coef(aov.out)

##                           (Intercept) 
##                          0.7186661783 
##                        CountryAustria 
##                          0.1176344086 
##                        CountryBelgium 
##                          0.0696953899 
##                         CountryBrazil 
##                         -0.0731036643 
##                         CountryCanada 
##                          0.0683843161 
##                          CountryChina 
##                         -0.3398848251 
##                        CountryDenmark 
##                          0.2363785142 
##                        CountryFinland 
##                          0.0562642443 
##                         CountryFrance 
##                          0.0779536007 
##                        CountryGermany 
##                          0.0567875033 
##                      CountryHong Kong 
##                         -0.1899429427 
##                          CountryIndia 
##                          0.0337653908 
##                        CountryIreland 
##                          0.0907879755 
##                         CountryIsrael 
##                         -0.1939172646 
##                          CountryItaly 
##                          0.0119032757 
##                          CountryJapan 
##                         -0.0485267470 
##                   CountryKorea, South 
##                         -0.0603637416 
##                       CountryMalaysia 
##                         -0.2036983240 
##                    CountryNetherlands 
##                          0.0030047763 
##                         CountryNorway 
##                          0.0538293092 
##                         CountryPoland 
##                          0.0067512792 
##                         CountryRussia 
##                         -0.1363252827 
##                   CountrySaudi Arabia 
##                          0.0080038221 
##                          CountrySpain 
##                          0.0431682418 
##                         CountrySweden 
##                          0.0548338403 
##                    CountrySwitzerland 
##                         -0.0137604755 
##                         CountryTaiwan 
##                         -0.0001472943 
##                 CountryUnited Kingdom 
##                         -0.0028473396 
##                  CountryUnited States 
##                         -0.0396982916 
##                     SegmentGovernment 
##                          0.0607195503 
##                       SegmentIndustry 
##                         -0.1098267980 
##                         SegmentOthers 
##                         -0.0390887545 
##                       SegmentResearch 
##                          0.0453074070 
##                         SegmentVendor 
##                          0.1262483003 
##        Processor.TechnologyIntel Core 
##                          0.0029070959 
##     Processor.TechnologyIntel Haswell 
##                         -0.3138411273 
##   Processor.TechnologyIntel IvyBridge 
##                          0.0976988342 
##     Processor.TechnologyIntel Nehalem 
##                          0.0373042706 
## Processor.TechnologyIntel SandyBridge 
##                          0.0722636536 
##            Processor.TechnologyOthers 
##                          0.3196318050 
##             Processor.TechnologyPower 
##                         -0.0622529158 
##           Processor.TechnologyPowerPC 
##                          0.0906852318 
##             Processor.TechnologySparc 
##                          0.2374027685 
##                        OS.FamilyMixed 
##                          0.1561183131 
##                         OS.FamilyUnix 
##                          0.3625192139 
##                      OS.FamilyWindows 
##                          0.3050431844 
##                       ArchitectureMPP 
##                          0.0362007293 
##         OS.FamilyUnix:ArchitectureMPP 
##                         -0.2821314759

par(mfrow=c(2,2))
plot(aov.out)

## Warning: not plotting observations with leverage one:
##   22, 46, 270, 303, 423, 476

## Warning: not plotting observations with leverage one:
##   22, 46, 270, 303, 423, 476

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

par(mfrow=c(1,1))
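
The “leverage one” warnings usually point to observations that the model fits exactly, for example a factor level that occurs only once. The sketch below reproduces that situation on made-up data (the toy data frame is an assumption, not part of TOP500). It does not claim that this is the cause for each of the six flagged rows.

```r
# Toy data: group "c" has a single observation
toy <- data.frame(
  g = factor(c("a", "a", "a", "b", "b", "b", "c")),
  y = c(1.0, 1.2, 0.9, 2.0, 2.1, 1.8, 5.0)
)

fit <- lm(y ~ g, data = toy)

# Leverage (diagonal of the hat matrix) for each observation
h <- hatvalues(fit)
round(h, 3)

# The singleton is fitted exactly, so its residual is zero
round(residuals(fit)[7], 10)
```

A leverage of one also makes the term (1 - hh)/hh zero or numerically negative, which is a plausible source of the NaN warnings from sqrt() shown above.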

4.1.1 Note

The diagnostic plots suggest that the residuals are reasonably well behaved.

ANOVA says yes: not only Segment but also Country, Processor.Technology and OS.Family (including its interaction with Architecture) contribute to this difference. The Architecture main effect on its own is not significant (p = 0.85).

4.2 Classification tree

Can we see this in more detail? Let’s try a classification tree (strictly speaking a regression tree, since Eff is numeric).

tree.fit<-rpart(Eff~Country+Segment+Processor.Technology+OS.Family+Architecture,data = top500.1)
tree.fit

## n= 500 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 500 21.7735000 0.6843975  
##    2) Country=China,Hong Kong,Israel,Malaysia 80  2.5276620 0.3908875  
##      4) Segment=Industry 66  0.8228633 0.3301390 *
##      5) Segment=Academic,Research 14  0.3129967 0.6772734 *
##    3) Country=Australia,Austria,Belgium,Brazil,Canada,Denmark,Finland,France,Germany,India,Ireland,Italy,Japan,Korea, South,Netherlands,Norway,Poland,Russia,Saudi Arabia,Spain,Sweden,Switzerland,Taiwan,United Kingdom,United States 420 11.0412500 0.7403042  
##      6) Segment=Industry,Others 212  6.1326970 0.6661437  
##       12) Country=Brazil,Korea, South,Netherlands,Russia,Sweden,United Kingdom,United States 174  4.7582110 0.6347066  
##         24) Processor.Technology=AMD x86_64,Intel Haswell,Intel Nehalem 17  0.1885261 0.4996429 *
##         25) Processor.Technology=Intel IvyBridge,Intel SandyBridge 157  4.2259880 0.6493313 *
##       13) Country=Belgium,Canada,Denmark,France,Germany,India,Italy,Japan,Saudi Arabia 38  0.4151134 0.8100927 *
##      7) Segment=Academic,Government,Research,Vendor 208  2.5542270 0.8158909 *
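
To see what the first split buys, the arithmetic can be redone from the printed nodes. For a numeric response rpart uses its "anova" method, where a node's deviance is the sum of squared deviations from the node mean and a split is scored by how much it reduces that sum. The sketch below uses only the n, deviance and yval values printed for nodes 1, 2 and 3; the variable names are illustrative.

```r
# Values copied from the printed tree
root  <- list(n = 500, dev = 21.7735000, yval = 0.6843975)
left  <- list(n = 80,  dev = 2.5276620,  yval = 0.3908875)  # node 2
right <- list(n = 420, dev = 11.0412500, yval = 0.7403042)  # node 3

# Reduction in sum of squares achieved by the root split on Country
gain     <- root$dev - (left$dev + right$dev)
rel_gain <- gain / root$dev

# The root mean is the size-weighted mean of the two child means
pooled <- (left$n * left$yval + right$n * right$yval) / (left$n + right$n)

round(c(gain = gain, rel_gain = rel_gain, pooled_mean = pooled), 4)
```

So the single split on Country removes a bit more than a third of the total sum of squares in Eff.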

plotcp(tree.fit)

plot(tree.fit)
text(tree.fit,use.n=TRUE, all=TRUE, cex=.5)

4.2.1 China

tree.fit2 <- ctree(Eff~Segment+Processor.Technology+OS.Family,data = top500.1,subset = Country=="China")
tree.fit2

## 
##   Conditional inference tree with 3 terminal nodes
## 
## Response:  Eff 
## Inputs:  Segment, Processor.Technology, OS.Family 
## Number of observations:  76 
## 
## 1) Segment == {Academic, Research}; criterion = 1, statistic = 46.061
##   2)*  weights = 14 
## 1) Segment == {Industry}
##   3) Processor.Technology == {Intel Nehalem, Intel SandyBridge}; criterion = 0.993, statistic = 12.052
##     4)*  weights = 50 
##   3) Processor.Technology == {Intel IvyBridge}
##     5)*  weights = 12

plot(tree.fit2)

4.2.2 United States

tree.fit3 <- ctree(Eff~Segment+Processor.Technology+OS.Family,data = top500.1,subset = Country=="United States")
tree.fit3

## 
##   Conditional inference tree with 4 terminal nodes
## 
## Response:  Eff 
## Inputs:  Segment, Processor.Technology, OS.Family 
## Number of observations:  232 
## 
## 1) Segment == {Academic, Government, Research, Vendor}; criterion = 1, statistic = 55.151
##   2)*  weights = 80 
## 1) Segment == {Industry}
##   3) Processor.Technology == {Intel IvyBridge, Intel SandyBridge}; criterion = 0.999, statistic = 21.042
##     4) Processor.Technology == {Intel IvyBridge}; criterion = 0.967, statistic = 6.431
##       5)*  weights = 18 
##     4) Processor.Technology == {Intel SandyBridge}
##       6)*  weights = 118 
##   3) Processor.Technology == {AMD x86_64, Intel Haswell, Intel Nehalem}
##     7)*  weights = 16

plot(tree.fit3)

4.2.3 Note

China, Hong Kong, Israel and Malaysia are split off at the root of the tree. China, as the outsider, has its highest values of \(Eff\) in the Academic and Research segments, while the United States, as the leader, has its highest values of \(Eff\) in the Academic, Government, Research and Vendor segments.

4.3 PCA

Which numerical variables are correlated, and which are not? PCA will help us.

nums <- sapply(top500.1, is.numeric)
top500.1.num<-top500.1[,nums]
par(mfrow=c(1,1))
pca.out=PCA(top500.1.num)

pca.out$eig

##          eigenvalue percentage of variance
## comp 1  5.430303085           33.939394282
## comp 2  3.041979227           19.012370166
## comp 3  1.977083753           12.356773459
## comp 4  1.382239477            8.638996732
## comp 5  0.963621166            6.022632289
## comp 6  0.902337198            5.639607490
## comp 7  0.642724766            4.017029788
## comp 8  0.478067739            2.987923366
## comp 9  0.388568956            2.428555975
## comp 10 0.305297941            1.908112131
## comp 11 0.247979547            1.549872168
## comp 12 0.081532303            0.509576893
## comp 13 0.068369005            0.427306278
## comp 14 0.054474229            0.340463929
## comp 15 0.034206336            0.213789599
## comp 16 0.001215273            0.007595455
##         cumulative percentage of variance
## comp 1                           33.93939
## comp 2                           52.95176
## comp 3                           65.30854
## comp 4                           73.94753
## comp 5                           79.97017
## comp 6                           85.60977
## comp 7                           89.62680
## comp 8                           92.61473
## comp 9                           95.04328
## comp 10                          96.95140
## comp 11                          98.50127
## comp 12                          99.01084
## comp 13                          99.43815
## comp 14                          99.77861
## comp 15                          99.99240
## comp 16                         100.00000
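
The percentage columns follow directly from the eigenvalues. PCA() scales the variables to unit variance by default, so the eigenvalues of the 16 numeric variables should sum to 16, and each percentage is an eigenvalue divided by that total. A minimal sketch using the printed eigenvalues (variable names are illustrative; the "eigenvalue greater than one" cut-off is a common rule of thumb, not something the analysis above relies on):

```r
# Eigenvalues copied from pca.out$eig
eig <- c(5.430303085, 3.041979227, 1.977083753, 1.382239477,
         0.963621166, 0.902337198, 0.642724766, 0.478067739,
         0.388568956, 0.305297941, 0.247979547, 0.081532303,
         0.068369005, 0.054474229, 0.034206336, 0.001215273)

total   <- sum(eig)          # equals the number of scaled variables
pct     <- 100 * eig / total # percentage of variance
cum_pct <- cumsum(pct)       # cumulative percentage of variance

# Rule of thumb: keep components with eigenvalue above 1
n_keep <- sum(eig > 1)

round(head(data.frame(eig, pct, cum_pct), n_keep), 3)
```

By that rule four components would be kept, covering about 74% of the variance.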

pca.out$var

## $coord
##                                     Dim.1       Dim.2       Dim.3
## Rank                           -0.5947605  0.51585716 -0.29570230
## Previous.Rank                  -0.5841319  0.46071795 -0.28817963
## First.Appearance               -0.3186119  0.61837832  0.54298393
## First.Rank                     -0.5606208  0.68297240  0.05489210
## Year                           -0.3184208  0.62075695  0.54214733
## Total.Cores                     0.8696188  0.42389651 -0.04809639
## Accelerator.Co.Processor.Cores  0.7264084  0.49220188 -0.12196963
## Rmax                            0.8918226  0.39582170 -0.01637216
## Rpeak                           0.8723756  0.44995513 -0.04556264
## Nmax                            0.6201017 -0.24627256 -0.20372160
## Nhalf                           0.0917428 -0.23884429 -0.17224974
## Power                           0.8337275  0.35274263 -0.20967665
## Mflops.Watt                     0.3119978 -0.16360149  0.66236746
## Processor.Speed..MHz.          -0.1937613  0.08094431 -0.40989259
## Cores.per.Socket                0.3027854 -0.14854002  0.66160564
## Eff                             0.2514955 -0.50083557  0.18892525
##                                       Dim.4        Dim.5
## Rank                           -0.398515784  0.174142734
## Previous.Rank                  -0.376589967  0.155087241
## First.Appearance                0.411871593  0.074663621
## First.Rank                     -0.069088912  0.155662668
## Year                            0.419965362  0.053582387
## Total.Cores                    -0.093073279  0.013544715
## Accelerator.Co.Processor.Cores -0.009097247  0.044698306
## Rmax                           -0.045624168  0.002018992
## Rpeak                          -0.038063480  0.001846284
## Nmax                            0.257725010  0.195588781
## Nhalf                           0.206962641  0.877327355
## Power                           0.019742041 -0.050592776
## Mflops.Watt                    -0.215704630  0.021115047
## Processor.Speed..MHz.           0.699188167 -0.151854609
## Cores.per.Socket               -0.262029578  0.159021842
## Eff                             0.069634172  0.122745437
## 
## $cor
##                                     Dim.1       Dim.2       Dim.3
## Rank                           -0.5947605  0.51585716 -0.29570230
## Previous.Rank                  -0.5841319  0.46071795 -0.28817963
## First.Appearance               -0.3186119  0.61837832  0.54298393
## First.Rank                     -0.5606208  0.68297240  0.05489210
## Year                           -0.3184208  0.62075695  0.54214733
## Total.Cores                     0.8696188  0.42389651 -0.04809639
## Accelerator.Co.Processor.Cores  0.7264084  0.49220188 -0.12196963
## Rmax                            0.8918226  0.39582170 -0.01637216
## Rpeak                           0.8723756  0.44995513 -0.04556264
## Nmax                            0.6201017 -0.24627256 -0.20372160
## Nhalf                           0.0917428 -0.23884429 -0.17224974
## Power                           0.8337275  0.35274263 -0.20967665
## Mflops.Watt                     0.3119978 -0.16360149  0.66236746
## Processor.Speed..MHz.          -0.1937613  0.08094431 -0.40989259
## Cores.per.Socket                0.3027854 -0.14854002  0.66160564
## Eff                             0.2514955 -0.50083557  0.18892525
##                                       Dim.4        Dim.5
## Rank                           -0.398515784  0.174142734
## Previous.Rank                  -0.376589967  0.155087241
## First.Appearance                0.411871593  0.074663621
## First.Rank                     -0.069088912  0.155662668
## Year                            0.419965362  0.053582387
## Total.Cores                    -0.093073279  0.013544715
## Accelerator.Co.Processor.Cores -0.009097247  0.044698306
## Rmax                           -0.045624168  0.002018992
## Rpeak                          -0.038063480  0.001846284
## Nmax                            0.257725010  0.195588781
## Nhalf                           0.206962641  0.877327355
## Power                           0.019742041 -0.050592776
## Mflops.Watt                    -0.215704630  0.021115047
## Processor.Speed..MHz.           0.699188167 -0.151854609
## Cores.per.Socket               -0.262029578  0.159021842
## Eff                             0.069634172  0.122745437
## 
## $cos2
##                                      Dim.1       Dim.2        Dim.3
## Rank                           0.353739998 0.266108607 0.0874398515
## Previous.Rank                  0.341210067 0.212261033 0.0830474976
## First.Appearance               0.101513549 0.382391752 0.2948315525
## First.Rank                     0.314295701 0.466451305 0.0030131425
## Year                           0.101391809 0.385339192 0.2939237318
## Total.Cores                    0.756236812 0.179688253 0.0023132624
## Accelerator.Co.Processor.Cores 0.527669161 0.242262695 0.0148765906
## Rmax                           0.795347521 0.156674815 0.0002680476
## Rpeak                          0.761039130 0.202459615 0.0020759544
## Nmax                           0.384526058 0.060650173 0.0415024908
## Nhalf                          0.008416741 0.057046595 0.0296699745
## Power                          0.695101486 0.124427360 0.0439642995
## Mflops.Watt                    0.097342640 0.026765447 0.4387306538
## Processor.Speed..MHz.          0.037543449 0.006551982 0.1680119372
## Cores.per.Socket               0.091678975 0.022064138 0.4377220180
## Eff                            0.063249990 0.250836265 0.0356927489
##                                       Dim.4        Dim.5
## Rank                           0.1588148305 3.032569e-02
## Previous.Rank                  0.1418200035 2.405205e-02
## First.Appearance               0.1696382088 5.574656e-03
## First.Rank                     0.0047732778 2.423087e-02
## Year                           0.1763709057 2.871072e-03
## Total.Cores                    0.0086626353 1.834593e-04
## Accelerator.Co.Processor.Cores 0.0000827599 1.997939e-03
## Rmax                           0.0020815647 4.076327e-06
## Rpeak                          0.0014488285 3.408765e-06
## Nmax                           0.0664221807 3.825497e-02
## Nhalf                          0.0428335348 7.697033e-01
## Power                          0.0003897482 2.559629e-03
## Mflops.Watt                    0.0465284876 4.458452e-04
## Processor.Speed..MHz.          0.4888640934 2.305982e-02
## Cores.per.Socket               0.0686594998 2.528795e-02
## Eff                            0.0048489179 1.506644e-02
## 
## $contrib
##                                     Dim.1      Dim.2       Dim.3
## Rank                            6.5141852  8.7478772  4.42266805
## Previous.Rank                   6.2834442  6.9777279  4.20050478
## First.Appearance                1.8693901 12.5704919 14.91244627
## First.Rank                      5.7878114 15.3338097  0.15240338
## Year                            1.8671482 12.6673841 14.86652911
## Total.Cores                    13.9262358  5.9069520  0.11700376
## Accelerator.Co.Processor.Cores  9.7171217  7.9639825  0.75245121
## Rmax                           14.6464665  5.1504236  0.01355772
## Rpeak                          14.0146713  6.6555226  0.10500083
## Nmax                            7.0811159  1.9937734  2.09917717
## Nhalf                           0.1549958  1.8753118  1.50069386
## Power                          12.8004179  4.0903422  2.22369434
## Mflops.Watt                     1.7925821  0.8798695 22.19079758
## Processor.Speed..MHz.           0.6913693  0.2153855  8.49796762
## Cores.per.Socket                1.6882847  0.7253218 22.13978124
## Eff                             1.1647599  8.2458244  1.80532306
##                                       Dim.4        Dim.5
## Rank                           11.489675493 3.147055e+00
## Previous.Rank                  10.260161559 2.496007e+00
## First.Appearance               12.272707559 5.785112e-01
## First.Rank                      0.345329292 2.514564e+00
## Year                           12.759793698 2.979462e-01
## Total.Cores                     0.626710161 1.903853e-02
## Accelerator.Co.Processor.Cores  0.005987378 2.073365e-01
## Rmax                            0.150593638 4.230218e-04
## Rpeak                           0.104817472 3.537453e-04
## Nmax                            4.805403249 3.969918e+00
## Nhalf                           3.098850491 7.987613e+01
## Power                           0.028196862 2.656261e-01
## Mflops.Watt                     3.366166888 4.626769e-02
## Processor.Speed..MHz.          35.367539529 2.393038e+00
## Cores.per.Socket                4.967265151 2.624262e+00
## Eff                             0.350801580 1.563523e+00

barplot(pca.out$eig[,1],main="Eigenvalues",names.arg=1:nrow(pca.out$eig))
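
The $cos2 and $contrib tables can be reproduced from $coord. The printed numbers are consistent with cos2 being the squared coordinate and contrib being cos2 divided by the component's eigenvalue, times 100. The sketch below checks this for a few variables on Dim.1, using values copied from the output above (variable names are illustrative).

```r
# First eigenvalue and a few Dim.1 coordinates, copied from the output
eig1   <- 5.430303085
coord1 <- c(Rmax = 0.8918226, Rpeak = 0.8723756, Total.Cores = 0.8696188,
            Power = 0.8337275, Eff = 0.2514955)

# Quality of representation on Dim.1
cos2 <- coord1^2

# Contribution of each variable to Dim.1, in percent
contrib <- 100 * cos2 / eig1

round(cbind(coord = coord1, cos2 = cos2, contrib = contrib), 4)
```

Read this way, Rmax, Rpeak, Total.Cores and Power each supply roughly 13 to 15% of the first component, while Eff supplies about 1%.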

4.3.1 Note

Rpeak, Accelerator.Co.Processor.Cores, Total.Cores, Rmax and Power are strongly correlated and constitute one component, while Nmax and Processor.Speed..MHz. are not correlated.

4.4 GLM

Now we are ready to fit a GLM. Here we go.
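
A side note before the fit: when glm() is called without a family argument it defaults to a Gaussian family with identity link, which is an ordinary linear model. The sketch below illustrates this on the built-in mtcars data set, chosen here only as a stand-in; the formula is an assumption for the example.

```r
# Same formula fitted with glm() (default family) and lm()
fit_glm <- glm(mpg ~ wt + hp + factor(cyl), data = mtcars)
fit_lm  <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Default family and link used by glm()
family(fit_glm)$family
family(fit_glm)$link

# The coefficient estimates agree
all.equal(coef(fit_glm), coef(fit_lm))
```

This is why the summary for the model on Rpeak reports t values rather than z values.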

glm.out=glm(Rpeak~Total.Cores+Accelerator.Co.Processor.Cores+Nmax+Mflops.Watt+Processor.Speed..MHz.+Segment+Processor.Technology+OS.Family+Architecture,data = top500.1)
summary(glm.out)

## 
## Call:
## glm(formula = Rpeak ~ Total.Cores + Accelerator.Co.Processor.Cores + 
##     Nmax + Mflops.Watt + Processor.Speed..MHz. + Segment + Processor.Technology + 
##     OS.Family + Architecture, data = top500.1)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2332130   -293526     27952    189998  15191053  
## 
## Coefficients:
##                                         Estimate Std. Error t value
## (Intercept)                           -1.082e+06  1.266e+06  -0.854
## Total.Cores                            1.438e+01  6.801e-01  21.142
## Accelerator.Co.Processor.Cores         4.355e+00  8.922e-01   4.881
## Nmax                                  -5.798e-02  6.742e-02  -0.860
## Mflops.Watt                            6.285e+02  1.373e+02   4.577
## Processor.Speed..MHz.                  6.795e+02  5.524e+02   1.230
## SegmentGovernment                     -6.632e+04  4.068e+05  -0.163
## SegmentIndustry                        2.343e+05  2.351e+05   0.997
## SegmentResearch                        1.359e+05  2.151e+05   0.632
## SegmentVendor                          2.394e+05  4.880e+05   0.491
## Processor.TechnologyIntel Core        -1.841e+06  1.304e+06  -1.412
## Processor.TechnologyIntel IvyBridge   -1.501e+06  5.553e+05  -2.704
## Processor.TechnologyIntel Nehalem     -1.039e+06  5.407e+05  -1.922
## Processor.TechnologyIntel SandyBridge -1.072e+06  5.009e+05  -2.140
## Processor.TechnologyOthers            -8.897e+05  1.398e+06  -0.636
## Processor.TechnologyPower             -1.689e+06  1.606e+06  -1.052
## Processor.TechnologyPowerPC           -2.580e+06  6.062e+05  -4.256
## Processor.TechnologySparc              2.152e+05  1.026e+06   0.210
## OS.FamilyMixed                         4.116e+05  2.427e+06   0.170
## OS.FamilyUnix                         -4.669e+05  1.309e+06  -0.357
## ArchitectureMPP                        8.242e+05  3.427e+05   2.405
##                                       Pr(>|t|)    
## (Intercept)                            0.39383    
## Total.Cores                            < 2e-16 ***
## Accelerator.Co.Processor.Cores        1.90e-06 ***
## Nmax                                   0.39058    
## Mflops.Watt                           7.49e-06 ***
## Processor.Speed..MHz.                  0.21979    
## SegmentGovernment                      0.87063    
## SegmentIndustry                        0.31986    
## SegmentResearch                        0.52796    
## SegmentVendor                          0.62413    
## Processor.TechnologyIntel Core         0.15916    
## Processor.TechnologyIntel IvyBridge    0.00732 ** 
## Processor.TechnologyIntel Nehalem      0.05576 .  
## Processor.TechnologyIntel SandyBridge  0.03334 *  
## Processor.TechnologyOthers             0.52521    
## Processor.TechnologyPower              0.29385    
## Processor.TechnologyPowerPC           2.96e-05 ***
## Processor.TechnologySparc              0.83400    
## OS.FamilyMixed                         0.86550    
## OS.FamilyUnix                          0.72162    
## ArchitectureMPP                        0.01693 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.36029e+12)
## 
##     Null deviance: 4.4393e+15  on 267  degrees of freedom
## Residual deviance: 3.3599e+14  on 247  degrees of freedom
##   (232 observations deleted due to missingness)
## AIC: 8270.3
## 
## Number of Fisher Scoring iterations: 2

ggplot(top500.1,aes(x=Processor.Technology,y=Eff,col=Segment))+geom_boxplot()+geom_jitter() + theme(axis.text.x = element_text(size=10, angle=90))

4.4.1 Note

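Intel IvyBridge and PowerPC are the Processor.Technology levels with the most significant effects on Rpeak (p < 0.01), especially for the Academic and Research segments. Both estimates are negative, though, so they measure a lower Rpeak relative to the reference level with the other predictors held fixed.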
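When reading the factor coefficients in the table above, it helps to remember that with R's default treatment contrasts each coefficient is a difference from the reference level (the level missing from the table). A minimal sketch on synthetic data, where tech and y are hypothetical names and the group means are made up:

```r
set.seed(2)
tech <- factor(rep(c("A", "B", "C"), each = 50))
group.mean <- c(A = 10, B = 7, C = 12)
y <- unname(group.mean[as.character(tech)]) + rnorm(150)

# gaussian family by default, so this is the same fit as lm(y ~ tech)
fit <- glm(y ~ tech)
coef(fit)        # intercept = mean of level "A"; techB, techC = differences from "A"

# Changing the reference level changes the sign and size of the coefficients,
# not the fitted values
tech.b <- relevel(tech, ref = "B")
fit.b <- glm(y ~ tech.b)
coef(fit.b)
```

A negative coefficient therefore does not mean a level is bad in absolute terms, only that it sits below the chosen baseline.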

4.5 LM

But Total.Cores is a must for all! Let’s check it.

4.5.1 Correlation Test

cor.test(Rpeak,Total.Cores)

## 
##  Pearson's product-moment correlation
## 
## data:  Rpeak and Total.Cores
## t = 65.9089, df = 498, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9373438 0.9555071
## sample estimates:
##       cor 
## 0.9471798
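The t statistic and the confidence interval above can be checked by hand from the reported correlation alone. The sketch below uses only the printed values (r and df) and the standard Fisher z transformation; it is an illustration, not part of the original analysis.

```r
r <- 0.9471798   # sample correlation reported above
n <- 500         # df + 2

# t statistic for H0: true correlation is 0
t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z transformation
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

t.stat
ci
```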

4.5.2 Linear model

fit.lm=lm(log(Rpeak)~log(Total.Cores),data = top500.1)
summary(fit.lm)

## 
## Call:
## lm(formula = log(Rpeak) ~ log(Total.Cores), data = top500.1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.54133 -0.12651 -0.01417  0.11844  1.58578 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.82397    0.19066   25.30   <2e-16 ***
## log(Total.Cores)  0.81436    0.01905   42.75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.369 on 498 degrees of freedom
## Multiple R-squared:  0.7858, Adjusted R-squared:  0.7854 
## F-statistic:  1827 on 1 and 498 DF,  p-value: < 2.2e-16
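Because both sides of the model are on the log scale, the slope can be read as an elasticity: multiplying Total.Cores by some ratio multiplies the predicted Rpeak by ratio^slope. A small sketch using the printed coefficient (scale_factor is a helper name introduced here for illustration):

```r
# Coefficient of log(Total.Cores) copied from the summary above
slope <- 0.81436

# Multiplicative change in predicted Rpeak when Total.Cores is multiplied by `ratio`
scale_factor <- function(slope, ratio) ratio^slope

scale_factor(slope, 2)    # doubling the cores: about 1.76
scale_factor(slope, 10)   # ten times the cores
```

A slope below 1 means that, in this model, Rpeak grows less than proportionally with the number of cores.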

4.5.3 On the whole by Year

ggplot(top500.1,aes(x=log(Total.Cores),y=log(Rpeak),col=Year))+geom_smooth(method="lm") + geom_point(aes(size=Eff))

fit.lm.year=lm(log(Rpeak)~log(Total.Cores)+Year,data = top500.1)
summary(fit.lm.year)

## 
## Call:
## lm(formula = log(Rpeak) ~ log(Total.Cores) + Year, data = top500.1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.18183 -0.17802 -0.04129  0.07572  1.55894 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -215.72915   27.84930  -7.746 5.36e-14 ***
## log(Total.Cores)    0.84975    0.01852  45.891  < 2e-16 ***
## Year                0.10941    0.01381   7.920 1.57e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3481 on 497 degrees of freedom
## Multiple R-squared:  0.8098, Adjusted R-squared:  0.8091 
## F-statistic:  1058 on 2 and 497 DF,  p-value: < 2.2e-16

anova(fit.lm,fit.lm.year)

## Analysis of Variance Table
## 
## Model 1: log(Rpeak) ~ log(Total.Cores)
## Model 2: log(Rpeak) ~ log(Total.Cores) + Year
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    498 67.808                                  
## 2    497 60.210  1    7.5985 62.721 1.569e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC(fit.lm,fit.lm.year)

##             df      AIC
## fit.lm       3 425.9753
## fit.lm.year  4 368.5507

4.5.3.1 Note

Well, beyond the correlation mentioned above, Year matters too, and positively: comparing the two models, the second one, fit.lm.year, is significantly better (F test, p = 1.569e-14). The lower AIC confirms this.
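The F statistic in the anova() table can be reproduced from the printed sums of squares, and the Year coefficient can be turned into a yearly growth factor. This is a sketch that uses only numbers copied from the output above:

```r
rss2    <- 60.210    # residual sum of squares of fit.lm.year
df2     <- 497       # its residual degrees of freedom
ss.diff <- 7.5985    # "Sum of Sq": reduction in RSS from adding Year

# F test for one added parameter
f.stat <- (ss.diff / 1) / (rss2 / df2)
p.val  <- pf(f.stat, 1, df2, lower.tail = FALSE)

# Year coefficient on the log scale -> multiplicative effect per year
year.effect <- exp(0.10941)

c(F = f.stat, p = p.val, year.effect = year.effect)
```

So, at a fixed number of cores, a system that is one year newer has a predicted Rpeak roughly 12% higher.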

4.5.4 On the whole by Segment

ggplot(top500.1,aes(x=log(Total.Cores),y=log(Rpeak),col=Segment))+geom_smooth(method="lm") +geom_jitter()

4.5.4.1 Note

On the whole, the Research segment shows the strongest correlation.
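The comparison above is read off the plot. One way to put numbers on it would be to compute the log-log correlation per group. The helper cor_by_group below is a hypothetical name, and the data frame is a synthetic stand-in for top500.1 with made-up values, so the printed numbers say nothing about the real list.

```r
# Correlation of log(Total.Cores) and log(Rpeak) within each level of `group`
cor_by_group <- function(data, group) {
  sapply(split(data, data[[group]]), function(d) {
    cor(log(d$Total.Cores), log(d$Rpeak))
  })
}

# Synthetic stand-in for top500.1 (values are made up)
set.seed(3)
toy <- data.frame(
  Segment = rep(c("Academic", "Research"), each = 40),
  Total.Cores = round(exp(rnorm(80, mean = 10, sd = 1)))
)
noise.sd <- ifelse(toy$Segment == "Academic", 0.6, 0.2)
toy$Rpeak <- toy$Total.Cores^0.8 * exp(rnorm(80, sd = noise.sd))

cor_by_group(toy, "Segment")
```

On the real data the call would be cor_by_group(top500.1, "Segment"), assuming the columns have no missing values.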

4.5.5 Continent by Segment

ggplot(top500.1,aes(log(Total.Cores),log(Rpeak),col=Segment)) +geom_point(aes(size=Eff),shape=22) + facet_grid(Segment~Continent) + theme(axis.text.x = element_text(size=12, angle=90)) + geom_smooth(method="lm")

4.5.5.1 Note

The Americas and Asia are very close in correlation in the Research segment, while Europe tries not to become an outsider like Oceania. In the Academic segment, however, Europe seems to be the leader. In the Government segment (DOD, DOE, NSA) the Americas are the true leader.

4.5.6 Country by Segment

ggplot(newdata2,aes(log(Total.Cores),log(Rpeak),col=Segment)) +geom_point(aes(size=Eff),shape=22) + facet_grid(Segment~Country) + theme(axis.text.x = element_text(size=12, angle=90))+ geom_smooth(method="lm")

## Warning in qt((1 - level)/2, df): NaNs produced
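The qt() warning most likely comes from facets with too few points: a straight line through two points leaves zero residual degrees of freedom, so the confidence band cannot be computed. The sketch below reproduces this in base R; keep_groups is a hypothetical helper (not part of the original analysis) that drops such small groups before plotting.

```r
# Two points, two coefficients: no residual degrees of freedom
two.points <- data.frame(x = c(1, 2), y = c(3, 5))
fit <- lm(y ~ x, data = two.points)
df.residual(fit)

# The quantile is undefined for df = 0, hence the warning
suppressWarnings(qt((1 - 0.95) / 2, df.residual(fit)))

# Keep only rows whose combination of grouping columns occurs at least min.n times
keep_groups <- function(data, groups, min.n = 3) {
  key <- interaction(data[groups], drop = TRUE)
  counts <- ave(seq_along(key), key, FUN = length)
  data[counts >= min.n, , drop = FALSE]
}
```

For example, keep_groups(newdata2, c("Segment", "Country")) could be passed to ggplot() in place of newdata2.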

4.5.6.1 Note

The United States is the leader in correlation across all segments. Meanwhile, France has a real chance to become the leader in the Industry and Research segments. Vive la France! China is trying hard to catch up with the leader in the Research segment.

4.5.7 Manufacturer by Segment

ggplot(newdata,aes(log(Total.Cores),log(Rpeak),col=Segment)) +geom_point(aes(size=Eff),shape=22) + facet_grid(Segment~Manufacturer) + theme(axis.text.x = element_text(size=12, angle=90)) + geom_smooth(method="lm")

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in qt((1 - level)/2, df): NaNs produced

## Warning in qt((1 - level)/2, df): NaNs produced

4.5.7.1 Note

Dell and IBM are the leaders in the Academic segment, while Cray Inc. is the leader in the Government segment. Cray Inc., IBM and SGI are the leaders in the Research segment, and SGI has a real chance to become the leader in Industry alongside HP and IBM, as far as correlation is concerned.

4.6 Factors

And what about other combinations of factors?

ggplot(top500.1,aes(x=Interconnect.Family,y=Eff,col=Segment))+geom_boxplot()+geom_jitter() + theme(axis.text.x = element_text(size=10, angle=90))

4.6.1 Note

Other factors matter too, for example Interconnect.Family.

4.7 ARL

Can we find relationships between Segment, Manufacturer, Architecture, Processor.Technology and OS.Family that indicate some kind of system in how supercomputers are installed? This task is called association rule learning (ARL), a data mining technique. For more information about ARL, see http://en.wikipedia.org/wiki/Association_rule_learning. Let’s try ARL!

First, we should reduce our data set, using previous results.

new.dat.2=select(newdata,Segment,Manufacturer,Architecture,Processor.Technology,OS.Family)
head(new.dat.2)

##    Segment Manufacturer Architecture Processor.Technology OS.Family
## 2 Research    Cray Inc.          MPP           AMD x86_64     Linux
## 3 Research          IBM          MPP              PowerPC     Linux
## 5 Research          IBM          MPP              PowerPC     Linux
## 6 Research    Cray Inc.          MPP    Intel SandyBridge     Linux
## 7 Academic         Dell      Cluster    Intel SandyBridge     Linux
## 8 Research          IBM          MPP              PowerPC     Linux

Now we are ready for generating Association Rules with reduced data.

rules.all=apriori(new.dat.2)

## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport support minlen maxlen
##         0.8    0.1    1 none FALSE            TRUE     0.1      1     10
##  target   ext
##   rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[26 item(s), 452 transaction(s)] done [0.00s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [69 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

inspect(rules.all)

##    lhs                                         rhs                      support confidence      lift
## 1  {}                                       => {Architecture=Cluster} 0.8451327  0.8451327 1.0000000
## 2  {}                                       => {OS.Family=Linux}      0.9734513  0.9734513 1.0000000
## 3  {Manufacturer=Cray Inc.}                 => {OS.Family=Linux}      0.1128319  1.0000000 1.0272727
## 4  {Segment=Academic}                       => {OS.Family=Linux}      0.1371681  0.9841270 1.0109668
## 5  {Architecture=MPP}                       => {OS.Family=Linux}      0.1371681  0.8857143 0.9098701
## 6  {Segment=Research}                       => {OS.Family=Linux}      0.1769912  0.8988764 0.9233912
## 7  {Processor.Technology=Intel IvyBridge}   => {Architecture=Cluster} 0.1814159  0.8913043 1.0546324
## 8  {Processor.Technology=Intel IvyBridge}   => {OS.Family=Linux}      0.2035398  1.0000000 1.0272727
## 9  {Manufacturer=IBM}                       => {Architecture=Cluster} 0.3163717  0.8125000 0.9613874
## 10 {Manufacturer=IBM}                       => {OS.Family=Linux}      0.3650442  0.9375000 0.9630682
## 11 {Manufacturer=Hewlett-Packard}           => {Segment=Industry}     0.3606195  0.9005525 1.4965063
## 12 {Manufacturer=Hewlett-Packard}           => {Architecture=Cluster} 0.4004425  1.0000000 1.1832461
## 13 {Manufacturer=Hewlett-Packard}           => {OS.Family=Linux}      0.3982301  0.9944751 1.0215972
## 14 {Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.5774336  0.9849057 1.1653858
## 15 {Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.5840708  0.9962264 1.0233962
## 16 {Segment=Industry}                       => {Architecture=Cluster} 0.5973451  0.9926471 1.1745457
## 17 {Segment=Industry}                       => {OS.Family=Linux}      0.5995575  0.9963235 1.0234960
## 18 {Architecture=Cluster}                   => {OS.Family=Linux}      0.8362832  0.9895288 1.0165159
## 19 {OS.Family=Linux}                        => {Architecture=Cluster} 0.8362832  0.8590909 1.0165159
## 20 {Segment=Research,                                                                               
##     Architecture=Cluster}                   => {OS.Family=Linux}      0.1061947  0.9411765 0.9668449
## 21 {Segment=Industry,                                                                               
##     Processor.Technology=Intel IvyBridge}   => {Architecture=Cluster} 0.1150442  1.0000000 1.1832461
## 22 {Segment=Industry,                                                                               
##     Processor.Technology=Intel IvyBridge}   => {OS.Family=Linux}      0.1150442  1.0000000 1.0272727
## 23 {Architecture=Cluster,                                                                           
##     Processor.Technology=Intel IvyBridge}   => {OS.Family=Linux}      0.1814159  1.0000000 1.0272727
## 24 {Processor.Technology=Intel IvyBridge,                                                           
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.1814159  0.8913043 1.0546324
## 25 {Manufacturer=IBM,                                                                               
##     Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.2079646  1.0000000 1.1832461
## 26 {Manufacturer=IBM,                                                                               
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.2079646  1.0000000 1.0272727
## 27 {Segment=Industry,                                                                               
##     Manufacturer=IBM}                       => {Architecture=Cluster} 0.2256637  0.9902913 1.1717582
## 28 {Segment=Industry,                                                                               
##     Manufacturer=IBM}                       => {OS.Family=Linux}      0.2278761  1.0000000 1.0272727
## 29 {Manufacturer=IBM,                                                                               
##     Architecture=Cluster}                   => {OS.Family=Linux}      0.3097345  0.9790210 1.0057216
## 30 {Manufacturer=IBM,                                                                               
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.3097345  0.8484848 1.0039664
## 31 {Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge} => {Segment=Industry}     0.2787611  0.9130435 1.5172634
## 32 {Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.3053097  1.0000000 1.1832461
## 33 {Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.3030973  0.9927536 1.0198287
## 34 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard}           => {Architecture=Cluster} 0.3606195  1.0000000 1.1832461
## 35 {Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster}                   => {Segment=Industry}     0.3606195  0.9005525 1.4965063
## 36 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard}           => {OS.Family=Linux}      0.3584071  0.9938650 1.0209704
## 37 {Manufacturer=Hewlett-Packard,                                                                   
##     OS.Family=Linux}                        => {Segment=Industry}     0.3584071  0.9000000 1.4955882
## 38 {Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster}                   => {OS.Family=Linux}      0.3982301  0.9944751 1.0215972
## 39 {Manufacturer=Hewlett-Packard,                                                                   
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.3982301  1.0000000 1.1832461
## 40 {Segment=Industry,                                                                               
##     Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.4336283  1.0000000 1.1832461
## 41 {Segment=Industry,                                                                               
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.4314159  0.9948980 1.0220315
## 42 {Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.5752212  0.9961686 1.0233368
## 43 {Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.5752212  0.9848485 1.1653181
## 44 {Segment=Industry,                                                                               
##     Architecture=Cluster}                   => {OS.Family=Linux}      0.5951327  0.9962963 1.0234680
## 45 {Segment=Industry,                                                                               
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.5951327  0.9926199 1.1745136
## 46 {Segment=Industry,                                                                               
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel IvyBridge}   => {OS.Family=Linux}      0.1150442  1.0000000 1.0272727
## 47 {Segment=Industry,                                                                               
##     Processor.Technology=Intel IvyBridge,                                                           
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.1150442  1.0000000 1.1832461
## 48 {Segment=Industry,                                                                               
##     Manufacturer=IBM,                                                                               
##     Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.1504425  1.0000000 1.1832461
## 49 {Segment=Industry,                                                                               
##     Manufacturer=IBM,                                                                               
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.1504425  1.0000000 1.0272727
## 50 {Manufacturer=IBM,                                                                               
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.2079646  1.0000000 1.0272727
## 51 {Manufacturer=IBM,                                                                               
##     Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.2079646  1.0000000 1.1832461
## 52 {Segment=Industry,                                                                               
##     Manufacturer=IBM,                                                                               
##     Architecture=Cluster}                   => {OS.Family=Linux}      0.2256637  1.0000000 1.0272727
## 53 {Segment=Industry,                                                                               
##     Manufacturer=IBM,                                                                               
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.2256637  0.9902913 1.1717582
## 54 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.2787611  1.0000000 1.1832461
## 55 {Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge} => {Segment=Industry}     0.2787611  0.9130435 1.5172634
## 56 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.2765487  0.9920635 1.0191198
## 57 {Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Segment=Industry}     0.2765487  0.9124088 1.5162087
## 58 {Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.3030973  0.9927536 1.0198287
## 59 {Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.3030973  1.0000000 1.1832461
## 60 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster}                   => {OS.Family=Linux}      0.3584071  0.9938650 1.0209704
## 61 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard,                                                                   
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.3584071  1.0000000 1.1832461
## 62 {Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster,                                                                           
##     OS.Family=Linux}                        => {Segment=Industry}     0.3584071  0.9000000 1.4955882
## 63 {Segment=Industry,                                                                               
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.4314159  0.9948980 1.0220315
## 64 {Segment=Industry,                                                                               
##     Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.4314159  1.0000000 1.1832461
## 65 {Segment=Industry,                                                                               
##     Manufacturer=IBM,                                                                               
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.1504425  1.0000000 1.0272727
## 66 {Segment=Industry,                                                                               
##     Manufacturer=IBM,                                                                               
##     Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.1504425  1.0000000 1.1832461
## 67 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge} => {OS.Family=Linux}      0.2765487  0.9920635 1.0191198
## 68 {Segment=Industry,                                                                               
##     Manufacturer=Hewlett-Packard,                                                                   
##     Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Architecture=Cluster} 0.2765487  1.0000000 1.1832461
## 69 {Manufacturer=Hewlett-Packard,                                                                   
##     Architecture=Cluster,                                                                           
##     Processor.Technology=Intel SandyBridge,                                                         
##     OS.Family=Linux}                        => {Segment=Industry}     0.2765487  0.9124088 1.5162087
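The three measures in the table can be recomputed by hand for a single rule. The sketch below does this for rule 11, {Manufacturer=Hewlett-Packard} => {Segment=Industry}. The counts are not printed anywhere above; they are reconstructed from the reported supports and the 452 transactions, so treat them as an assumption.

```r
n.total    <- 452   # transactions reported by apriori()
n.hp       <- 181   # Manufacturer=Hewlett-Packard (reconstructed count)
n.industry <- 272   # Segment=Industry (reconstructed count)
n.both     <- 163   # both at once (reconstructed count)

support    <- n.both / n.total                    # share of all systems matching the rule
confidence <- n.both / n.hp                       # share of HP systems that are Industry
lift       <- confidence / (n.industry / n.total) # confidence relative to the base rate

round(c(support = support, confidence = confidence, lift = lift), 7)
```

A lift near 1 (as in most of the Linux rules) means the left-hand side adds little information, because the right-hand side is already very common. The Hewlett-Packard rules pointing to Industry, with lift around 1.5, are the most informative ones in the list.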

plot(rules.all)

plot(rules.all,method="grouped")

plot(rules.all,method="graph")

plot(rules.all,method="graph",control=list(type="items"))

plot(rules.all,method="paracoord")

4.7.1 Note

The graphical views of the rules are very interesting in our case: Academic and Research look very individual, in contrast to Industry, where Linux, Cluster, IBM and Intel SandyBridge make up the main trend.

5 Conclusions

1. Industry shows very low effectiveness of supercomputers, while Vendor and Research are the true leaders.

2. Bull SA, Cray Inc. and SGI are the leaders among manufacturers of supercomputers.

3. MPP leads in effectiveness in comparison with the Cluster architecture.

4. The Americas and Europe are the leading continents in effectiveness of supercomputers.

5. Western Europe leads in the Academic segment, while Northern Europe leads in Research, by effectiveness of supercomputers.

6. The United States, Germany, the United Kingdom, France and Japan are the leaders in effectiveness of supercomputers, while China and Russia lag behind, although Russia is closer to the leaders.

7. China, as an outsider, has its largest effectiveness values in the Academic and Research segments, while the United States, as a leader, has its largest values in the Academic, Government, Research and Vendor segments.

8. Rpeak, Accelerator.Co.Processor.Cores, Total.Cores, Rmax and Power are highly correlated and constitute one component, while Nmax and Processor.Speed..MHz. are not correlated.

9. Intel IvyBridge and PowerPC are the Processor.Technology levels with the most significant effects on Rpeak in the GLM, especially for the Academic and Research segments.

10. On the whole, Segment, Country, Processor.Technology and OS.Family, combined with Architecture, all contribute to differences in the effectiveness of supercomputers.

So we have made a data mining tour with R on a real data set and got some interesting and non-obvious results. This race looks set to go on nonstop.

Supercomputers: data mining tour with R

Alexander Levakov

April, 2015