"The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic." (R. A. Fisher)
Supercomputers have made a great leap in the race for speed over the last 20 years: http://en.wikipedia.org/wiki/Supercomputer. But is there anything interesting about these monsters besides raw calculation speed? Let’s see!
First, we download the file with the TOP500 data for June 2014 from http://www.top500.org/lists/2014/06/.
Then we put the file into the working directory, read it, and look at its structure.
if (Sys.getenv("JAVA_HOME") != "")
  Sys.setenv(JAVA_HOME = "")
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
library(psych)
library(ggplot2)
##
## Attaching package: 'ggplot2'
##
## The following object is masked from 'package:psych':
##
## %+%
library(rpart)
library(party)
## Loading required package: grid
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: sandwich
## Loading required package: strucchange
## Loading required package: modeltools
## Loading required package: stats4
##
## Attaching package: 'modeltools'
##
## The following object is masked from 'package:rJava':
##
## clone
library(randomForest)
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
##
## The following object is masked from 'package:psych':
##
## outlier
library(MASS)
library(FactoMineR)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.3
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:MASS':
##
## select
##
## The following object is masked from 'package:randomForest':
##
## combine
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(arules)
## Warning: package 'arules' was built under R version 3.1.3
## Loading required package: Matrix
##
## Attaching package: 'arules'
##
## The following object is masked from 'package:modeltools':
##
## info
##
## The following objects are masked from 'package:base':
##
## %in%, write
library(arulesViz)
## Warning: package 'arulesViz' was built under R version 3.1.3
##
## Attaching package: 'arulesViz'
##
## The following object is masked from 'package:base':
##
## abbreviate
top500<-read.xlsx("./top500/TOP500_201406.xls",1)
attach(top500)
str(top500)
## 'data.frame': 500 obs. of 34 variables:
## $ Rank : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Previous.Rank : num 1 2 3 4 5 6 7 8 9 NA ...
## $ First.Appearance : num 41 40 38 37 39 40 40 39 39 43 ...
## $ First.Rank : num 1 1 17 1 3 114 7 8 48 10 ...
## $ Name : Factor w/ 175 levels "Abel","Ada","airain",..: 157 159 136 NA 105 122 145 88 167 NA ...
## $ Computer : Factor w/ 258 levels "Acer AR585 F1 Cluster, Opteron 12C 2.2GHz, QDR infiniband",..: 231 138 11 190 13 124 205 12 12 121 ...
## $ Site : Factor w/ 220 levels "Academic Center for Computing and Media Studies (ACCMS), Kyoto University",..: 151 54 51 170 52 188 194 77 51 82 ...
## $ Manufacturer : Factor w/ 32 levels "Acer Group","Adtech",..: 26 7 16 11 16 7 9 16 16 7 ...
## $ Country : Factor w/ 29 levels "Australia","Austria",..: 6 29 29 16 29 26 29 10 29 29 ...
## $ Year : num 2013 2012 2011 2011 2012 ...
## $ Segment : Factor w/ 6 levels "Academic","Government",..: 5 5 5 5 5 5 1 5 5 2 ...
## $ Total.Cores : num 3120000 560640 1572864 705024 786432 ...
## $ Accelerator.Co.Processor.Cores: num 2736000 261632 0 0 0 ...
## $ Rmax : num 33862700 17590000 17173224 10510000 8586612 ...
## $ Rpeak : num 54902400 27112550 20132659 11280384 10066330 ...
## $ Nmax : num 9960000 0 0 11870208 0 ...
## $ Nhalf : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Power : num 17808 8209 7890 12660 3945 ...
## $ Mflops.Watt : num 1902 2143 2177 830 2177 ...
## $ Architecture : Factor w/ 2 levels "Cluster","MPP": 1 2 2 1 2 2 1 2 2 2 ...
## $ Processor : Factor w/ 65 levels "Intel Xeon E5-2450v2 8C 2.5GHz",..: 12 23 27 33 27 46 47 27 27 14 ...
## $ Processor.Technology : Factor w/ 10 levels "AMD x86_64","Intel Core",..: 4 1 9 10 9 6 6 9 9 4 ...
## $ Processor.Speed..MHz. : num 2200 2200 1600 2000 1600 2600 2700 1600 1600 2700 ...
## $ Operating.System : Factor w/ 22 levels "AIX","Bullx Linux",..: 9 8 10 10 10 8 10 10 10 8 ...
## $ OS.Family : Factor w/ 4 levels "Linux","Mixed",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Accelerator.Co.Processor : Factor w/ 23 levels "AMD FirePro S10000",..: 5 22 14 14 14 22 11 14 14 14 ...
## $ Cores.per.Socket : num 12 16 16 8 16 8 8 16 16 12 ...
## $ Processor.Generation : Factor w/ 17 levels "Intel Xeon E3 (Haswell)",..: 2 6 8 13 8 3 3 8 8 2 ...
## $ System.Model : Factor w/ 113 levels "Acer AR585 F1 Cluster",..: 103 48 9 72 9 45 84 9 9 45 ...
## $ System.Family : Factor w/ 52 levels "Acer Group Cluster",..: 50 11 25 17 25 9 14 25 25 9 ...
## $ Interconnect.Family : Factor w/ 7 levels "10G","Cray Interconnect",..: 3 2 3 3 3 3 5 3 3 3 ...
## $ Interconnect : Factor w/ 19 levels "10G Ethernet",..: 17 3 4 4 4 2 10 4 4 2 ...
## $ Region : Factor w/ 11 levels "Australia and New Zealand",..: 2 4 4 2 4 11 4 11 4 4 ...
## $ Continent : Factor w/ 4 levels "Americas","Asia",..: 2 1 1 2 1 3 1 3 1 1 ...
head(top500,3)
## Rank Previous.Rank First.Appearance First.Rank Name
## 1 1 1 41 1 Tianhe-2 (MilkyWay-2)
## 2 2 2 40 1 Titan
## 3 3 3 38 17 Sequoia
## Computer
## 1 TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P
## 2 Cray XK7 , Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x
## 3 BlueGene/Q, Power BQC 16C 1.60 GHz, Custom
## Site Manufacturer Country
## 1 National Super Computer Center in Guangzhou NUDT China
## 2 DOE/SC/Oak Ridge National Laboratory Cray Inc. United States
## 3 DOE/NNSA/LLNL IBM United States
## Year Segment Total.Cores Accelerator.Co.Processor.Cores Rmax
## 1 2013 Research 3120000 2736000 33862700
## 2 2012 Research 560640 261632 17590000
## 3 2011 Research 1572864 0 17173224
## Rpeak Nmax Nhalf Power Mflops.Watt Architecture
## 1 54902400 9960000 0 17808 1901.54 Cluster
## 2 27112550 0 0 8209 2142.77 MPP
## 3 20132659 0 0 7890 2176.58 MPP
## Processor Processor.Technology
## 1 Intel Xeon E5-2692v2 12C 2.2GHz Intel IvyBridge
## 2 Opteron 6274 16C 2.2GHz AMD x86_64
## 3 Power BQC 16C 1.6GHz PowerPC
## Processor.Speed..MHz. Operating.System OS.Family
## 1 2200 Kylin Linux Linux
## 2 2200 Cray Linux Environment Linux
## 3 1600 Linux Linux
## Accelerator.Co.Processor Cores.per.Socket
## 1 Intel Xeon Phi 31S1P 12
## 2 NVIDIA K20x 16
## 3 None 16
## Processor.Generation System.Model System.Family
## 1 Intel Xeon E5 (IvyBridge) TH-IVB-FEP Cluster TH-IVB Cluster
## 2 Opteron 6200 Series "Interlagos" Cray XK7 Cray XK
## 3 Power BQC BlueGene/Q IBM BlueGene
## Interconnect.Family Interconnect Region Continent
## 1 Custom Interconnect TH Express-2 Eastern Asia Asia
## 2 Cray Interconnect Cray Gemini interconnect North America Americas
## 3 Custom Interconnect Custom Interconnect North America Americas
We add a new variable to a new data set, top500.1: effectiveness, \(Eff=Rmax/Rpeak\). The histogram of Eff is skewed to the left (negative skew): the long tail points toward low values, while most systems sit at the high end. Good news for effectiveness.
Eff<-c(Rmax/Rpeak)
top500.1<-cbind(top500,Eff)
par(mfrow=c(1,1))
boxplot(Eff,horizontal = T,col="grey")
hist(Eff,col="grey")
plot(density(Eff))
polygon(density(Eff),col = "grey")
grid()
summary(Eff)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2516 0.5097 0.7160 0.6844 0.8530 0.9980
describe(Eff)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 500 0.68 0.21 0.72 0.71 0.23 0.25 1 0.75 -0.7 -0.65 0.01
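As a quick illustration of what a negative skew means for a ratio such as Eff, here is a minimal sketch on made-up numbers (the vectors rmax and rpeak are hypothetical and are not taken from the TOP500 file). It uses one common moment-based definition of skewness; describe() from psych may apply a slightly different small-sample correction.

```r
# Toy values only (not from the TOP500 file); Rmax and Rpeak in the same units
rmax  <- c(30, 45, 80, 85, 90, 95)
rpeak <- rep(100, 6)
eff   <- rmax / rpeak

# Moment-based skewness: third central moment over the cubed standard deviation
m    <- mean(eff)
skew <- mean((eff - m)^3) / mean((eff - m)^2)^1.5

skew                      # negative: the long tail is on the left
mean(eff) < median(eff)   # TRUE here, as in the real data (mean 0.68 < median 0.72)
```

A few low values pull the mean below the median, which is the same pattern that summary(Eff) shows above.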
Does Eff differ by Segment, Manufacturer, Architecture, Continent, Region and Country?
We use box plots and density plots to answer this question. On the whole, Eff is lowest in the Industry segment.
ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_boxplot()+
theme(axis.text.x = element_text(size=10, angle=90))
ggplot(top500.1, aes(Eff,fill=Segment)) + geom_density(alpha = 0.2)
newdata <- subset(top500.1, Manufacturer=="SGI" | Manufacturer=="IBM" | Manufacturer=="Cray Inc." | Manufacturer=="Hewlett-Packard" | Manufacturer=="Dell" | Manufacturer=="Bull SA")
ggplot(newdata,aes(x=Manufacturer,y=Eff,fill=Manufacturer))+geom_jitter()+geom_boxplot()+ theme(axis.text.x = element_text(size=10, angle=90))
ggplot(newdata, aes(Eff,fill=Manufacturer)) + geom_density(alpha = 0.2)
ggplot(top500.1,aes(x=Architecture,y=Eff,fill=Architecture))+geom_jitter()+geom_boxplot()+ theme(axis.text.x = element_text(size=10, angle=90))
ggplot(top500.1, aes(Eff,fill=Architecture)) + geom_density(alpha = 0.2)
ggplot(top500.1,aes(x=Continent,y=Eff,fill=Continent))+geom_jitter()+geom_boxplot()+
theme(axis.text.x = element_text(size=10, angle=90))
ggplot(top500.1, aes(Eff,fill=Continent)) + geom_density(alpha = 0.2)
ggplot(top500.1,aes(x=Region,y=Eff,fill=Region))+geom_jitter()+geom_boxplot()+
theme(axis.text.x = element_text(size=10, angle=90))
ggplot(top500.1, aes(Eff,fill=Region)) + geom_density(alpha = 0.2)
newdata2 <- subset(top500.1, Country=="United States" | Country=="Germany" | Country=="Japan" | Country=="United Kingdom" | Country=="China" | Country=="France" | Country=="Russia")
ggplot(newdata2,aes(x=Country,y=Eff,fill=Country))+geom_jitter()+geom_boxplot()+ theme(axis.text.x = element_text(size=10, angle=90))
ggplot(newdata2, aes(Eff,fill=Country)) + geom_density(alpha = 0.2)
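The visual comparison can be backed by a small table of group medians. The sketch below uses a made-up data frame (toy, with hypothetical values) in place of top500.1, so the numbers are for illustration only.

```r
# Made-up efficiencies per segment (not from the TOP500 file)
toy <- data.frame(
  Segment = c("Industry", "Industry", "Industry",
              "Research", "Research", "Academic", "Academic"),
  Eff     = c(0.30, 0.35, 0.60, 0.80, 0.90, 0.75, 0.85)
)

# Median Eff per segment, sorted from lowest to highest
med <- aggregate(Eff ~ Segment, data = toy, FUN = median)
med[order(med$Eff), ]
```

The same call with data = top500.1 should give the per-segment medians that the box plots draw as their centre lines.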
And what if we want to view the distribution of Eff by Segment as a density using a “violin plot”?
ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin()
ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin() +facet_wrap(~Region)+
theme(axis.text.x = element_text(size=10, angle=90))
ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin() +facet_wrap(~Architecture) + theme(axis.text.x = element_text(size=14, angle=90))
ggplot(top500.1,aes(x=Segment,y=Eff,fill=Segment))+geom_jitter()+geom_violin() +facet_wrap(~Processor.Technology) + theme(axis.text.x = element_text(size=10, angle=90))
We get very interesting results, especially for Region, Architecture and Processor.Technology.
Are these differences statistically significant, and not only for Segment? Does ANOVA know the answer?
aov.out=aov(Eff~Country+Segment+Processor.Technology+OS.Family*Architecture,data = top500.1)
summary(aov.out)
## Df Sum Sq Mean Sq F value Pr(>F)
## Country 28 9.852 0.3519 19.186 < 2e-16 ***
## Segment 5 2.732 0.5464 29.794 < 2e-16 ***
## Processor.Technology 9 0.527 0.0586 3.193 0.000926 ***
## OS.Family 3 0.188 0.0627 3.418 0.017338 *
## Architecture 1 0.001 0.0006 0.034 0.853158
## OS.Family:Architecture 1 0.184 0.1843 10.052 0.001625 **
## Residuals 452 8.289 0.0183
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coef(aov.out)
## (Intercept)
## 0.7186661783
## CountryAustria
## 0.1176344086
## CountryBelgium
## 0.0696953899
## CountryBrazil
## -0.0731036643
## CountryCanada
## 0.0683843161
## CountryChina
## -0.3398848251
## CountryDenmark
## 0.2363785142
## CountryFinland
## 0.0562642443
## CountryFrance
## 0.0779536007
## CountryGermany
## 0.0567875033
## CountryHong Kong
## -0.1899429427
## CountryIndia
## 0.0337653908
## CountryIreland
## 0.0907879755
## CountryIsrael
## -0.1939172646
## CountryItaly
## 0.0119032757
## CountryJapan
## -0.0485267470
## CountryKorea, South
## -0.0603637416
## CountryMalaysia
## -0.2036983240
## CountryNetherlands
## 0.0030047763
## CountryNorway
## 0.0538293092
## CountryPoland
## 0.0067512792
## CountryRussia
## -0.1363252827
## CountrySaudi Arabia
## 0.0080038221
## CountrySpain
## 0.0431682418
## CountrySweden
## 0.0548338403
## CountrySwitzerland
## -0.0137604755
## CountryTaiwan
## -0.0001472943
## CountryUnited Kingdom
## -0.0028473396
## CountryUnited States
## -0.0396982916
## SegmentGovernment
## 0.0607195503
## SegmentIndustry
## -0.1098267980
## SegmentOthers
## -0.0390887545
## SegmentResearch
## 0.0453074070
## SegmentVendor
## 0.1262483003
## Processor.TechnologyIntel Core
## 0.0029070959
## Processor.TechnologyIntel Haswell
## -0.3138411273
## Processor.TechnologyIntel IvyBridge
## 0.0976988342
## Processor.TechnologyIntel Nehalem
## 0.0373042706
## Processor.TechnologyIntel SandyBridge
## 0.0722636536
## Processor.TechnologyOthers
## 0.3196318050
## Processor.TechnologyPower
## -0.0622529158
## Processor.TechnologyPowerPC
## 0.0906852318
## Processor.TechnologySparc
## 0.2374027685
## OS.FamilyMixed
## 0.1561183131
## OS.FamilyUnix
## 0.3625192139
## OS.FamilyWindows
## 0.3050431844
## ArchitectureMPP
## 0.0362007293
## OS.FamilyUnix:ArchitectureMPP
## -0.2821314759
par(mfrow=c(2,2))
plot(aov.out)
## Warning: not plotting observations with leverage one:
## 22, 46, 270, 303, 423, 476
## Warning: not plotting observations with leverage one:
## 22, 46, 270, 303, 423, 476
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
par(mfrow=c(1,1))
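The diagnostic plots suggest that the residuals are reasonably well behaved.
ANOVA says YES: not only Segment but also Country, Processor.Technology and OS.Family contribute to this difference. Architecture is not significant on its own (p = 0.85), but its interaction with OS.Family is (p = 0.0016).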
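One caveat when reading the table above: aov() reports sequential (Type I) sums of squares, so in an unbalanced design the share attributed to a term can depend on where it appears in the formula. A minimal sketch on made-up data (the factors g1 and g2 and the response y are hypothetical, not from the TOP500 file):

```r
# Made-up, deliberately unbalanced data: g1 and g2 overlap
toy <- data.frame(
  g1 = factor(rep(c("a", "b"), times = c(6, 4))),
  g2 = factor(c("x", "x", "x", "x", "y", "y", "x", "y", "y", "y")),
  y  = c(5.1, 4.8, 5.5, 5.0, 6.2, 6.0, 6.1, 7.3, 7.0, 7.4)
)

# Same model, two term orders
ss_12 <- summary(aov(y ~ g1 + g2, data = toy))[[1]][["Sum Sq"]]
ss_21 <- summary(aov(y ~ g2 + g1, data = toy))[[1]][["Sum Sq"]]

ss_g1_first <- ss_12[1]  # g1 entered first
ss_g1_last  <- ss_21[2]  # g1 entered after g2
c(first = ss_g1_first, last = ss_g1_last)

# The residual sum of squares does not depend on the order
all.equal(ss_12[3], ss_21[3])
```

In this toy example the sum of squares for g1 is roughly 5.5 when it enters first and roughly 2.3 when it enters after g2. For the real model, it may be worth refitting with Country moved to the end to see how much of its effect survives.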
Can we see it in detail? Let’s try a regression tree (Eff is numeric, so rpart fits a regression tree rather than a classification tree).
tree.fit<-rpart(Eff~Country+Segment+Processor.Technology+OS.Family+Architecture,data = top500.1)
tree.fit
## n= 500
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 500 21.7735000 0.6843975
## 2) Country=China,Hong Kong,Israel,Malaysia 80 2.5276620 0.3908875
## 4) Segment=Industry 66 0.8228633 0.3301390 *
## 5) Segment=Academic,Research 14 0.3129967 0.6772734 *
## 3) Country=Australia,Austria,Belgium,Brazil,Canada,Denmark,Finland,France,Germany,India,Ireland,Italy,Japan,Korea, South,Netherlands,Norway,Poland,Russia,Saudi Arabia,Spain,Sweden,Switzerland,Taiwan,United Kingdom,United States 420 11.0412500 0.7403042
## 6) Segment=Industry,Others 212 6.1326970 0.6661437
## 12) Country=Brazil,Korea, South,Netherlands,Russia,Sweden,United Kingdom,United States 174 4.7582110 0.6347066
## 24) Processor.Technology=AMD x86_64,Intel Haswell,Intel Nehalem 17 0.1885261 0.4996429 *
## 25) Processor.Technology=Intel IvyBridge,Intel SandyBridge 157 4.2259880 0.6493313 *
## 13) Country=Belgium,Canada,Denmark,France,Germany,India,Italy,Japan,Saudi Arabia 38 0.4151134 0.8100927 *
## 7) Segment=Academic,Government,Research,Vendor 208 2.5542270 0.8158909 *
plotcp(tree.fit)
plot(tree.fit)
text(tree.fit,use.n=TRUE, all=TRUE, cex=.5)
tree.fit2 <- ctree(Eff~Segment+Processor.Technology+OS.Family,data = top500.1,subset = Country=="China")
tree.fit2
##
## Conditional inference tree with 3 terminal nodes
##
## Response: Eff
## Inputs: Segment, Processor.Technology, OS.Family
## Number of observations: 76
##
## 1) Segment == {Academic, Research}; criterion = 1, statistic = 46.061
## 2)* weights = 14
## 1) Segment == {Industry}
## 3) Processor.Technology == {Intel Nehalem, Intel SandyBridge}; criterion = 0.993, statistic = 12.052
## 4)* weights = 50
## 3) Processor.Technology == {Intel IvyBridge}
## 5)* weights = 12
plot(tree.fit2)
tree.fit3 <- ctree(Eff~Segment+Processor.Technology+OS.Family,data = top500.1,subset = Country=="United States")
tree.fit3
##
## Conditional inference tree with 4 terminal nodes
##
## Response: Eff
## Inputs: Segment, Processor.Technology, OS.Family
## Number of observations: 232
##
## 1) Segment == {Academic, Government, Research, Vendor}; criterion = 1, statistic = 55.151
## 2)* weights = 80
## 1) Segment == {Industry}
## 3) Processor.Technology == {Intel IvyBridge, Intel SandyBridge}; criterion = 0.999, statistic = 21.042
## 4) Processor.Technology == {Intel IvyBridge}; criterion = 0.967, statistic = 6.431
## 5)* weights = 18
## 4) Processor.Technology == {Intel SandyBridge}
## 6)* weights = 118
## 3) Processor.Technology == {AMD x86_64, Intel Haswell, Intel Nehalem}
## 7)* weights = 16
plot(tree.fit3)
China, Hong Kong, Israel and Malaysia are split off at the root of the tree. China, as an outsider, has its highest values of \(Eff\) in the Academic and Research segments, while the United States, as a leader, has its highest values of \(Eff\) in the Academic, Government, Research and Vendor segments.
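Because Eff is numeric, rpart() treats the problem as regression: each leaf predicts the mean Eff of the observations that fall into it, which is what the yval column in the printout shows. A minimal sketch on made-up data (the toy data frame is hypothetical):

```r
library(rpart)  # recommended package, shipped with standard R installations

# Made-up data: efficiency is low in one segment and high in the other
toy <- data.frame(
  Segment = factor(rep(c("Industry", "Research"), each = 20)),
  Eff     = c(rep(c(0.30, 0.35, 0.40, 0.45), 5),
              rep(c(0.75, 0.80, 0.85, 0.90), 5))
)

fit <- rpart(Eff ~ Segment, data = toy)

fit$method                                   # "anova", i.e. a regression tree
leaf_means <- sort(unique(unname(round(predict(fit), 3))))
leaf_means                                   # one predicted mean per leaf
```

With these toy values the tree splits once on Segment, and the two leaf predictions are the two group means.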
Which numerical variables are correlated and which are not? PCA will help us.
nums <- sapply(top500.1, is.numeric)
top500.1.num<-top500.1[,nums]
par(mfrow=c(1,1))
pca.out=PCA(top500.1.num)
pca.out$eig
## eigenvalue percentage of variance
## comp 1 5.430303085 33.939394282
## comp 2 3.041979227 19.012370166
## comp 3 1.977083753 12.356773459
## comp 4 1.382239477 8.638996732
## comp 5 0.963621166 6.022632289
## comp 6 0.902337198 5.639607490
## comp 7 0.642724766 4.017029788
## comp 8 0.478067739 2.987923366
## comp 9 0.388568956 2.428555975
## comp 10 0.305297941 1.908112131
## comp 11 0.247979547 1.549872168
## comp 12 0.081532303 0.509576893
## comp 13 0.068369005 0.427306278
## comp 14 0.054474229 0.340463929
## comp 15 0.034206336 0.213789599
## comp 16 0.001215273 0.007595455
## cumulative percentage of variance
## comp 1 33.93939
## comp 2 52.95176
## comp 3 65.30854
## comp 4 73.94753
## comp 5 79.97017
## comp 6 85.60977
## comp 7 89.62680
## comp 8 92.61473
## comp 9 95.04328
## comp 10 96.95140
## comp 11 98.50127
## comp 12 99.01084
## comp 13 99.43815
## comp 14 99.77861
## comp 15 99.99240
## comp 16 100.00000
pca.out$var
## $coord
## Dim.1 Dim.2 Dim.3
## Rank -0.5947605 0.51585716 -0.29570230
## Previous.Rank -0.5841319 0.46071795 -0.28817963
## First.Appearance -0.3186119 0.61837832 0.54298393
## First.Rank -0.5606208 0.68297240 0.05489210
## Year -0.3184208 0.62075695 0.54214733
## Total.Cores 0.8696188 0.42389651 -0.04809639
## Accelerator.Co.Processor.Cores 0.7264084 0.49220188 -0.12196963
## Rmax 0.8918226 0.39582170 -0.01637216
## Rpeak 0.8723756 0.44995513 -0.04556264
## Nmax 0.6201017 -0.24627256 -0.20372160
## Nhalf 0.0917428 -0.23884429 -0.17224974
## Power 0.8337275 0.35274263 -0.20967665
## Mflops.Watt 0.3119978 -0.16360149 0.66236746
## Processor.Speed..MHz. -0.1937613 0.08094431 -0.40989259
## Cores.per.Socket 0.3027854 -0.14854002 0.66160564
## Eff 0.2514955 -0.50083557 0.18892525
## Dim.4 Dim.5
## Rank -0.398515784 0.174142734
## Previous.Rank -0.376589967 0.155087241
## First.Appearance 0.411871593 0.074663621
## First.Rank -0.069088912 0.155662668
## Year 0.419965362 0.053582387
## Total.Cores -0.093073279 0.013544715
## Accelerator.Co.Processor.Cores -0.009097247 0.044698306
## Rmax -0.045624168 0.002018992
## Rpeak -0.038063480 0.001846284
## Nmax 0.257725010 0.195588781
## Nhalf 0.206962641 0.877327355
## Power 0.019742041 -0.050592776
## Mflops.Watt -0.215704630 0.021115047
## Processor.Speed..MHz. 0.699188167 -0.151854609
## Cores.per.Socket -0.262029578 0.159021842
## Eff 0.069634172 0.122745437
##
## $cor
## Dim.1 Dim.2 Dim.3
## Rank -0.5947605 0.51585716 -0.29570230
## Previous.Rank -0.5841319 0.46071795 -0.28817963
## First.Appearance -0.3186119 0.61837832 0.54298393
## First.Rank -0.5606208 0.68297240 0.05489210
## Year -0.3184208 0.62075695 0.54214733
## Total.Cores 0.8696188 0.42389651 -0.04809639
## Accelerator.Co.Processor.Cores 0.7264084 0.49220188 -0.12196963
## Rmax 0.8918226 0.39582170 -0.01637216
## Rpeak 0.8723756 0.44995513 -0.04556264
## Nmax 0.6201017 -0.24627256 -0.20372160
## Nhalf 0.0917428 -0.23884429 -0.17224974
## Power 0.8337275 0.35274263 -0.20967665
## Mflops.Watt 0.3119978 -0.16360149 0.66236746
## Processor.Speed..MHz. -0.1937613 0.08094431 -0.40989259
## Cores.per.Socket 0.3027854 -0.14854002 0.66160564
## Eff 0.2514955 -0.50083557 0.18892525
## Dim.4 Dim.5
## Rank -0.398515784 0.174142734
## Previous.Rank -0.376589967 0.155087241
## First.Appearance 0.411871593 0.074663621
## First.Rank -0.069088912 0.155662668
## Year 0.419965362 0.053582387
## Total.Cores -0.093073279 0.013544715
## Accelerator.Co.Processor.Cores -0.009097247 0.044698306
## Rmax -0.045624168 0.002018992
## Rpeak -0.038063480 0.001846284
## Nmax 0.257725010 0.195588781
## Nhalf 0.206962641 0.877327355
## Power 0.019742041 -0.050592776
## Mflops.Watt -0.215704630 0.021115047
## Processor.Speed..MHz. 0.699188167 -0.151854609
## Cores.per.Socket -0.262029578 0.159021842
## Eff 0.069634172 0.122745437
##
## $cos2
## Dim.1 Dim.2 Dim.3
## Rank 0.353739998 0.266108607 0.0874398515
## Previous.Rank 0.341210067 0.212261033 0.0830474976
## First.Appearance 0.101513549 0.382391752 0.2948315525
## First.Rank 0.314295701 0.466451305 0.0030131425
## Year 0.101391809 0.385339192 0.2939237318
## Total.Cores 0.756236812 0.179688253 0.0023132624
## Accelerator.Co.Processor.Cores 0.527669161 0.242262695 0.0148765906
## Rmax 0.795347521 0.156674815 0.0002680476
## Rpeak 0.761039130 0.202459615 0.0020759544
## Nmax 0.384526058 0.060650173 0.0415024908
## Nhalf 0.008416741 0.057046595 0.0296699745
## Power 0.695101486 0.124427360 0.0439642995
## Mflops.Watt 0.097342640 0.026765447 0.4387306538
## Processor.Speed..MHz. 0.037543449 0.006551982 0.1680119372
## Cores.per.Socket 0.091678975 0.022064138 0.4377220180
## Eff 0.063249990 0.250836265 0.0356927489
## Dim.4 Dim.5
## Rank 0.1588148305 3.032569e-02
## Previous.Rank 0.1418200035 2.405205e-02
## First.Appearance 0.1696382088 5.574656e-03
## First.Rank 0.0047732778 2.423087e-02
## Year 0.1763709057 2.871072e-03
## Total.Cores 0.0086626353 1.834593e-04
## Accelerator.Co.Processor.Cores 0.0000827599 1.997939e-03
## Rmax 0.0020815647 4.076327e-06
## Rpeak 0.0014488285 3.408765e-06
## Nmax 0.0664221807 3.825497e-02
## Nhalf 0.0428335348 7.697033e-01
## Power 0.0003897482 2.559629e-03
## Mflops.Watt 0.0465284876 4.458452e-04
## Processor.Speed..MHz. 0.4888640934 2.305982e-02
## Cores.per.Socket 0.0686594998 2.528795e-02
## Eff 0.0048489179 1.506644e-02
##
## $contrib
## Dim.1 Dim.2 Dim.3
## Rank 6.5141852 8.7478772 4.42266805
## Previous.Rank 6.2834442 6.9777279 4.20050478
## First.Appearance 1.8693901 12.5704919 14.91244627
## First.Rank 5.7878114 15.3338097 0.15240338
## Year 1.8671482 12.6673841 14.86652911
## Total.Cores 13.9262358 5.9069520 0.11700376
## Accelerator.Co.Processor.Cores 9.7171217 7.9639825 0.75245121
## Rmax 14.6464665 5.1504236 0.01355772
## Rpeak 14.0146713 6.6555226 0.10500083
## Nmax 7.0811159 1.9937734 2.09917717
## Nhalf 0.1549958 1.8753118 1.50069386
## Power 12.8004179 4.0903422 2.22369434
## Mflops.Watt 1.7925821 0.8798695 22.19079758
## Processor.Speed..MHz. 0.6913693 0.2153855 8.49796762
## Cores.per.Socket 1.6882847 0.7253218 22.13978124
## Eff 1.1647599 8.2458244 1.80532306
## Dim.4 Dim.5
## Rank 11.489675493 3.147055e+00
## Previous.Rank 10.260161559 2.496007e+00
## First.Appearance 12.272707559 5.785112e-01
## First.Rank 0.345329292 2.514564e+00
## Year 12.759793698 2.979462e-01
## Total.Cores 0.626710161 1.903853e-02
## Accelerator.Co.Processor.Cores 0.005987378 2.073365e-01
## Rmax 0.150593638 4.230218e-04
## Rpeak 0.104817472 3.537453e-04
## Nmax 4.805403249 3.969918e+00
## Nhalf 3.098850491 7.987613e+01
## Power 0.028196862 2.656261e-01
## Mflops.Watt 3.366166888 4.626769e-02
## Processor.Speed..MHz. 35.367539529 2.393038e+00
## Cores.per.Socket 4.967265151 2.624262e+00
## Eff 0.350801580 1.563523e+00
barplot(pca.out$eig[,1],main="Eigenvalues",names.arg=1:nrow(pca.out$eig))
Rpeak, Accelerator.Co.Processor.Cores, Total.Cores, Rmax and Power are highly correlated and form one component (their coordinates on Dim.1 range from 0.73 to 0.89), while Nmax and Processor.Speed..MHz. are not correlated.
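For a scaled PCA, the variable coordinates are the correlations between each variable and each component, which is why $coord and $cor coincide in the output above. The sketch below reproduces that relationship with base R's prcomp() on simulated data; the variable names and the hidden "size" factor are made up for illustration.

```r
set.seed(42)
n    <- 200
size <- rnorm(n)                       # hidden "system size" factor (made up)
toy  <- data.frame(
  cores = size + rnorm(n, sd = 0.3),
  rpeak = size + rnorm(n, sd = 0.3),
  power = size + rnorm(n, sd = 0.3),
  clock = rnorm(n)                     # unrelated to size
)

pca <- prcomp(toy, scale. = TRUE)

# Variable coordinates: eigenvector entries times the component standard deviations
coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
round(coord[, 1:2], 2)

# They equal the correlations between the variables and the component scores
all.equal(unname(coord[, 1]), as.vector(cor(toy, pca$x[, 1])))
```

The three size-driven variables should load heavily on the first component while clock stays near zero there, mirroring how Rmax, Rpeak, Total.Cores and Power cluster on Dim.1.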
Now we are ready for a GLM. With the default gaussian family, glm fits an ordinary linear model. Here we are.
glm.out=glm(Rpeak~Total.Cores+Accelerator.Co.Processor.Cores+Nmax+Mflops.Watt+Processor.Speed..MHz.+Segment+Processor.Technology+OS.Family+Architecture,data = top500.1)
summary(glm.out)
##
## Call:
## glm(formula = Rpeak ~ Total.Cores + Accelerator.Co.Processor.Cores +
## Nmax + Mflops.Watt + Processor.Speed..MHz. + Segment + Processor.Technology +
## OS.Family + Architecture, data = top500.1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2332130 -293526 27952 189998 15191053
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) -1.082e+06 1.266e+06 -0.854
## Total.Cores 1.438e+01 6.801e-01 21.142
## Accelerator.Co.Processor.Cores 4.355e+00 8.922e-01 4.881
## Nmax -5.798e-02 6.742e-02 -0.860
## Mflops.Watt 6.285e+02 1.373e+02 4.577
## Processor.Speed..MHz. 6.795e+02 5.524e+02 1.230
## SegmentGovernment -6.632e+04 4.068e+05 -0.163
## SegmentIndustry 2.343e+05 2.351e+05 0.997
## SegmentResearch 1.359e+05 2.151e+05 0.632
## SegmentVendor 2.394e+05 4.880e+05 0.491
## Processor.TechnologyIntel Core -1.841e+06 1.304e+06 -1.412
## Processor.TechnologyIntel IvyBridge -1.501e+06 5.553e+05 -2.704
## Processor.TechnologyIntel Nehalem -1.039e+06 5.407e+05 -1.922
## Processor.TechnologyIntel SandyBridge -1.072e+06 5.009e+05 -2.140
## Processor.TechnologyOthers -8.897e+05 1.398e+06 -0.636
## Processor.TechnologyPower -1.689e+06 1.606e+06 -1.052
## Processor.TechnologyPowerPC -2.580e+06 6.062e+05 -4.256
## Processor.TechnologySparc 2.152e+05 1.026e+06 0.210
## OS.FamilyMixed 4.116e+05 2.427e+06 0.170
## OS.FamilyUnix -4.669e+05 1.309e+06 -0.357
## ArchitectureMPP 8.242e+05 3.427e+05 2.405
## Pr(>|t|)
## (Intercept) 0.39383
## Total.Cores < 2e-16 ***
## Accelerator.Co.Processor.Cores 1.90e-06 ***
## Nmax 0.39058
## Mflops.Watt 7.49e-06 ***
## Processor.Speed..MHz. 0.21979
## SegmentGovernment 0.87063
## SegmentIndustry 0.31986
## SegmentResearch 0.52796
## SegmentVendor 0.62413
## Processor.TechnologyIntel Core 0.15916
## Processor.TechnologyIntel IvyBridge 0.00732 **
## Processor.TechnologyIntel Nehalem 0.05576 .
## Processor.TechnologyIntel SandyBridge 0.03334 *
## Processor.TechnologyOthers 0.52521
## Processor.TechnologyPower 0.29385
## Processor.TechnologyPowerPC 2.96e-05 ***
## Processor.TechnologySparc 0.83400
## OS.FamilyMixed 0.86550
## OS.FamilyUnix 0.72162
## ArchitectureMPP 0.01693 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.36029e+12)
##
## Null deviance: 4.4393e+15 on 267 degrees of freedom
## Residual deviance: 3.3599e+14 on 247 degrees of freedom
## (232 observations deleted due to missingness)
## AIC: 8270.3
##
## Number of Fisher Scoring iterations: 2
ggplot(top500.1,aes(x=Processor.Technology,y=Eff,col=Segment))+geom_boxplot()+geom_jitter() + theme(axis.text.x = element_text(size=10, angle=90))
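Among the Processor.Technology levels, Intel IvyBridge and PowerPC have the most significant coefficients. Note that both are negative: with core counts and the other predictors held fixed, these systems have a lower Rpeak than the reference level (AMD x86_64). In the plot they stand out mainly in the Academic and Research segments.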
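Since no family is given, the glm() call above uses the gaussian family with an identity link, so its coefficients should match those of lm() on the same formula. A minimal sketch on simulated data (the variables cores and rpeak in toy are made up):

```r
set.seed(1)
toy <- data.frame(cores = runif(50, 1e3, 1e5))
toy$rpeak <- 15 * toy$cores + rnorm(50, sd = 1e4)

fit_glm <- glm(rpeak ~ cores, data = toy)  # family = gaussian() by default
fit_lm  <- lm(rpeak ~ cores, data = toy)

# Same estimates from both fitting routes
all.equal(coef(fit_glm), coef(fit_lm))
```

The practical difference is in the summary: glm() reports deviance and AIC, while lm() reports R-squared and an F-statistic.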
But Total.Cores matters most of all! Let’s check it.
cor.test(Rpeak,Total.Cores)
##
## Pearson's product-moment correlation
##
## data: Rpeak and Total.Cores
## t = 65.9089, df = 498, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9373438 0.9555071
## sample estimates:
## cor
## 0.9471798
fit.lm=lm(log(Rpeak)~log(Total.Cores),data = top500.1)
summary(fit.lm)
##
## Call:
## lm(formula = log(Rpeak) ~ log(Total.Cores), data = top500.1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.54133 -0.12651 -0.01417 0.11844 1.58578
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.82397 0.19066 25.30 <2e-16 ***
## log(Total.Cores) 0.81436 0.01905 42.75 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.369 on 498 degrees of freedom
## Multiple R-squared: 0.7858, Adjusted R-squared: 0.7854
## F-statistic: 1827 on 1 and 498 DF, p-value: < 2.2e-16
ggplot(top500.1,aes(x=log(Total.Cores),y=log(Rpeak),col=Year))+geom_smooth(method="lm") + geom_point(aes(size=Eff))
fit.lm.year=lm(log(Rpeak)~log(Total.Cores)+Year,data = top500.1)
summary(fit.lm.year)
##
## Call:
## lm(formula = log(Rpeak) ~ log(Total.Cores) + Year, data = top500.1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.18183 -0.17802 -0.04129 0.07572 1.55894
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -215.72915 27.84930 -7.746 5.36e-14 ***
## log(Total.Cores) 0.84975 0.01852 45.891 < 2e-16 ***
## Year 0.10941 0.01381 7.920 1.57e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3481 on 497 degrees of freedom
## Multiple R-squared: 0.8098, Adjusted R-squared: 0.8091
## F-statistic: 1058 on 2 and 497 DF, p-value: < 2.2e-16
anova(fit.lm,fit.lm.year)
## Analysis of Variance Table
##
## Model 1: log(Rpeak) ~ log(Total.Cores)
## Model 2: log(Rpeak) ~ log(Total.Cores) + Year
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 498 67.808
## 2 497 60.210 1 7.5985 62.721 1.569e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(fit.lm,fit.lm.year)
## df AIC
## fit.lm 3 425.9753
## fit.lm.year 4 368.5507
As far as the correlation above is concerned, Year matters too, and positively: comparing the two models, the second one, fit.lm.year, is significantly better (F = 62.7, p < 0.001), and its lower AIC confirms this.
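Coefficients of a log-log model are easier to read after converting them back to multiplicative factors. The short calculation below uses the estimates printed above; only the arithmetic is new.

```r
slope     <- 0.81436   # log(Total.Cores) coefficient from fit.lm
year_coef <- 0.10941   # Year coefficient from fit.lm.year

double_cores <- 2^slope        # factor on Rpeak when Total.Cores doubles
per_year     <- exp(year_coef) # factor on Rpeak per one-year-later Year, cores fixed
aic_gain     <- 425.9753 - 368.5507

round(c(double_cores = double_cores, per_year = per_year, aic_gain = aic_gain), 3)
```

So doubling the core count goes with roughly 1.76 times the Rpeak, and a system installed one year later has about 12% more Rpeak at the same core count.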
ggplot(top500.1,aes(x=log(Total.Cores),y=log(Rpeak),col=Segment))+geom_smooth(method="lm") +geom_jitter()
On the whole, the Research segment shows the strongest correlation.
ggplot(top500.1,aes(log(Total.Cores),log(Rpeak),col=Segment)) +geom_point(aes(size=Eff),shape=22) + facet_grid(Segment~Continent) + theme(axis.text.x = element_text(size=12, angle=90)) + geom_smooth(method="lm")
The Americas and Asia are very close in correlation as far as the Research segment is concerned, while Europe tries not to fall behind like Oceania. In the Academic segment, however, Europe seems to be the leader. In the Government segment (DOD, DOE, NSA) the Americas are the true leader.
ggplot(newdata2,aes(log(Total.Cores),log(Rpeak),col=Segment)) +geom_point(aes(size=Eff),shape=22) + facet_grid(Segment~Country) + theme(axis.text.x = element_text(size=12, angle=90))+ geom_smooth(method="lm")
## Warning in qt((1 - level)/2, df): NaNs produced
The United States is the leader in correlation across all segments. Meanwhile, France has a real chance to become the leader in the Industry and Research segments. Vive la France! China is trying desperately to catch up with the leader in the Research segment.
ggplot(newdata,aes(log(Total.Cores),log(Rpeak),col=Segment)) +geom_point(aes(size=Eff),shape=22) + facet_grid(Segment~Manufacturer) + theme(axis.text.x = element_text(size=12, angle=90)) + geom_smooth(method="lm")
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in qt((1 - level)/2, df): NaNs produced
## Warning in qt((1 - level)/2, df): NaNs produced
Dell and IBM are the leaders in the Academic segment, and Cray Inc. is the leader in the Government segment. Cray Inc., IBM and SGI lead the Research segment, while SGI has a real chance to become the leader in Industry among HP and IBM, as far as correlation is concerned.
And what about other combinations of factors?
ggplot(top500.1,aes(x=Interconnect.Family,y=Eff,col=Segment))+geom_boxplot()+geom_jitter() + theme(axis.text.x = element_text(size=10, angle=90))
Interconnect.Family matters too.
Can we find relationships between Segment, Manufacturer, Architecture, Processor.Technology and OS.Family that indicate some kind of system in how supercomputers are installed? This is called association rule learning (ARL), a data mining technique. For more information about ARL, see http://en.wikipedia.org/wiki/Association_rule_learning. Let’s try ARL!
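Each association rule is scored by three measures: support, confidence and lift. The sketch below computes them by hand in base R for a single rule on a made-up table (the toy data frame is hypothetical and much smaller than the real data), which may help when reading the rule listings.

```r
# Made-up installations (not from the TOP500 file)
toy <- data.frame(
  Manufacturer = c("HP", "HP", "HP", "IBM", "IBM", "Cray", "HP", "IBM"),
  Segment      = c("Industry", "Industry", "Industry", "Research",
                   "Industry", "Research", "Academic", "Research")
)

# Rule: {Manufacturer=HP} => {Segment=Industry}
lhs <- toy$Manufacturer == "HP"
rhs <- toy$Segment == "Industry"

support    <- mean(lhs & rhs)        # share of rows matching both sides
confidence <- support / mean(lhs)    # share of lhs rows that also match rhs
lift       <- confidence / mean(rhs) # confidence relative to the rhs base rate

c(support = support, confidence = confidence, lift = lift)
```

A lift above 1 means the left-hand side makes the right-hand side more likely than its overall frequency; a lift close to 1 means the rule adds little beyond that base rate.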
First, we should reduce our data set, using previous results.
new.dat.2=select(newdata,Segment,Manufacturer,Architecture,Processor.Technology,OS.Family)
head(new.dat.2)
## Segment Manufacturer Architecture Processor.Technology OS.Family
## 2 Research Cray Inc. MPP AMD x86_64 Linux
## 3 Research IBM MPP PowerPC Linux
## 5 Research IBM MPP PowerPC Linux
## 6 Research Cray Inc. MPP Intel SandyBridge Linux
## 7 Academic Dell Cluster Intel SandyBridge Linux
## 8 Research IBM MPP PowerPC Linux
Now we are ready to generate association rules from the reduced data.
rules.all=apriori(new.dat.2)
##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.8 0.1 1 none FALSE TRUE 0.1 1 10
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[26 item(s), 452 transaction(s)] done [0.00s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [69 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
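The parameter specification printed above shows the defaults that `apriori()` used: support 0.1, confidence 0.8, minlen 1 and maxlen 10. Below is a sketch of how these thresholds could be set explicitly, run on a small invented data frame (`toy` and its values are hypothetical, not the top500 data). Setting `minlen = 2` is one way to drop rules with an empty left-hand side.

```r
library(arules)

# Hypothetical toy data; factor columns are coerced to transactions by apriori()
toy <- data.frame(
  Segment      = c("Industry", "Industry", "Industry", "Research", "Academic", "Industry"),
  Architecture = c("Cluster", "Cluster", "Cluster", "MPP", "Cluster", "Cluster"),
  OS.Family    = c("Linux", "Linux", "Linux", "Linux", "Linux", "Windows"),
  stringsAsFactors = TRUE
)

# Same thresholds as the defaults shown in the output, except minlen = 2,
# which requires at least one item on the left-hand side of every rule
rules.toy <- apriori(
  toy,
  parameter = list(support = 0.1, confidence = 0.8, minlen = 2, maxlen = 10),
  control   = list(verbose = FALSE)
)

inspect(rules.toy)
```

Raising `support` or `confidence` in the same list would shorten the rule set further.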
inspect(rules.all)
## lhs rhs support confidence lift
## 1 {} => {Architecture=Cluster} 0.8451327 0.8451327 1.0000000
## 2 {} => {OS.Family=Linux} 0.9734513 0.9734513 1.0000000
## 3 {Manufacturer=Cray Inc.} => {OS.Family=Linux} 0.1128319 1.0000000 1.0272727
## 4 {Segment=Academic} => {OS.Family=Linux} 0.1371681 0.9841270 1.0109668
## 5 {Architecture=MPP} => {OS.Family=Linux} 0.1371681 0.8857143 0.9098701
## 6 {Segment=Research} => {OS.Family=Linux} 0.1769912 0.8988764 0.9233912
## 7 {Processor.Technology=Intel IvyBridge} => {Architecture=Cluster} 0.1814159 0.8913043 1.0546324
## 8 {Processor.Technology=Intel IvyBridge} => {OS.Family=Linux} 0.2035398 1.0000000 1.0272727
## 9 {Manufacturer=IBM} => {Architecture=Cluster} 0.3163717 0.8125000 0.9613874
## 10 {Manufacturer=IBM} => {OS.Family=Linux} 0.3650442 0.9375000 0.9630682
## 11 {Manufacturer=Hewlett-Packard} => {Segment=Industry} 0.3606195 0.9005525 1.4965063
## 12 {Manufacturer=Hewlett-Packard} => {Architecture=Cluster} 0.4004425 1.0000000 1.1832461
## 13 {Manufacturer=Hewlett-Packard} => {OS.Family=Linux} 0.3982301 0.9944751 1.0215972
## 14 {Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.5774336 0.9849057 1.1653858
## 15 {Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.5840708 0.9962264 1.0233962
## 16 {Segment=Industry} => {Architecture=Cluster} 0.5973451 0.9926471 1.1745457
## 17 {Segment=Industry} => {OS.Family=Linux} 0.5995575 0.9963235 1.0234960
## 18 {Architecture=Cluster} => {OS.Family=Linux} 0.8362832 0.9895288 1.0165159
## 19 {OS.Family=Linux} => {Architecture=Cluster} 0.8362832 0.8590909 1.0165159
## 20 {Segment=Research,
## Architecture=Cluster} => {OS.Family=Linux} 0.1061947 0.9411765 0.9668449
## 21 {Segment=Industry,
## Processor.Technology=Intel IvyBridge} => {Architecture=Cluster} 0.1150442 1.0000000 1.1832461
## 22 {Segment=Industry,
## Processor.Technology=Intel IvyBridge} => {OS.Family=Linux} 0.1150442 1.0000000 1.0272727
## 23 {Architecture=Cluster,
## Processor.Technology=Intel IvyBridge} => {OS.Family=Linux} 0.1814159 1.0000000 1.0272727
## 24 {Processor.Technology=Intel IvyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.1814159 0.8913043 1.0546324
## 25 {Manufacturer=IBM,
## Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.2079646 1.0000000 1.1832461
## 26 {Manufacturer=IBM,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.2079646 1.0000000 1.0272727
## 27 {Segment=Industry,
## Manufacturer=IBM} => {Architecture=Cluster} 0.2256637 0.9902913 1.1717582
## 28 {Segment=Industry,
## Manufacturer=IBM} => {OS.Family=Linux} 0.2278761 1.0000000 1.0272727
## 29 {Manufacturer=IBM,
## Architecture=Cluster} => {OS.Family=Linux} 0.3097345 0.9790210 1.0057216
## 30 {Manufacturer=IBM,
## OS.Family=Linux} => {Architecture=Cluster} 0.3097345 0.8484848 1.0039664
## 31 {Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge} => {Segment=Industry} 0.2787611 0.9130435 1.5172634
## 32 {Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.3053097 1.0000000 1.1832461
## 33 {Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.3030973 0.9927536 1.0198287
## 34 {Segment=Industry,
## Manufacturer=Hewlett-Packard} => {Architecture=Cluster} 0.3606195 1.0000000 1.1832461
## 35 {Manufacturer=Hewlett-Packard,
## Architecture=Cluster} => {Segment=Industry} 0.3606195 0.9005525 1.4965063
## 36 {Segment=Industry,
## Manufacturer=Hewlett-Packard} => {OS.Family=Linux} 0.3584071 0.9938650 1.0209704
## 37 {Manufacturer=Hewlett-Packard,
## OS.Family=Linux} => {Segment=Industry} 0.3584071 0.9000000 1.4955882
## 38 {Manufacturer=Hewlett-Packard,
## Architecture=Cluster} => {OS.Family=Linux} 0.3982301 0.9944751 1.0215972
## 39 {Manufacturer=Hewlett-Packard,
## OS.Family=Linux} => {Architecture=Cluster} 0.3982301 1.0000000 1.1832461
## 40 {Segment=Industry,
## Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.4336283 1.0000000 1.1832461
## 41 {Segment=Industry,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.4314159 0.9948980 1.0220315
## 42 {Architecture=Cluster,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.5752212 0.9961686 1.0233368
## 43 {Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.5752212 0.9848485 1.1653181
## 44 {Segment=Industry,
## Architecture=Cluster} => {OS.Family=Linux} 0.5951327 0.9962963 1.0234680
## 45 {Segment=Industry,
## OS.Family=Linux} => {Architecture=Cluster} 0.5951327 0.9926199 1.1745136
## 46 {Segment=Industry,
## Architecture=Cluster,
## Processor.Technology=Intel IvyBridge} => {OS.Family=Linux} 0.1150442 1.0000000 1.0272727
## 47 {Segment=Industry,
## Processor.Technology=Intel IvyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.1150442 1.0000000 1.1832461
## 48 {Segment=Industry,
## Manufacturer=IBM,
## Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.1504425 1.0000000 1.1832461
## 49 {Segment=Industry,
## Manufacturer=IBM,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.1504425 1.0000000 1.0272727
## 50 {Manufacturer=IBM,
## Architecture=Cluster,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.2079646 1.0000000 1.0272727
## 51 {Manufacturer=IBM,
## Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.2079646 1.0000000 1.1832461
## 52 {Segment=Industry,
## Manufacturer=IBM,
## Architecture=Cluster} => {OS.Family=Linux} 0.2256637 1.0000000 1.0272727
## 53 {Segment=Industry,
## Manufacturer=IBM,
## OS.Family=Linux} => {Architecture=Cluster} 0.2256637 0.9902913 1.1717582
## 54 {Segment=Industry,
## Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge} => {Architecture=Cluster} 0.2787611 1.0000000 1.1832461
## 55 {Manufacturer=Hewlett-Packard,
## Architecture=Cluster,
## Processor.Technology=Intel SandyBridge} => {Segment=Industry} 0.2787611 0.9130435 1.5172634
## 56 {Segment=Industry,
## Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.2765487 0.9920635 1.0191198
## 57 {Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Segment=Industry} 0.2765487 0.9124088 1.5162087
## 58 {Manufacturer=Hewlett-Packard,
## Architecture=Cluster,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.3030973 0.9927536 1.0198287
## 59 {Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.3030973 1.0000000 1.1832461
## 60 {Segment=Industry,
## Manufacturer=Hewlett-Packard,
## Architecture=Cluster} => {OS.Family=Linux} 0.3584071 0.9938650 1.0209704
## 61 {Segment=Industry,
## Manufacturer=Hewlett-Packard,
## OS.Family=Linux} => {Architecture=Cluster} 0.3584071 1.0000000 1.1832461
## 62 {Manufacturer=Hewlett-Packard,
## Architecture=Cluster,
## OS.Family=Linux} => {Segment=Industry} 0.3584071 0.9000000 1.4955882
## 63 {Segment=Industry,
## Architecture=Cluster,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.4314159 0.9948980 1.0220315
## 64 {Segment=Industry,
## Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.4314159 1.0000000 1.1832461
## 65 {Segment=Industry,
## Manufacturer=IBM,
## Architecture=Cluster,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.1504425 1.0000000 1.0272727
## 66 {Segment=Industry,
## Manufacturer=IBM,
## Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.1504425 1.0000000 1.1832461
## 67 {Segment=Industry,
## Manufacturer=Hewlett-Packard,
## Architecture=Cluster,
## Processor.Technology=Intel SandyBridge} => {OS.Family=Linux} 0.2765487 0.9920635 1.0191198
## 68 {Segment=Industry,
## Manufacturer=Hewlett-Packard,
## Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Architecture=Cluster} 0.2765487 1.0000000 1.1832461
## 69 {Manufacturer=Hewlett-Packard,
## Architecture=Cluster,
## Processor.Technology=Intel SandyBridge,
## OS.Family=Linux} => {Segment=Industry} 0.2765487 0.9124088 1.5162087
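The three quality columns can be checked by hand. Here is a small sketch for rule 11, {Manufacturer=Hewlett-Packard} => {Segment=Industry}. The counts are not printed anywhere in the output. They are back-calculated from the listed supports and the 452 transactions, so treat them as an assumption.

```r
# Counts back-calculated from the rule listing (assumption, see text)
n          <- 452  # transactions
n.hp       <- 181  # Manufacturer = Hewlett-Packard
n.industry <- 272  # Segment = Industry
n.both     <- 163  # Hewlett-Packard and Industry together

support    <- n.both / n                    # share of all systems covered by the rule
confidence <- n.both / n.hp                 # share of HP systems that are in Industry
lift       <- confidence / (n.industry / n) # confidence relative to the base rate of Industry

# Agrees with the values listed for rule 11: 0.3606195, 0.9005525, 1.4965063
print(round(c(support = support, confidence = confidence, lift = lift), 7))
```

A lift of about 1.5 means that Hewlett-Packard systems fall into the Industry segment roughly one and a half times as often as systems in general.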
plot(rules.all)
plot(rules.all,method="grouped")
plot(rules.all,method="graph")
plot(rules.all,method="graph",control=list(type="items"))
plot(rules.all,method="paracoord")
We can see a very interesting graphical view of ARL in our case: Academic and Research look very individual, in contrast to Industry, where Linux, Cluster, IBM and Intel SandyBridge carry the main trend.
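Many of the 69 rules have a lift close to 1, so they say little beyond the fact that almost every system is a Linux cluster. Below is a minimal sketch of how the more informative rules could be pulled out with `subset()` and `sort()` from arules. It runs on an invented data frame (`toy` is hypothetical), and the cut-off of 1.1 is an arbitrary choice for illustration.

```r
library(arules)

# Hypothetical toy data standing in for the reduced data set
toy <- data.frame(
  Segment      = c("Industry", "Industry", "Industry", "Research", "Academic", "Industry"),
  Architecture = c("Cluster", "Cluster", "Cluster", "MPP", "Cluster", "Cluster"),
  OS.Family    = c("Linux", "Linux", "Linux", "Linux", "Linux", "Windows"),
  stringsAsFactors = TRUE
)

rules.toy <- apriori(
  toy,
  parameter = list(support = 0.1, confidence = 0.8, minlen = 2),
  control   = list(verbose = FALSE)
)

# Keep only rules whose lift is clearly above 1, strongest first
strong <- subset(rules.toy, subset = lift > 1.1)
strong <- sort(strong, by = "lift")

inspect(strong)
```

With the real object, the same two calls would be applied to `rules.all` instead of `rules.toy`.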
1. Industry shows very low effectiveness of supercomputers, while Vendor and Research are the true leaders.
2. Bull SA, Cray Inc. and SGI are the leaders among manufacturers of supercomputers.
3. MPP architecture leads Cluster architecture in effectiveness.
4. Americas and Europe are the leading continents in effectiveness of supercomputers.
5. Western Europe leads in the Academic segment, while Northern Europe leads in Research, by effectiveness of supercomputers.
6. The United States, Germany, the United Kingdom, France and Japan are the leaders in effectiveness of supercomputers, while China and Russia lag behind, though Russia is closer to the leaders.
7. China, as an outsider, has its highest effectiveness values in Academic and Research, while the United States, as a leader, has its highest effectiveness values in the Academic, Government, Research and Vendor segments.
8. Rpeak, Accelerator.Co.Processor.Cores, Total.Cores, Rmax and Power are highly correlated and constitute one component, while Nmax and Processor.Speed..MHz are not correlated.
9. Intel IvyBridge and PowerPC as Processor.Technology are the leaders in Rpeak acceleration, especially for the Academic and Research segments.
10. On the whole, Segment, Country, Processor.Technology and OS.Family, combined with Architecture, all contribute to the differences in effectiveness of supercomputers.
So we have made a data mining tour with R on a real data set and got some interesting and non-obvious results. This race will never stop.