Data source

URL : https://archive.ics.uci.edu/ml/machine-learning-databases/cpu-performance/

  1. Title: Relative CPU Performance Data

  2. Source Information:
    – Creators: Phillip Ein-Dor and Jacob Feldmesser
      – Ein-Dor: Faculty of Management; Tel Aviv University; Ramat-Aviv; Tel Aviv, 69978; Israel
    – Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
    – Date: October, 1987

  3. Past Usage:
    1. Ein-Dor and Feldmesser (CACM 4/87, pp 308-317)
      – Results: linear regression prediction of relative CPU performance; recorded 34% average deviation from actual values
    2. Kibler, D. & Aha, D. (1988). Instance-Based Prediction of Real-Valued Attributes. In Proceedings of the CSCSI (Canadian AI) Conference.
      – Results: instance-based prediction of relative CPU performance; similar results; no transformations required
    • Predicted attribute: relative CPU performance (numeric)
  4. Relevant Information: – The estimated relative performance (ERP) values were computed by the authors using a linear regression method. See their article (pp 308-313) for details on how the relative performance values were set.

  5. Number of Instances: 209

  6. Number of Attributes: 10 (6 predictive attributes, 2 non-predictive, 1 goal field, and 1 linear regression estimate of the goal field)

  7. Attribute Information:
    1. Vendor name: 30 (adviser, amdahl, apollo, basf, bti, burroughs, c.r.d, cambex, cdc, dec, dg, formation, four-phase, gould, honeywell, hp, ibm, ipl, magnuson, microdata, nas, ncr, nixdorf, perkin-elmer, prime, siemens, sperry, stratus, wang)
    2. Model Name: many unique symbols
    3. MYCT: machine cycle time in nanoseconds (integer)
    4. MMIN: minimum main memory in kilobytes (integer)
    5. MMAX: maximum main memory in kilobytes (integer)
    6. CACH: cache memory in kilobytes (integer)
    7. CHMIN: minimum channels in units (integer)
    8. CHMAX: maximum channels in units (integer)
    9. PRP: published relative performance (integer)
    10. ERP: estimated relative performance from the original article (integer)

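# machine.data has no header row, so read every line as data;
# stringsAsFactors = TRUE lets summary() tabulate vendor and model.
df <- read.csv("./data/machine.data", header = FALSE,
               stringsAsFactors = TRUE)
names(df) <- c("vendor", "model", "myct", "mmin", "mmax",
               "cach", "chmin", "chmax", "prp", "erp")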
summary(df)
##        vendor           model          myct             mmin      
##  ibm      : 32   100       :  1   Min.   :  17.0   Min.   :   64  
##  nas      : 19   1100/61-h1:  1   1st Qu.:  50.0   1st Qu.:  768  
##  honeywell: 13   1100/81   :  1   Median : 110.0   Median : 2000  
##  ncr      : 13   1100/82   :  1   Mean   : 204.2   Mean   : 2881  
##  sperry   : 13   1100/83   :  1   3rd Qu.: 225.0   3rd Qu.: 4000  
##  siemens  : 12   1100/84   :  1   Max.   :1500.0   Max.   :32000  
##  (Other)  :106   (Other)   :202                                   
##       mmax            cach           chmin            chmax       
##  Min.   :   64   Min.   :  0.0   Min.   : 0.000   Min.   :  0.00  
##  1st Qu.: 4000   1st Qu.:  0.0   1st Qu.: 1.000   1st Qu.:  5.00  
##  Median : 8000   Median :  8.0   Median : 2.000   Median :  8.00  
##  Mean   :11824   Mean   : 24.1   Mean   : 4.644   Mean   : 17.74  
##  3rd Qu.:16000   3rd Qu.: 32.0   3rd Qu.: 6.000   3rd Qu.: 24.00  
##  Max.   :64000   Max.   :256.0   Max.   :52.000   Max.   :176.00  
##                                                                   
##       prp              erp         
##  Min.   :   6.0   Min.   :  15.00  
##  1st Qu.:  27.0   1st Qu.:  28.00  
##  Median :  49.5   Median :  45.00  
##  Mean   : 105.2   Mean   :  98.85  
##  3rd Qu.: 111.5   3rd Qu.:  99.50  
##  Max.   :1150.0   Max.   :1238.00  
## 
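A printed summary like this one can be checked against the stated instance count. The vendor counts above add up to 208 (32 + 19 + 13 + 13 + 13 + 12 + 106), one fewer than the 209 instances listed in the dataset notes. One possible explanation is that `read.csv` treats the first line of its input as column names unless `header = FALSE` is passed, so a file without a header row loses its first record. The sketch below shows that behavior on three made-up records in the same ten-column layout. The record values and the names `raw`, `with_default`, and `with_flag` are illustrative assumptions, not part of the original analysis.

```r
# Three illustrative records in the ten-column layout of machine.data
# (placeholder values for the demonstration, not a copy of the file).
raw <- "vendorA,m1,125,256,6000,256,16,128,198,199
vendorB,m2,29,8000,32000,32,8,32,269,253
vendorB,m3,29,8000,16000,32,8,16,132,132"

# Default behavior: header = TRUE, so the first line becomes column names.
with_default <- read.csv(text = raw)

# Explicit header = FALSE: every line is kept as a data row.
with_flag <- read.csv(text = raw, header = FALSE)

nrow(with_default)  # 2: the first record was consumed as the header
nrow(with_flag)     # 3: all records retained
```

Comparing `nrow(df)` with the documented 209 instances after loading the real file is a quick way to confirm that no record was dropped.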
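library(plotly)  # provides plot_ly()
# 3D scatter on log scales, colored by mmax; log(chmax) is -Inf where chmax is 0
plot_ly(df, x = ~log(prp), y = ~log(myct), z = ~log(chmax),
        type = "scatter3d", mode = "markers", color = ~mmax)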
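
The dataset notes cite a 34% average deviation from actual values for the original linear regression. The notes do not define that measure, so the sketch below assumes it means the mean absolute percentage deviation of the estimate from the published value. The helper name `mean_abs_pct_dev` and the toy vectors are hypothetical and serve only to show the arithmetic.

```r
# Hypothetical helper: mean absolute percentage deviation of estimates
# from actual values. Both the name and the definition are assumptions.
mean_abs_pct_dev <- function(actual, estimate) {
  stopifnot(length(actual) == length(estimate), all(actual != 0))
  100 * mean(abs(estimate - actual) / abs(actual))
}

# Toy values for illustration only (not taken from machine.data).
prp_toy <- c(100, 200, 50)
erp_toy <- c(110, 180, 50)

# Deviations are 10%, 10% and 0%, so the mean is about 6.67.
mean_abs_pct_dev(prp_toy, erp_toy)

# With the data frame loaded as df, the same call would be:
# mean_abs_pct_dev(df$prp, df$erp)
```

The result on the real columns need not match the cited 34%, since the original article may have measured deviation differently.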