1. Introduction

This project estimates the performance of computers using the Computer Hardware data set from https://archive.ics.uci.edu/ml/datasets/Computer+Hardware. The data set was created by Phillip Ein-Dor and Jacob Feldmesser (Faculty of Management, Tel Aviv University, Ramat-Avi Tel Aviv, 69978, Israel). It contains 209 instances with 10 attributes: vendor name, model name, six hardware attributes (MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX), the published relative performance (PRP) and the estimated relative performance (ERP).

This report addresses two questions about the performance of computers with respect to their hardware. First, we test whether there is a difference between the published relative performance (PRP) and the estimated relative performance (ERP). Second, we estimate the relative performance of a computer using a linear regression model.

Read the data

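# Set the working directory to the folder that contains machine.data (adjust this path)
setwd("C:/Users/Prabha Shankar/Desktop/Winter Internship/Project")
# The file has no header row and uses commas as separators
var1.df <- read.table("machine.data", header = FALSE, sep = ",")
# Name the ten columns: vendor, model, six hardware attributes, PRP and ERP
colnames(var1.df) <- c("v_name", "M_name", "MYCT", "MMIN", "MMAX", "CACH", "CHMIN", "CHMAX", "PRP", "ERP")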
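As an alternative to setting the column names afterwards, read.table() can assign them while reading through its col.names argument. The sketch below assumes the file layout described above (comma-separated, no header row, ten columns). The two rows are made-up toy values in the same format, used only so the example runs without the file. The name `machine_cols` is hypothetical.

```r
# Column names used in this report, in file order
machine_cols <- c("v_name", "M_name", "MYCT", "MMIN", "MMAX",
                  "CACH", "CHMIN", "CHMAX", "PRP", "ERP")

# Toy rows in the machine.data format (illustrative values, not from the data set)
toy_text <- "vendorA,model1,100,1000,4000,8,1,4,40,38
vendorB,model2,50,2000,8000,32,2,16,120,110"

# Read comma-separated data with no header and name the columns in one step;
# for the real file, replace text = toy_text with file = "machine.data"
toy.df <- read.table(text = toy_text, header = FALSE, sep = ",",
                     col.names = machine_cols, stringsAsFactors = FALSE)

str(toy.df)
```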

Dimensions of the data

dim(var1.df)

## [1] 209  10

Frequency of each vendor

table(var1.df$v_name)

## 
##      adviser       amdahl       apollo         basf          bti 
##            1            9            2            2            2 
##    burroughs        c.r.d       cambex          cdc          dec 
##            8            6            5            9            6 
##           dg    formation   four-phase        gould       harris 
##            7            5            1            3            7 
##    honeywell           hp          ibm          ipl     magnuson 
##           13            7           32            6            6 
##    microdata          nas          ncr      nixdorf perkin-elmer 
##            1           19           13            3            3 
##        prime      siemens       sperry       sratus         wang 
##            5           12           13            1            2
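The frequency table above is in alphabetical order. A minimal sketch for ranking vendors by number of machines applies sort() to the table. It runs on a toy vendor vector with illustrative values so it is self-contained. On the real data the call would be sort(table(var1.df$v_name), decreasing = TRUE), which would put ibm (32 machines) first and nas (19) second.

```r
# Toy vendor column (illustrative values only)
vendors <- c("ibm", "nas", "ibm", "hp", "nas", "ibm")

# Count machines per vendor and order from most to least frequent
vendor_counts <- sort(table(vendors), decreasing = TRUE)
print(vendor_counts)

# Share of all machines contributed by each vendor
round(prop.table(vendor_counts), 2)
```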

library(psych)

## Warning: package 'psych' was built under R version 3.3.3

Mean, median and standard deviation of the variables

describe(var1.df)[3:10, 1:5]   # rows: numeric variables; columns: vars, n, mean, sd, median

##       vars   n     mean       sd median
## MYCT     3 209   203.82   260.26    110
## MMIN     4 209  2867.98  3878.74   2000
## MMAX     5 209 11796.15 11726.56   8000
## CACH     6 209    25.21    40.63      8
## CHMIN    7 209     4.70     6.82      2
## CHMAX    8 209    18.27    26.00      8
## PRP      9 209   105.62   160.83     50
## ERP     10 209    99.33   154.76     45
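For readers without the psych package, the same three statistics can be computed in base R. This is a minimal sketch on a toy data frame with illustrative values. Applied to var1.df[, 3:10], it should reproduce the mean, sd and median columns above. The toy PRP column also shows the pattern visible in the real PRP and ERP rows, where the mean is about twice the median: a single large value pulls the mean well above the median.

```r
# Toy numeric data frame standing in for var1.df[, 3:10] (illustrative values only)
toy <- data.frame(PRP = c(10, 20, 30, 200), ERP = c(12, 18, 35, 190))

# Mean, sample standard deviation and median of every column, using base R only
summary_stats <- sapply(toy, function(x) c(mean = mean(x), sd = sd(x), median = median(x)))
round(summary_stats, 2)
```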

# Horizontal boxplot of each numeric variable (units as documented for the UCI data set)
boxplot(var1.df$MYCT, main = "Machine cycle time (MYCT)", xlab = "Machine cycle time (ns)", horizontal = TRUE)

boxplot(var1.df$MMIN, main = "Minimum main memory (MMIN)", xlab = "Minimum main memory (KB)", horizontal = TRUE)

boxplot(var1.df$MMAX, main = "Maximum main memory (MMAX)", xlab = "Maximum main memory (KB)", horizontal = TRUE)

boxplot(var1.df$CACH, main = "Cache memory (CACH)", xlab = "Cache memory (KB)", horizontal = TRUE)

boxplot(var1.df$CHMIN, main = "Minimum channels (CHMIN)", xlab = "Minimum channels", horizontal = TRUE)

boxplot(var1.df$CHMAX, main = "Maximum channels (CHMAX)", xlab = "Maximum channels", horizontal = TRUE)

boxplot(var1.df$PRP, main = "Published relative performance (PRP)", xlab = "Published relative performance", horizontal = TRUE)
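The seven boxplot calls above differ only in the column and its label, so they can also be driven by a named vector of labels. A minimal sketch follows. The helper `draw_boxplots` and the vector `box_labels` are hypothetical names, and the labels use the units documented for the UCI data set (cycle time in nanoseconds, memory in kilobytes). The toy data are random values so that the example runs on its own. On the real data the call would be draw_boxplots(var1.df, box_labels).

```r
# Axis labels keyed by column name (hypothetical helper data)
box_labels <- c(MYCT  = "Machine cycle time (ns)",
                MMIN  = "Minimum main memory (KB)",
                MMAX  = "Maximum main memory (KB)",
                CACH  = "Cache memory (KB)",
                CHMIN = "Minimum channels",
                CHMAX = "Maximum channels",
                PRP   = "Published relative performance")

draw_boxplots <- function(df, labels) {
  # Draw one horizontal boxplot per column named in 'labels'
  stats <- lapply(names(labels), function(v) {
    boxplot(df[[v]], main = labels[[v]], xlab = labels[[v]], horizontal = TRUE)
  })
  names(stats) <- names(labels)
  invisible(stats)  # boxplot statistics, one list entry per variable
}

# Toy data with the same columns (random illustrative values)
set.seed(1)
toy <- as.data.frame(lapply(box_labels, function(l) rexp(20, 1 / 100)))

pdf(NULL)                      # draw to a null device so the example runs headless
out <- draw_boxplots(toy, box_labels)
invisible(dev.off())
```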

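hist(var1.df$PRP, xlab = "Published relative performance", main = "Distribution of PRP")

PRP takes continuous numeric values and is strongly right-skewed: its mean (105.62) is about twice its median (50), so a few high-performance machines pull the mean upward.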

Correlation Test

library(car)  # provides scatterplot() and scatterplotMatrix()

## Warning: package 'car' was built under R version 3.3.3

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

Correlation test between minimum main memory (MMIN) and estimated relative performance (ERP)

cor.test(var1.df$MMIN,var1.df$ERP)

## 
##  Pearson's product-moment correlation
## 
## data:  var1.df$MMIN and var1.df$ERP
## t = 20.558, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7690922 0.8594446
## sample estimates:
##       cor 
## 0.8192915

Graphical representation

scatterplot(var1.df$MMIN,var1.df$ERP)

Correlation test between maximum main memory (MMAX) and estimated relative performance (ERP)

cor.test(var1.df$MMAX,var1.df$ERP)

## 
##  Pearson's product-moment correlation
## 
## data:  var1.df$MMAX and var1.df$ERP
## t = 29.917, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8721582 0.9239162
## sample estimates:
##       cor 
## 0.9012024

Graphical Representation

scatterplot(var1.df$MMAX,var1.df$ERP)

Correlation test between cache memory (CACH) and estimated relative performance (ERP)

cor.test(var1.df$CACH,var1.df$ERP)

## 
##  Pearson's product-moment correlation
## 
## data:  var1.df$CACH and var1.df$ERP
## t = 12.261, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5624133 0.7208780
## sample estimates:
##       cor 
## 0.6486203

Graphical Representation

scatterplot(var1.df$CACH,var1.df$ERP)

Correlation test between minimum channels (CHMIN) and estimated relative performance (ERP)

cor.test(var1.df$CHMIN,var1.df$ERP)

## 
##  Pearson's product-moment correlation
## 
## data:  var1.df$CHMIN and var1.df$ERP
## t = 11.092, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5177705 0.6891857
## sample estimates:
##       cor 
## 0.6105802

Graphical Representation

scatterplot(var1.df$CHMIN,var1.df$ERP)

Correlation test between maximum channels (CHMAX) and estimated relative performance (ERP)

cor.test(var1.df$CHMAX,var1.df$ERP)

## 
##  Pearson's product-moment correlation
## 
## data:  var1.df$CHMAX and var1.df$ERP
## t = 10.573, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4963279 0.6737267
## sample estimates:
##       cor 
## 0.5921556

Graphical Representation

scatterplot(var1.df$CHMAX,var1.df$ERP)

Correlation test between published relative performance (PRP) and estimated relative performance (ERP)

cor.test(var1.df$PRP,var1.df$ERP)

## 
##  Pearson's product-moment correlation
## 
## data:  var1.df$PRP and var1.df$ERP
## t = 54.153, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9561728 0.9743821
## sample estimates:
##       cor 
## 0.9664717

Graphical Representation

scatterplot(var1.df$PRP,var1.df$ERP)
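The six correlation tests above follow the same pattern, so their key numbers can be collected into one table. A minimal sketch follows. The helper `cor_table` is a hypothetical name, and the toy data are random illustrative values. On the real data the call would be cor_table(var1.df, "ERP", c("MYCT", "MMIN", "MMAX", "CACH", "CHMIN", "CHMAX", "PRP")). This would also test MYCT, which is not tested individually above.

```r
cor_table <- function(df, target, predictors) {
  # Run cor.test() of each predictor against the target and keep r, its 95% CI and the p-value
  rows <- lapply(predictors, function(p) {
    ct <- cor.test(df[[p]], df[[target]])
    data.frame(variable = p,
               r        = unname(ct$estimate),
               ci_low   = ct$conf.int[1],
               ci_high  = ct$conf.int[2],
               p_value  = ct$p.value)
  })
  do.call(rbind, rows)
}

# Toy data (random illustrative values only)
set.seed(42)
toy <- data.frame(MMIN = runif(30, 0, 1000), CACH = runif(30, 0, 100))
toy$ERP <- 0.1 * toy$MMIN + rnorm(30, sd = 10)

res <- cor_table(toy, "ERP", c("MMIN", "CACH"))
print(res, digits = 3)
```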

Correlation matrix of the numeric variables

round(cor(var1.df[, 3:10], use = "pairwise.complete.obs"), 2)

##        MYCT  MMIN  MMAX  CACH CHMIN CHMAX   PRP   ERP
## MYCT   1.00 -0.34 -0.38 -0.32 -0.30 -0.25 -0.31 -0.29
## MMIN  -0.34  1.00  0.76  0.53  0.52  0.27  0.79  0.82
## MMAX  -0.38  0.76  1.00  0.54  0.56  0.53  0.86  0.90
## CACH  -0.32  0.53  0.54  1.00  0.58  0.49  0.66  0.65
## CHMIN -0.30  0.52  0.56  0.58  1.00  0.55  0.61  0.61
## CHMAX -0.25  0.27  0.53  0.49  0.55  1.00  0.61  0.59
## PRP   -0.31  0.79  0.86  0.66  0.61  0.61  1.00  0.97
## ERP   -0.29  0.82  0.90  0.65  0.61  0.59  0.97  1.00
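The ERP column of the matrix ranks the variables by the strength of their linear relationship with ERP. Reading it above, the order by absolute value is PRP (0.97), MMAX (0.90), MMIN (0.82), CACH (0.65), CHMIN (0.61), CHMAX (0.59) and MYCT (-0.29). A minimal sketch of extracting that ranking programmatically follows. It runs on a toy data frame with random illustrative values. On the real data, cm would be cor(var1.df[, 3:10]).

```r
# Toy data frame standing in for var1.df[, 3:10] (random illustrative values only)
set.seed(7)
toy <- data.frame(MMAX = runif(25, 0, 10000), CHMAX = runif(25, 0, 50))
toy$ERP <- 0.01 * toy$MMAX + rnorm(25, sd = 5)

cm <- cor(toy, use = "pairwise.complete.obs")

# Correlation of every other variable with ERP, strongest (in absolute value) first
erp_cor <- cm[rownames(cm) != "ERP", "ERP"]
ranked <- erp_cor[order(abs(erp_cor), decreasing = TRUE)]
ranked
```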

Graphical Representation

scatterplotMatrix(~ ERP + MYCT + MMIN + MMAX + CACH + CHMIN + CHMAX + PRP, data = var1.df)

Two-Sample T-Test

Null hypothesis (H0): there is no difference between the mean published relative performance (PRP) and the mean estimated relative performance (ERP).

t.test(var1.df$PRP,var1.df$ERP,var.equal = TRUE ,paired = FALSE)

## 
##  Two Sample t-test
## 
## data:  var1.df$PRP and var1.df$ERP
## t = 0.40754, df = 416, p-value = 0.6838
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -24.05585  36.63958
## sample estimates:
## mean of x mean of y 
## 105.62201  99.33014

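Since the p-value (0.68) is well above 0.05, we fail to reject the null hypothesis: the data give no evidence of a difference between the mean PRP (105.62) and the mean ERP (99.33). This does not prove that the two means are equal. Note also that PRP and ERP are measured on the same 209 machines, so a paired test (paired = TRUE) would be the more appropriate comparison.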
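Because PRP and ERP are two values recorded for the same machines, a paired t-test compares them machine by machine. It is equivalent to a one-sample t-test on the per-machine differences. A minimal sketch follows. The helper `compare_paired` is a hypothetical name, and the five PRP/ERP pairs are made-up toy values. On the real data the call would be t.test(var1.df$PRP, var1.df$ERP, paired = TRUE), whose result is not reported in this document.

```r
compare_paired <- function(x, y) {
  # Paired t-test: H0 says the mean of the per-machine differences x - y is 0
  paired <- t.test(x, y, paired = TRUE)
  # The same test written as a one-sample t-test on the differences
  one_sample <- t.test(x - y, mu = 0)
  list(paired = paired, one_sample = one_sample)
}

# Toy PRP/ERP pairs for five machines (illustrative values only)
prp <- c(100, 50, 30, 250, 80)
erp <- c(95, 55, 25, 240, 70)

res <- compare_paired(prp, erp)
res$paired$estimate   # mean of the differences prp - erp
res$paired$p.value
```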

One-Sample T-Test

Null hypothesis (H0): the mean estimated relative performance (ERP) of the computers is 0.

t.test(var1.df$ERP, mu = 0)

## 
##  One Sample t-test
## 
## data:  var1.df$ERP
## t = 9.2791, df = 208, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   78.22638 120.43390
## sample estimates:
## mean of x 
##  99.33014

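Since the p-value is far below 0.05, we reject the null hypothesis: the mean ERP is clearly different from 0. This test, however, says nothing about whether ERP depends on other variables. That question is addressed by the regression analysis below, whose overall F-test (F = 679.5 on 7 and 201 DF, p < 2.2e-16) rejects the hypothesis that all slope coefficients are zero.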
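Whether ERP depends on the hardware attributes is a question for the overall F-test of a regression, not for a one-sample t-test. A minimal sketch on toy data with random illustrative values shows what that test compares. The F-test printed by summary(lm(...)) is the same as comparing the fitted model with an intercept-only model ("ERP depends on nothing") via anova(). The model names `fit_full` and `fit_null` are hypothetical.

```r
# Toy data (random illustrative values only)
set.seed(3)
toy <- data.frame(MMAX = runif(40, 0, 10000), CACH = runif(40, 0, 100))
toy$ERP <- 0.01 * toy$MMAX + 0.5 * toy$CACH + rnorm(40, sd = 30)

fit_full <- lm(ERP ~ MMAX + CACH, data = toy)
fit_null <- lm(ERP ~ 1, data = toy)   # intercept-only model: ERP depends on no predictor

# Overall F-test reported by summary(): H0 says all slope coefficients are 0
f <- summary(fit_full)$fstatistic
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# The same test written as a comparison of nested models
cmp <- anova(fit_null, fit_full)
cmp
```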

REGRESSION

Formulating a multiple linear regression model to fit the performance of a computer from its hardware attributes and PRP.

Independent variables: {MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX, PRP}

Dependent variable: ERP

# Regress ERP on all seven numeric predictors
Model <- ERP ~ MYCT + MMIN + MMAX + CACH + CHMIN + CHMAX + PRP
fit <- lm(Model, data = var1.df)
summary(fit)

## 
## Call:
## lm(formula = Model, data = var1.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -117.478   -9.546    2.864   15.257  182.251 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.423e+01  4.732e+00  -7.234 9.68e-12 ***
## MYCT         3.777e-02  9.434e-03   4.004 8.77e-05 ***
## MMIN         5.483e-03  1.120e-03   4.894 2.02e-06 ***
## MMAX         3.375e-03  3.974e-04   8.493 4.45e-15 ***
## CACH         1.244e-01  7.751e-02   1.605  0.11016    
## CHMIN       -1.634e-02  4.523e-01  -0.036  0.97122    
## CHMAX        3.458e-01  1.287e-01   2.687  0.00781 ** 
## PRP          5.770e-01  3.718e-02  15.519  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 31.7 on 201 degrees of freedom
## Multiple R-squared:  0.9595, Adjusted R-squared:  0.958 
## F-statistic: 679.5 on 7 and 201 DF,  p-value: < 2.2e-16
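Once fitted, the model estimates ERP for a machine by plugging its attribute values into the regression equation. predict() does this and can also return a prediction interval. A minimal sketch on a toy two-predictor model with random illustrative values follows. The new machine's attribute values are hypothetical. With the report's model, the call would be predict(fit, newdata = ...) with a data frame holding all seven predictors.

```r
# Toy data (random illustrative values only)
set.seed(11)
toy <- data.frame(MMAX = runif(30, 0, 10000), PRP = runif(30, 0, 300))
toy$ERP <- 0.003 * toy$MMAX + 0.6 * toy$PRP + rnorm(30, sd = 5)

fit_toy <- lm(ERP ~ MMAX + PRP, data = toy)

# A hypothetical new machine
new_machine <- data.frame(MMAX = 8000, PRP = 50)

# Point estimate and a 95% prediction interval from predict()
point <- predict(fit_toy, newdata = new_machine)
predict(fit_toy, newdata = new_machine, interval = "prediction", level = 0.95)

# The same point estimate written out as a0 + a1*MMAX + a2*PRP
b <- coef(fit_toy)
manual <- b[["(Intercept)"]] + b[["MMAX"]] * 8000 + b[["PRP"]] * 50
```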

Result

1. There is no significant difference between the published relative performance (PRP) and the estimated relative performance (ERP): the t-test gives p = 0.68, and the two are very strongly correlated (r = 0.97).

2. The general form of the model for estimating ERP is

$$\mathrm{ERP} = a_0 + a_1\,\mathrm{MYCT} + a_2\,\mathrm{MMIN} + a_3\,\mathrm{MMAX} + a_4\,\mathrm{CACH} + a_5\,\mathrm{CHMIN} + a_6\,\mathrm{CHMAX} + a_7\,\mathrm{PRP} + \varepsilon$$

and, with the estimates from the regression output above,

$$\widehat{\mathrm{ERP}} = -34.23 + 0.03777\,\mathrm{MYCT} + 0.005483\,\mathrm{MMIN} + 0.003375\,\mathrm{MMAX} + 0.1244\,\mathrm{CACH} - 0.01634\,\mathrm{CHMIN} + 0.3458\,\mathrm{CHMAX} + 0.5770\,\mathrm{PRP}$$

The model explains about 96% of the variance in ERP (adjusted R-squared = 0.958). CACH (p = 0.11) and CHMIN (p = 0.97) are not statistically significant at the 5% level given the other predictors.

Model for Estimating the Performance of Computers

Prabha Shankar

December 27, 2017