This project estimates the performance of computers using the Computer Hardware data set from https://archive.ics.uci.edu/ml/datasets/Computer+Hardware. The creators of the data set are Phillip Ein-Dor and Jacob Feldmesser (Faculty of Management, Tel Aviv University, Ramat-Avi, Tel Aviv 69978, Israel). The data set contains 209 instances with 10 attributes (vendor name, model name, six hardware attributes, PRP and ERP).
This paper addresses the following issues concerning computer performance with respect to hardware characteristics. We evaluate whether there is a difference between the published relative performance (PRP) and the estimated relative performance (ERP), and we estimate the relative performance of a computer with a regression model.
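# Set the working directory to the folder that contains machine.data (adjust this path for your machine)
setwd("C:/Users/Prabha Shankar/Desktop/Winter Internship/Project")
# machine.data has no header row and its fields are comma-separated
var1.df <- read.table("machine.data", header = FALSE, sep = ",")
# Column names follow the attribute list of the UCI data set
colnames(var1.df) <- c("v_name","M_name","MYCT","MMIN","MMAX","CACH","CHMIN","CHMAX","PRP","ERP")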
dim(var1.df)
## [1] 209 10
table(var1.df$v_name)
##
## adviser amdahl apollo basf bti
## 1 9 2 2 2
## burroughs c.r.d cambex cdc dec
## 8 6 5 9 6
## dg formation four-phase gould harris
## 7 5 1 3 7
## honeywell hp ibm ipl magnuson
## 13 7 32 6 6
## microdata nas ncr nixdorf perkin-elmer
## 1 19 13 3 3
## prime siemens sperry sratus wang
## 5 12 13 1 2
library(psych)
## Warning: package 'psych' was built under R version 3.3.3
describe(var1.df)[(3:10),c(1:5)]
## vars n mean sd median
## MYCT 3 209 203.82 260.26 110
## MMIN 4 209 2867.98 3878.74 2000
## MMAX 5 209 11796.15 11726.56 8000
## CACH 6 209 25.21 40.63 8
## CHMIN 7 209 4.70 6.82 2
## CHMAX 8 209 18.27 26.00 8
## PRP 9 209 105.62 160.83 50
## ERP 10 209 99.33 154.76 45
# Box plots of the numeric attributes (units as documented for the UCI data set)
boxplot(var1.df$MYCT, main="Machine cycle time (MYCT)", xlab="Machine cycle time (ns)", horizontal=TRUE)
boxplot(var1.df$MMIN, main="Minimum main memory (MMIN)", xlab="Minimum main memory (KB)", horizontal=TRUE)
boxplot(var1.df$MMAX, main="Maximum main memory (MMAX)", xlab="Maximum main memory (KB)", horizontal=TRUE)
boxplot(var1.df$CACH, main="Cache memory (CACH)", xlab="Cache memory (KB)", horizontal=TRUE)
boxplot(var1.df$CHMIN, main="Minimum channels (CHMIN)", xlab="Minimum channels (units)", horizontal=TRUE)
boxplot(var1.df$CHMAX, main="Maximum channels (CHMAX)", xlab="Maximum channels (units)", horizontal=TRUE)
boxplot(var1.df$PRP, main="Published relative performance (PRP)", xlab="Published relative performance", horizontal=TRUE)
hist(var1.df$PRP, xlab="Published Relative Performance", main="Distribution of PRP")
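PRP is a numeric variable on a continuous scale. Its mean (105.62) is about twice its median (50), which indicates a strongly right-skewed distribution with a long upper tail.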
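To put a number on the skew, the sketch below computes the moment-based sample skewness (g1) in base R. The helper `sample_skewness` and the demo vector are hypothetical; psych's `describe()` also reports a `skew` column (dropped by the column subset above), and its default estimator may differ slightly from this one.

```r
# Moment-based sample skewness (g1): positive values indicate a long right tail.
# Hypothetical helper, not part of the original analysis.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Hypothetical right-skewed demo vector: mostly small values, one very large one
demo <- c(10, 12, 15, 20, 25, 30, 50, 400)
print(sample_skewness(demo))     # positive
print(mean(demo) > median(demo)) # TRUE: the mean is pulled up by the tail

# On the real data (assumes var1.df from above): sample_skewness(var1.df$PRP)
```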
library(car)
## Warning: package 'car' was built under R version 3.3.3
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
cor.test(var1.df$MMIN,var1.df$ERP)
##
## Pearson's product-moment correlation
##
## data: var1.df$MMIN and var1.df$ERP
## t = 20.558, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7690922 0.8594446
## sample estimates:
## cor
## 0.8192915
scatterplot(var1.df$MMIN,var1.df$ERP)
cor.test(var1.df$MMAX,var1.df$ERP)
##
## Pearson's product-moment correlation
##
## data: var1.df$MMAX and var1.df$ERP
## t = 29.917, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8721582 0.9239162
## sample estimates:
## cor
## 0.9012024
scatterplot(var1.df$MMAX,var1.df$ERP)
cor.test(var1.df$CACH,var1.df$ERP)
##
## Pearson's product-moment correlation
##
## data: var1.df$CACH and var1.df$ERP
## t = 12.261, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5624133 0.7208780
## sample estimates:
## cor
## 0.6486203
scatterplot(var1.df$CACH,var1.df$ERP)
cor.test(var1.df$CHMIN,var1.df$ERP)
##
## Pearson's product-moment correlation
##
## data: var1.df$CHMIN and var1.df$ERP
## t = 11.092, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5177705 0.6891857
## sample estimates:
## cor
## 0.6105802
scatterplot(var1.df$CHMIN,var1.df$ERP)
cor.test(var1.df$CHMAX,var1.df$ERP)
##
## Pearson's product-moment correlation
##
## data: var1.df$CHMAX and var1.df$ERP
## t = 10.573, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4963279 0.6737267
## sample estimates:
## cor
## 0.5921556
scatterplot(var1.df$CHMAX,var1.df$ERP)
cor.test(var1.df$PRP,var1.df$ERP)
##
## Pearson's product-moment correlation
##
## data: var1.df$PRP and var1.df$ERP
## t = 54.153, df = 207, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9561728 0.9743821
## sample estimates:
## cor
## 0.9664717
scatterplot(var1.df$PRP,var1.df$ERP)
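Each `cor.test()` above reports a t statistic derived from the sample correlation r via t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2 = 207. The sketch below reproduces the MMIN vs ERP value from the printed r, and cross-checks the formula against `cor.test()` on small hypothetical vectors; the helper `cor_t` is illustrative only.

```r
# t statistic of Pearson's correlation test: t = r * sqrt(n - 2) / sqrt(1 - r^2)
cor_t <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

# Printed values for MMIN vs ERP: r = 0.8192915, n = 209
t_mmin <- cor_t(0.8192915, 209)
print(t_mmin)  # close to the printed t = 20.558

# Cross-check against cor.test() on hypothetical data
x <- c(1, 3, 2, 5, 4, 7, 6, 9)
y <- c(2, 1, 4, 3, 6, 5, 8, 7)
ct <- cor.test(x, y)
print(c(formula = cor_t(cor(x, y), length(x)), cor.test = unname(ct$statistic)))
```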
round(cor(var1.df[, 3:10], use="pair"),2)
## MYCT MMIN MMAX CACH CHMIN CHMAX PRP ERP
## MYCT 1.00 -0.34 -0.38 -0.32 -0.30 -0.25 -0.31 -0.29
## MMIN -0.34 1.00 0.76 0.53 0.52 0.27 0.79 0.82
## MMAX -0.38 0.76 1.00 0.54 0.56 0.53 0.86 0.90
## CACH -0.32 0.53 0.54 1.00 0.58 0.49 0.66 0.65
## CHMIN -0.30 0.52 0.56 0.58 1.00 0.55 0.61 0.61
## CHMAX -0.25 0.27 0.53 0.49 0.55 1.00 0.61 0.59
## PRP -0.31 0.79 0.86 0.66 0.61 0.61 1.00 0.97
## ERP -0.29 0.82 0.90 0.65 0.61 0.59 0.97 1.00
# scatterplot.matrix() is deprecated in car; scatterplotMatrix() is its replacement
scatterplotMatrix(~ERP+MYCT+MMIN+MMAX+CACH+CHMIN+CHMAX+PRP, data=var1.df)
Null Hypothesis (H0): there is no difference between the mean published relative performance (PRP) and the mean estimated relative performance (ERP).
t.test(var1.df$PRP,var1.df$ERP,var.equal = TRUE ,paired = FALSE)
##
## Two Sample t-test
##
## data: var1.df$PRP and var1.df$ERP
## t = 0.40754, df = 416, p-value = 0.6838
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -24.05585 36.63958
## sample estimates:
## mean of x mean of y
## 105.62201 99.33014
Since the p-value (0.684) is well above 0.05, we fail to reject the null hypothesis. The data give no evidence of a difference between the mean PRP and the mean ERP; note that failing to reject H0 does not prove that the two means are equal.
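Because PRP and ERP are two measurements of the same 209 machines, a paired t-test (which tests whether the mean of the per-machine differences is zero) is arguably a better fit than the unpaired two-sample test used above. The sketch below shows the paired call `t.test(..., paired = TRUE)` and checks that it is equivalent to a one-sample test on the differences. The vectors `prp` and `erp` are hypothetical stand-ins for `var1.df$PRP` and `var1.df$ERP`, and no result for the real data is claimed here.

```r
# Paired t-test: PRP and ERP describe the same machines, so compare them pairwise.
prp <- c(50, 80, 120, 30, 200, 65, 90, 150)  # hypothetical values
erp <- c(45, 85, 110, 35, 190, 60, 95, 140)  # hypothetical values

paired <- t.test(prp, erp, paired = TRUE)

# A paired test is the same as a one-sample t-test on the differences
diffs <- t.test(prp - erp, mu = 0)

print(c(paired_t = unname(paired$statistic), paired_p = paired$p.value))

# On the real data (assumes var1.df from above):
# t.test(var1.df$PRP, var1.df$ERP, paired = TRUE)
```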
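Null Hypothesis (H0): the estimated relative performance (ERP) of a computer does not depend on any other factor. Note that the one-sample t-test below actually tests a narrower hypothesis, namely that the mean ERP equals 0.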
t.test(var1.df$ERP)
##
## One Sample t-test
##
## data: var1.df$ERP
## t = 9.2791, df = 208, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 78.22638 120.43390
## sample estimates:
## mean of x
## 99.33014
Since the p-value is far below 0.05, we reject the null hypothesis: the mean ERP is different from 0. On its own, this does not show that ERP depends on other factors; to find out which hardware attributes ERP depends on, we perform a regression analysis.
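The one-sample test above only compares the mean of ERP with 0, using t = (mean - 0) / (sd / sqrt(n)). Plugging in the values from `describe()` gives 99.33 / (154.76 / sqrt(209)), which is about 9.28 and agrees with the printed t = 9.2791. The sketch below reproduces the statistic by hand on a hypothetical vector and compares it with `t.test()`; the helper `one_sample_t` is illustrative only.

```r
# One-sample t statistic for H0: mean(x) == mu, computed by hand
one_sample_t <- function(x, mu = 0) {
  (mean(x) - mu) / (sd(x) / sqrt(length(x)))
}

x <- c(45, 85, 110, 35, 190, 60, 95, 140)  # hypothetical ERP-like values
manual <- one_sample_t(x)
builtin <- unname(t.test(x, mu = 0)$statistic)
print(c(manual = manual, builtin = builtin))

# Same formula with the summary values reported for ERP
print(99.33014 / (154.76 / sqrt(209)))  # about 9.28
```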
Model <- ERP ~ MYCT+MMIN+MMAX+CACH+CHMIN+CHMAX+PRP
fit <- lm(Model, data = var1.df)
summary(fit)
##
## Call:
## lm(formula = Model, data = var1.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -117.478 -9.546 2.864 15.257 182.251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.423e+01 4.732e+00 -7.234 9.68e-12 ***
## MYCT 3.777e-02 9.434e-03 4.004 8.77e-05 ***
## MMIN 5.483e-03 1.120e-03 4.894 2.02e-06 ***
## MMAX 3.375e-03 3.974e-04 8.493 4.45e-15 ***
## CACH 1.244e-01 7.751e-02 1.605 0.11016
## CHMIN -1.634e-02 4.523e-01 -0.036 0.97122
## CHMAX 3.458e-01 1.287e-01 2.687 0.00781 **
## PRP 5.770e-01 3.718e-02 15.519 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 31.7 on 201 degrees of freedom
## Multiple R-squared: 0.9595, Adjusted R-squared: 0.958
## F-statistic: 679.5 on 7 and 201 DF, p-value: < 2.2e-16
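The question behind the second hypothesis (whether ERP depends on the other variables) is what the overall F-test in the last summary line answers: it compares the full model with an intercept-only model, under H0 that all slope coefficients are zero. The sketch below makes that comparison explicit with `anova()` on simulated data; the data frame `sim` and its coefficients are hypothetical, and the check confirms that the F value matches the one in `summary()`.

```r
set.seed(1)
n <- 50
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))       # hypothetical predictors
sim$y <- 2 + 1.5 * sim$x1 - 0.5 * sim$x2 + rnorm(n)   # hypothetical response

full <- lm(y ~ x1 + x2, data = sim)
m0 <- lm(y ~ 1, data = sim)  # intercept-only model

# F-test of H0: all slope coefficients are zero
cmp <- anova(m0, full)
print(cmp)

f_anova <- cmp$F[2]
f_summary <- unname(summary(full)$fstatistic["value"])

# On the real data (assumes var1.df and fit from above):
# anova(lm(ERP ~ 1, data = var1.df), fit)
```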
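2. The model we obtain for estimating ERP (coefficients rounded from the summary above) is:
ERP = -34.23 + 0.03777 MYCT + 0.005483 MMIN + 0.003375 MMAX + 0.1244 CACH - 0.01634 CHMIN + 0.3458 CHMAX + 0.5770 PRP
CACH (p = 0.110) and CHMIN (p = 0.971) are not significant at the 5% level, and the model explains about 96% of the variance in ERP (adjusted R-squared = 0.958). The positive MYCT coefficient, despite MYCT's negative correlation with ERP (-0.29), likely reflects the strong correlations among the predictors.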
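Once fitted, the model can estimate ERP for a machine that is not in the data. The sketch below evaluates the linear predictor by hand from the rounded coefficients printed by `summary(fit)`, for a hypothetical machine whose attributes are set to the column medians reported by `describe()`. It is an illustration of the arithmetic, not a result reported by the original analysis; with the fitted object, `predict()` would give essentially the same value up to coefficient rounding.

```r
# Coefficients copied (rounded) from summary(fit) above
b <- c(Intercept = -34.23, MYCT = 0.03777, MMIN = 0.005483, MMAX = 0.003375,
       CACH = 0.1244, CHMIN = -0.01634, CHMAX = 0.3458, PRP = 0.5770)

# Hypothetical machine set to the column medians reported by describe()
machine <- c(MYCT = 110, MMIN = 2000, MMAX = 8000, CACH = 8,
             CHMIN = 2, CHMAX = 8, PRP = 50)

# Linear predictor: intercept + sum of (coefficient * attribute value)
erp_hat <- unname(b["Intercept"] + sum(b[names(machine)] * machine))
print(erp_hat)  # about 40.47

# With the fitted object (assumes fit from above), the equivalent call is:
# predict(fit, newdata = as.data.frame(as.list(machine)))
```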