This project estimates the relative performance of computers using the Computer Hardware data set from https://archive.ics.uci.edu/ml/datasets/Computer+Hardware . The data set was created by Phillip Ein-Dor and Jacob Feldmesser (Faculty of Management, Tel Aviv University, Ramat-Aviv, Tel Aviv, 69978, Israel). It contains 209 instances with 9 attributes. This paper addresses two questions about computer performance with respect to computer hardware: we evaluate whether there is a difference between the published relative performance and the estimated relative performance, and we estimate the relative performance of a computer using a regression model.
Relative CPU Performance Data, described in terms of its cycle time, memory size, etc.
The estimated relative performance values were obtained by the authors using a linear regression method. See their article (pp 308-313) for more details on how the relative performance values were set.
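Because the published (PRP) and estimated (ERP) values belong to the same machine, one way to examine whether they differ is a paired test. The sketch below is only an illustration and is not the analysis carried out in this report: it uses the first ten PRP and ERP values that appear in the str() output, and the choice of a paired t-test is an assumption.

```r
# First ten PRP and ERP values as printed by str(data); illustration only
prp <- c(198, 269, 220, 172, 132, 318, 367, 489, 636, 1144)
erp <- c(199, 253, 253, 253, 132, 290, 381, 381, 749, 1238)

# Paired t-test: the null hypothesis is that the mean of (prp - erp) is zero
res <- t.test(prp, erp, paired = TRUE)
mean_diff <- mean(prp - erp)

print(res)
```

On the full data set the same call would take data$PRP and data$ERP; a small p-value would suggest a systematic difference between the two columns.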
getwd()
## [1] "C:/Users/Giridhar/Downloads"
setwd("C:/Users/Giridhar/Downloads")
data=read.csv("machine(1).csv")
data = data[3:10]
dim(data)
## [1] 209 8
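The call data[3:10] keeps columns by position, which works only if the first two columns are the vendor and model names. A hedged alternative is to select the numeric columns by name. The sketch below assumes a headerless file in the UCI column order; the column names VENDOR and MODEL and the vendor/model strings are placeholders, and the numbers are the first two rows shown in the str() output.

```r
# Two sample rows in the raw-file layout; vendor and model strings are placeholders
raw_text <- "vendorA,modelA,125,256,6000,256,16,128,198,199
vendorB,modelB,29,8000,32000,32,8,32,269,253"

col_names <- c("VENDOR", "MODEL", "MYCT", "MMIN", "MMAX", "CACH",
               "CHMIN", "CHMAX", "PRP", "ERP")
hw <- read.csv(text = raw_text, header = FALSE, col.names = col_names)

# Keep the numeric columns by name instead of by position
num_cols <- c("MYCT", "MMIN", "MMAX", "CACH", "CHMIN", "CHMAX", "PRP", "ERP")
hw_num <- hw[, num_cols]
dim(hw_num)
```

Selecting by name keeps the script correct even if the column order of the input file changes.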
View(data)
str(data)
## 'data.frame': 209 obs. of 8 variables:
## $ MYCT : int 125 29 29 29 29 26 23 23 23 23 ...
## $ MMIN : int 256 8000 8000 8000 8000 8000 16000 16000 16000 32000 ...
## $ MMAX : int 6000 32000 32000 32000 16000 32000 32000 32000 64000 64000 ...
## $ CACH : int 256 32 32 32 32 64 64 64 64 128 ...
## $ CHMIN: int 16 8 8 8 8 8 16 16 16 32 ...
## $ CHMAX: int 128 32 32 32 16 32 32 32 32 64 ...
## $ PRP : int 198 269 220 172 132 318 367 489 636 1144 ...
## $ ERP : int 199 253 253 253 132 290 381 381 749 1238 ...
cor(data)
## MYCT MMIN MMAX CACH CHMIN CHMAX
## MYCT 1.0000000 -0.3356422 -0.3785606 -0.3209998 -0.3010897 -0.2505023
## MMIN -0.3356422 1.0000000 0.7581573 0.5347291 0.5171892 0.2669074
## MMAX -0.3785606 0.7581573 1.0000000 0.5379898 0.5605134 0.5272462
## CACH -0.3209998 0.5347291 0.5379898 1.0000000 0.5822455 0.4878458
## CHMIN -0.3010897 0.5171892 0.5605134 0.5822455 1.0000000 0.5482812
## CHMAX -0.2505023 0.2669074 0.5272462 0.4878458 0.5482812 1.0000000
## PRP -0.3070994 0.7949313 0.8630041 0.6626414 0.6089033 0.6052093
## ERP -0.2883956 0.8192915 0.9012024 0.6486203 0.6105802 0.5921556
## PRP ERP
## MYCT -0.3070994 -0.2883956
## MMIN 0.7949313 0.8192915
## MMAX 0.8630041 0.9012024
## CACH 0.6626414 0.6486203
## CHMIN 0.6089033 0.6105802
## CHMAX 0.6052093 0.5921556
## PRP 1.0000000 0.9664717
## ERP 0.9664717 1.0000000
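To read the matrix more easily, the correlations with the target ERP can be ranked by absolute value. The sketch below copies the ERP column from the output above into a named vector (the variable names erp_cor and ranked are introduced here for illustration).

```r
# ERP column of the printed correlation matrix (ERP's correlation with itself omitted)
erp_cor <- c(MYCT = -0.2883956, MMIN = 0.8192915, MMAX = 0.9012024,
             CACH = 0.6486203, CHMIN = 0.6105802, CHMAX = 0.5921556,
             PRP = 0.9664717)

# Order the variables by absolute correlation, strongest first
ranked <- erp_cor[order(abs(erp_cor), decreasing = TRUE)]
names(ranked)

# With the data frame loaded, the same vector could be taken from
# cor(data)[, "ERP"] after dropping the "ERP" entry itself.
```

PRP and MMAX show the strongest linear association with ERP, while MYCT is the only negatively correlated variable.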
#MULTIPLE LINEAR REGRESSION (ERP modelled on all other variables):
#Splitting the dataset into training and testing
library(caTools)
## Warning: package 'caTools' was built under R version 4.1.3
set.seed(123) # fixes the random seed so that the split is reproducible
split= sample(209,146) # row indices of the training set (about 70% of the 209 rows)
#Train set
training_set = data[split,] # storing the training set values
dim(training_set) # Checking the dimension of training set
## [1] 146 8
View(training_set) # displaying training set
#Test set
test_set = data[-split,] # storing the testing set values
dim(test_set)# Checking the dimension of test set
## [1] 63 8
View(test_set) # displaying test set
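The split relies on base R's sample() to draw the training row indices. A minimal sketch of the same idea, with the training size derived from the 0.7 ratio rather than typed in and with a check that the two index sets do not overlap (the names n_train, train_idx and test_idx are introduced here for illustration):

```r
n <- 209                      # number of rows in the data set
n_train <- round(0.7 * n)     # 146 rows for training

set.seed(123)                 # makes the random draw repeatable
train_idx <- sample(n, n_train)
test_idx <- setdiff(seq_len(n), train_idx)

# The two index sets are disjoint and together cover every row
length(train_idx)
length(test_idx)
length(intersect(train_idx, test_idx))
```

The rows would then be selected as data[train_idx, ] and data[test_idx, ].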
#Fitting a model
regressor = lm(formula=ERP~.,data = training_set)
summary(regressor) # summary of the model
##
## Call:
## lm(formula = ERP ~ ., data = training_set)
##
## Residuals:
## Min 1Q Median 3Q Max
## -102.023 -11.814 1.758 14.116 121.806
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.081e+01 4.856e+00 -6.344 2.98e-09 ***
## MYCT 2.947e-02 9.070e-03 3.249 0.00146 **
## MMIN 1.107e-03 1.163e-03 0.952 0.34266
## MMAX 4.550e-03 4.019e-04 11.320 < 2e-16 ***
## CACH 1.921e-01 8.549e-02 2.247 0.02622 *
## CHMIN -4.054e-01 4.183e-01 -0.969 0.33405
## CHMAX 3.977e-01 1.365e-01 2.913 0.00417 **
## PRP 5.187e-01 3.669e-02 14.137 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 27.14 on 138 degrees of freedom
## Multiple R-squared: 0.9653, Adjusted R-squared: 0.9636
## F-statistic: 548.9 on 7 and 138 DF, p-value: < 2.2e-16
plot(regressor) # diagnostic plots of the fitted model
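Calling plot() on an lm object draws several diagnostic plots one after another. A hedged sketch of how they could be placed on one page is given below; the built-in mtcars data and the formula are stand-ins chosen for illustration, not the model of this report.

```r
# Built-in mtcars data stands in for the training set (assumption for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid so that all four plots fit on one page
plot(fit)                        # residuals vs fitted, Q-Q, scale-location, leverage
par(old_par)                     # restore the previous layout
```

The same three lines around plot(regressor) would show the diagnostics of the model fitted above in a single figure.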
#Prediction
predi = predict(object = regressor, newdata=test_set)
predi
## 2 3 10 11 12 15
## 279.6544502 254.2382604 926.9397909 15.8325721 20.9639029 -13.7633387
## 18 19 20 28 29 31
## 21.1307709 4.0251619 112.4472525 8.1120737 44.2254297 155.2651475
## 33 44 45 47 49 57
## 48.2861295 79.5022095 71.4443260 6.8187861 38.6830801 53.4684427
## 58 59 61 65 66 73
## 36.8538877 37.8912832 38.9286787 148.1049988 220.0499925 3.4687337
## 77 80 88 95 101 102
## 31.8343349 86.5878482 20.9677033 221.2166612 -0.5907728 28.8392846
## 105 107 114 115 125 126
## 5.1101381 38.3929239 16.5748227 42.5581749 13.0859464 6.2504713
## 129 130 132 134 136 138
## 88.5737627 93.7607402 19.5158047 32.0200571 82.2244804 55.5993306
## 139 140 142 148 150 152
## 40.9865717 15.7062477 49.7626399 119.7077469 68.2092234 287.0938586
## 154 156 157 158 160 161
## 428.9807484 312.4511155 450.1528416 -16.5189132 0.9441343 14.0210357
## 163 167 174 180 181 182
## 32.0778422 83.6914232 10.6676732 50.7673031 79.9665164 -13.1857115
## 183 184 193
## -5.2469803 7.6098997 362.0639734
# displaying prediction
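plot(test_set$ERP,predi) # scatter plot of actual vs predicted ERP
#Calculating the prediction error
#Note: the expression below evaluates abs(sum(errors))/n, the absolute mean error (bias), not the RMSE;
#the RMSE would be sqrt(mean((predi-test_set$ERP)^2))
val=sqrt(sum(predi-test_set$ERP)^2)/length(test_set$ERP)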
val
## [1] 7.634874
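The root mean squared error (RMSE) is the square root of the mean of the squared errors, so the squaring has to happen before the errors are summed. If the sum is squared instead, positive and negative errors cancel and the result reduces to abs(sum(errors))/n. A minimal sketch with made-up numbers (the vectors and the helper name rmse are hypothetical):

```r
# RMSE: square each error, average, then take the square root
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 37)   # errors are 2, -2, 3, -3

rmse_val <- rmse(actual, predicted)   # sqrt(6.5), about 2.55

# Squaring the sum instead lets the errors cancel: here the result is 0
bias_val <- sqrt(sum(predicted - actual)^2) / length(actual)

c(rmse = rmse_val, abs_mean_error = bias_val)
```

For the model above, the RMSE on the test set would be rmse(test_set$ERP, predi).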
#R-squared value (multiple R-squared; the adjusted value is summary(regressor)$adj.r.squared)
R2=summary(regressor)$r.squared
R2
## [1] 0.9653284
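The value returned by summary(regressor)$r.squared is the multiple R-squared. The adjusted R-squared printed in the model summary can be reproduced from it with the usual formula 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of training rows and p the number of predictors. A small sketch using the figures reported above (variable names are introduced here for illustration):

```r
r2 <- 0.9653284   # multiple R-squared reported for ERP ~ .
n  <- 146         # rows in the training set
p  <- 7           # predictors in the model (138 residual degrees of freedom)

# Adjusted R-squared penalises the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)  # 0.9636, the adjusted value shown in the summary
```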
#SIMPLE LINEAR REGRESSION (ERP modelled on PRP only)
#Fitting a model
regressor = lm(formula=ERP~PRP,data = training_set)
summary(regressor) # summary of the model
##
## Call:
## lm(formula = ERP ~ PRP, data = training_set)
##
## Residuals:
## Min 1Q Median 3Q Max
## -230.116 -11.425 -0.413 9.000 179.279
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.85115 3.89424 1.503 0.135
## PRP 0.88659 0.02093 42.357 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.9 on 144 degrees of freedom
## Multiple R-squared: 0.9257, Adjusted R-squared: 0.9252
## F-statistic: 1794 on 1 and 144 DF, p-value: < 2.2e-16
plot(regressor) # diagnostic plots of the fitted model
#Prediction
predi = predict(object = regressor, newdata=test_set)
predi
## 2 3 10 11 12 15 18
## 244.34346 200.90062 1020.10842 39.54151 41.31469 14.71703 30.67563
## 19 20 28 29 31 33 44
## 33.33539 112.24177 29.78904 74.11846 248.77640 34.22198 59.04646
## 45 47 49 57 58 59 61
## 69.68552 21.80974 41.31469 68.79893 16.49021 18.26339 20.03656
## 65 66 73 77 80 88 95
## 133.51990 235.47758 25.35610 45.74763 80.32458 36.88175 215.97263
## 101 102 105 107 114 115 125
## 27.12927 45.74763 20.03656 34.22198 35.99516 50.18057 23.58292
## 126 129 130 132 134 136 138
## 29.78904 67.91234 76.77823 20.03656 34.22198 53.72693 32.44880
## 139 140 142 148 150 152 154
## 50.18057 41.31469 59.04646 104.26247 98.94294 251.43617 458.01130
## 156 157 158 160 161 163 167
## 294.87901 458.01130 12.94386 20.92315 24.46951 35.99516 108.69542
## 174 180 181 182 183 184 193
## 28.01586 59.04646 102.48930 11.17068 15.60362 25.35610 364.91950
# displaying prediction
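plot(test_set$ERP,predi) # scatter plot of actual vs predicted ERP
#Calculating the prediction error
#Note: the expression below evaluates abs(sum(errors))/n, the absolute mean error (bias), not the RMSE;
#the RMSE would be sqrt(mean((predi-test_set$ERP)^2))
val=sqrt(sum(predi-test_set$ERP)^2)/length(test_set$ERP)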
val
## [1] 0.5449456
#R-squared value (multiple R-squared; the adjusted value is summary(regressor)$adj.r.squared)
R2=summary(regressor)$r.squared
R2
## [1] 0.9257015
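Because the object regressor is reused for both fits, the two models cannot be compared directly once the second one has been fitted. One possible approach, sketched here under assumptions, is to keep the fits under separate names and compare the nested models with anova(). The built-in mtcars data and the formulas are stand-ins for illustration only.

```r
# Built-in mtcars data stands in for the training set (assumption for illustration)
fit_simple <- lm(mpg ~ wt, data = mtcars)              # one predictor
fit_full   <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # adds further predictors

# F-test for nested models: do the extra predictors reduce the residual error?
cmp <- anova(fit_simple, fit_full)
print(cmp)

# Adjusted R-squared of both fits side by side
c(simple = summary(fit_simple)$adj.r.squared,
  full   = summary(fit_full)$adj.r.squared)
```

Applied to this report, the two fits would be lm(ERP ~ PRP, data = training_set) and lm(ERP ~ ., data = training_set).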
#Visualizing initial data
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
ggplot(data, aes(x=factor(CACH), y=PRP)) + geom_boxplot(aes(fill = CACH))+labs(title = "20MID0034")
ggplot(data,aes(x=PRP)) + geom_histogram(bins=30) +facet_wrap(~MYCT)+labs(title = "20MID0034")
#Scatterplot using ggplot2
ggplot(data, aes(x=MYCT, y=MMIN, col=PRP))+geom_point()+labs(title = "20MID0034")