Develop and create a suitable machine learning model for the assigned dataset. Compare the results of at least two machine learning algorithms.

INTRODUCTION:

This project estimates the relative performance of computers using the Computer Hardware data set from https://archive.ics.uci.edu/ml/datasets/Computer+Hardware . The data set was created by Phillip Ein-Dor and Jacob Feldmesser (Faculty of Management, Tel Aviv University, Ramat-Aviv, Tel Aviv, 69978, Israel). It contains 209 instances with 10 attributes. This report addresses the "performance of computer" with respect to the computer hardware: we evaluate whether there is a difference between the published relative performance (PRP) and the estimated relative performance (ERP), and we estimate the relative performance of a computer using regression models.

Aim/ Objective:

To model relative CPU performance, described in terms of cycle time, memory size, etc., and to compare the results of two regression models.

Literature Survey of your dataset:

The estimated relative performance (ERP) values were produced by the authors using a linear regression method. See their article (pp 308-313) for more details on how the relative performance values were set.

Attribute Information:

vendor name: 30 (adviser, amdahl, apollo, basf, bti, burroughs, c.r.d, cambex, cdc, dec, dg, formation, four-phase, gould, honeywell, hp, ibm, ipl, magnuson, microdata, nas, ncr, nixdorf, perkin-elmer, prime, siemens, sperry, sratus, wang)
Model Name: many unique symbols
MYCT: machine cycle time in nanoseconds (integer)
MMIN: minimum main memory in kilobytes (integer)
MMAX: maximum main memory in kilobytes (integer)
CACH: cache memory in kilobytes (integer)
CHMIN: minimum channels in units (integer)
CHMAX: maximum channels in units (integer)
PRP: published relative performance (integer)
ERP: estimated relative performance from the original article (integer)
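
The attribute layout above can be sketched in R as follows. This is a minimal illustration, not the loading code used in this report: the two rows reuse the numeric values of the first two records shown in the `str()` output, while the model names (`model-a`, `model-b`) and the variable names `raw_text`, `col_names`, `machines` and `numeric_part` are hypothetical placeholders.

```r
# Two sample records. Numeric values come from the str() output in this report;
# the model names are hypothetical placeholders.
raw_text <- "adviser,model-a,125,256,6000,256,16,128,198,199
amdahl,model-b,29,8000,32000,32,8,32,269,253"

# Column names follow the attribute list (vendor, model, six predictors, PRP, ERP).
col_names <- c("vendor", "model", "MYCT", "MMIN", "MMAX",
               "CACH", "CHMIN", "CHMAX", "PRP", "ERP")

machines <- read.csv(text = raw_text, header = FALSE, col.names = col_names)

# Drop the two non-numeric identifier columns, keeping columns 3 to 10.
numeric_part <- machines[3:10]
str(numeric_part)
```

Keeping only columns 3 to 10 leaves the eight integer variables that the regression models work with.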

CODE:

getwd()

## [1] "C:/Users/Giridhar/Downloads"

setwd("C:/Users/Giridhar/Downloads")
data=read.csv("machine(1).csv")
data = data[3:10]
dim(data)

## [1] 209   8

View(data)
str(data)

## 'data.frame':    209 obs. of  8 variables:
##  $ MYCT : int  125 29 29 29 29 26 23 23 23 23 ...
##  $ MMIN : int  256 8000 8000 8000 8000 8000 16000 16000 16000 32000 ...
##  $ MMAX : int  6000 32000 32000 32000 16000 32000 32000 32000 64000 64000 ...
##  $ CACH : int  256 32 32 32 32 64 64 64 64 128 ...
##  $ CHMIN: int  16 8 8 8 8 8 16 16 16 32 ...
##  $ CHMAX: int  128 32 32 32 16 32 32 32 32 64 ...
##  $ PRP  : int  198 269 220 172 132 318 367 489 636 1144 ...
##  $ ERP  : int  199 253 253 253 132 290 381 381 749 1238 ...

cor(data)

##             MYCT       MMIN       MMAX       CACH      CHMIN      CHMAX
## MYCT   1.0000000 -0.3356422 -0.3785606 -0.3209998 -0.3010897 -0.2505023
## MMIN  -0.3356422  1.0000000  0.7581573  0.5347291  0.5171892  0.2669074
## MMAX  -0.3785606  0.7581573  1.0000000  0.5379898  0.5605134  0.5272462
## CACH  -0.3209998  0.5347291  0.5379898  1.0000000  0.5822455  0.4878458
## CHMIN -0.3010897  0.5171892  0.5605134  0.5822455  1.0000000  0.5482812
## CHMAX -0.2505023  0.2669074  0.5272462  0.4878458  0.5482812  1.0000000
## PRP   -0.3070994  0.7949313  0.8630041  0.6626414  0.6089033  0.6052093
## ERP   -0.2883956  0.8192915  0.9012024  0.6486203  0.6105802  0.5921556
##              PRP        ERP
## MYCT  -0.3070994 -0.2883956
## MMIN   0.7949313  0.8192915
## MMAX   0.8630041  0.9012024
## CACH   0.6626414  0.6486203
## CHMIN  0.6089033  0.6105802
## CHMAX  0.6052093  0.5921556
## PRP    1.0000000  0.9664717
## ERP    0.9664717  1.0000000
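
To make the correlation matrix easier to read, the ERP row can be ranked by absolute correlation. The sketch below copies the ERP correlations from the output above into a named vector (the name `erp_cor` is an assumption made for illustration) and sorts them.

```r
# Correlations of each variable with ERP, copied from the cor(data) output above.
erp_cor <- c(MYCT = -0.2883956, MMIN = 0.8192915, MMAX = 0.9012024,
             CACH = 0.6486203, CHMIN = 0.6105802, CHMAX = 0.5921556,
             PRP = 0.9664717)

# Rank variables by the strength (absolute value) of their correlation with ERP.
ranked <- sort(abs(erp_cor), decreasing = TRUE)
print(names(ranked))
```

PRP and MMAX show the strongest linear association with ERP, while MYCT is the only variable with a negative correlation.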

MULTIPLE LINEAR REGRESSION:

#MULTIPLE LINEAR REGRESSION (ERP modelled on all remaining variables):
#Splitting the dataset into training and testing
library(caTools)

## Warning: package 'caTools' was built under R version 4.1.3

set.seed(123) # fixes the random seed so the split is reproducible
split= sample(209,146) # row indices of the training set: 146 of 209 rows, roughly a 70/30 split
#Train set
training_set = data[split,] # storing the training set values
dim(training_set) # Checking the dimension of training set

## [1] 146   8

View(training_set) # displaying training set
#Test set
test_set = data[-split,] # storing the testing set values
dim(test_set) # Checking the dimension of test set

## [1] 63  8
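
The split used here can be checked independently of the data. The following sketch, which assumes only the row count of 209 and introduces the hypothetical names `n_rows`, `train_idx` and `test_idx`, draws the training indices and confirms that the two sets are disjoint and together cover every row.

```r
n_rows <- 209                      # number of instances in the data set
set.seed(123)                      # fix the seed so the draw is reproducible

# Sample 70% of the row indices (rounded) for training, without replacement.
train_idx <- sample(n_rows, round(0.7 * n_rows))

# Every remaining row index goes to the test set.
test_idx <- setdiff(seq_len(n_rows), train_idx)

length(train_idx)                  # size of the training set
length(test_idx)                   # size of the test set
length(intersect(train_idx, test_idx))  # overlap between the two sets
```

Negative indexing, as in `data[-split,]`, selects the same complement as `setdiff()` does here.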

View(test_set) # displaying test set
#Fitting a model
regressor = lm(formula=ERP~.,data = training_set)
summary(regressor) # summary of the model

## 
## Call:
## lm(formula = ERP ~ ., data = training_set)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -102.023  -11.814    1.758   14.116  121.806 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.081e+01  4.856e+00  -6.344 2.98e-09 ***
## MYCT         2.947e-02  9.070e-03   3.249  0.00146 ** 
## MMIN         1.107e-03  1.163e-03   0.952  0.34266    
## MMAX         4.550e-03  4.019e-04  11.320  < 2e-16 ***
## CACH         1.921e-01  8.549e-02   2.247  0.02622 *  
## CHMIN       -4.054e-01  4.183e-01  -0.969  0.33405    
## CHMAX        3.977e-01  1.365e-01   2.913  0.00417 ** 
## PRP          5.187e-01  3.669e-02  14.137  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.14 on 138 degrees of freedom
## Multiple R-squared:  0.9653, Adjusted R-squared:  0.9636 
## F-statistic: 548.9 on 7 and 138 DF,  p-value: < 2.2e-16

plot(regressor) # diagnostic plots of the fitted model (residuals vs fitted, Q-Q, scale-location, leverage)

#Prediction
predi = predict(object = regressor, newdata=test_set)
predi

##           2           3          10          11          12          15 
## 279.6544502 254.2382604 926.9397909  15.8325721  20.9639029 -13.7633387 
##          18          19          20          28          29          31 
##  21.1307709   4.0251619 112.4472525   8.1120737  44.2254297 155.2651475 
##          33          44          45          47          49          57 
##  48.2861295  79.5022095  71.4443260   6.8187861  38.6830801  53.4684427 
##          58          59          61          65          66          73 
##  36.8538877  37.8912832  38.9286787 148.1049988 220.0499925   3.4687337 
##          77          80          88          95         101         102 
##  31.8343349  86.5878482  20.9677033 221.2166612  -0.5907728  28.8392846 
##         105         107         114         115         125         126 
##   5.1101381  38.3929239  16.5748227  42.5581749  13.0859464   6.2504713 
##         129         130         132         134         136         138 
##  88.5737627  93.7607402  19.5158047  32.0200571  82.2244804  55.5993306 
##         139         140         142         148         150         152 
##  40.9865717  15.7062477  49.7626399 119.7077469  68.2092234 287.0938586 
##         154         156         157         158         160         161 
## 428.9807484 312.4511155 450.1528416 -16.5189132   0.9441343  14.0210357 
##         163         167         174         180         181         182 
##  32.0778422  83.6914232  10.6676732  50.7673031  79.9665164 -13.1857115 
##         183         184         193 
##  -5.2469803   7.6098997 362.0639734

# displaying prediction
plot(test_set$ERP,predi) # scatter plot of actual vs predicted ERP

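#Calculating the prediction error. Note: the expression below evaluates to abs(sum of errors)/n, the absolute mean error, whereas the RMSE is sqrt(mean((predi-test_set$ERP)^2))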
val=sqrt(sum(predi-test_set$ERP)^2)/length(test_set$ERP)
val

## [1] 7.634874
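
The expression above squares the sum of the errors rather than summing the squared errors, so positive and negative errors cancel before squaring. A small sketch with made-up numbers shows the difference from the usual root mean squared error; the function names `rmse` and `abs_mean_error` and the toy vectors are assumptions for illustration only.

```r
# Root mean squared error: square each error, average, then take the root.
rmse <- function(predicted, actual) {
  sqrt(mean((predicted - actual)^2))
}

# The expression used in this report: the errors are summed first,
# so it reduces to abs(sum of errors) / n.
abs_mean_error <- function(predicted, actual) {
  sqrt(sum(predicted - actual)^2) / length(actual)
}

actual    <- c(10, 20)
predicted <- c(15, 15)   # errors are +5 and -5

rmse(predicted, actual)            # 5: both predictions are off by 5
abs_mean_error(predicted, actual)  # 0: the two errors cancel out
```

Because of this cancellation, a small value of the second quantity indicates low bias on average, not necessarily accurate individual predictions.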

#R-squared value ($r.squared is the multiple R-squared; the adjusted value is available as $adj.r.squared)
R2=summary(regressor)$r.squared
R2

## [1] 0.9653284
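
The value printed above is the multiple R-squared. The adjusted R-squared reported in the model summary can be recovered from it with the standard formula, as the sketch below shows; the function name `adjusted_r2` and its argument names are hypothetical.

```r
# Adjusted R-squared from the multiple R-squared, the number of
# observations and the number of predictors (excluding the intercept).
adjusted_r2 <- function(r2, n_obs, n_predictors) {
  1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)
}

# Model with all 7 predictors, fitted on 146 training rows.
adj_full <- adjusted_r2(0.9653284, n_obs = 146, n_predictors = 7)
round(adj_full, 4)
```

The result agrees with the adjusted R-squared of 0.9636 shown in the `summary(regressor)` output, and `summary(regressor)$adj.r.squared` returns it directly.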

SIMPLE LINEAR REGRESSION

#SIMPLE LINEAR REGRESSION (ERP modelled on PRP only)
#Fitting a model
regressor = lm(formula=ERP~PRP,data = training_set)
summary(regressor) # summary of the model

## 
## Call:
## lm(formula = ERP ~ PRP, data = training_set)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -230.116  -11.425   -0.413    9.000  179.279 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.85115    3.89424   1.503    0.135    
## PRP          0.88659    0.02093  42.357   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.9 on 144 degrees of freedom
## Multiple R-squared:  0.9257, Adjusted R-squared:  0.9252 
## F-statistic:  1794 on 1 and 144 DF,  p-value: < 2.2e-16

plot(regressor) # diagnostic plots of the fitted model (residuals vs fitted, Q-Q, scale-location, leverage)

#Prediction
predi = predict(object = regressor, newdata=test_set)
predi

##          2          3         10         11         12         15         18 
##  244.34346  200.90062 1020.10842   39.54151   41.31469   14.71703   30.67563 
##         19         20         28         29         31         33         44 
##   33.33539  112.24177   29.78904   74.11846  248.77640   34.22198   59.04646 
##         45         47         49         57         58         59         61 
##   69.68552   21.80974   41.31469   68.79893   16.49021   18.26339   20.03656 
##         65         66         73         77         80         88         95 
##  133.51990  235.47758   25.35610   45.74763   80.32458   36.88175  215.97263 
##        101        102        105        107        114        115        125 
##   27.12927   45.74763   20.03656   34.22198   35.99516   50.18057   23.58292 
##        126        129        130        132        134        136        138 
##   29.78904   67.91234   76.77823   20.03656   34.22198   53.72693   32.44880 
##        139        140        142        148        150        152        154 
##   50.18057   41.31469   59.04646  104.26247   98.94294  251.43617  458.01130 
##        156        157        158        160        161        163        167 
##  294.87901  458.01130   12.94386   20.92315   24.46951   35.99516  108.69542 
##        174        180        181        182        183        184        193 
##   28.01586   59.04646  102.48930   11.17068   15.60362   25.35610  364.91950

# displaying prediction
plot(test_set$ERP,predi) # scatter plot of actual vs predicted ERP

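#Calculating the prediction error. Note: the expression below evaluates to abs(sum of errors)/n, the absolute mean error, whereas the RMSE is sqrt(mean((predi-test_set$ERP)^2))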
val=sqrt(sum(predi-test_set$ERP)^2)/length(test_set$ERP)
val

## [1] 0.5449456

#R-squared value ($r.squared is the multiple R-squared; the adjusted value is available as $adj.r.squared)
R2=summary(regressor)$r.squared
R2

## [1] 0.9257015
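
Since the task asks for a comparison of two models, one possible way to line them up is a small table of held-out RMSE and adjusted R-squared, plus an F-test for the nested pair. The sketch below runs on simulated data, not on the computer hardware data; the data frame `sim`, its columns `x1`, `x2`, `y`, and all other names are hypothetical.

```r
set.seed(1)

# Simulated data standing in for a real data set (assumption for illustration).
n <- 200
sim <- data.frame(x1 = runif(n, 0, 100), x2 = runif(n, 0, 50))
sim$y <- 2 * sim$x1 + 0.5 * sim$x2 + rnorm(n, sd = 5)

# 70/30 train/test split.
train_idx <- sample(n, 140)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# One-predictor model and the larger model that nests it.
fit_single <- lm(y ~ x1, data = train)
fit_full   <- lm(y ~ x1 + x2, data = train)

rmse <- function(predicted, actual) sqrt(mean((predicted - actual)^2))

# Side-by-side summary of both models on the held-out rows.
comparison <- data.frame(
  model     = c("y ~ x1", "y ~ x1 + x2"),
  test_rmse = c(rmse(predict(fit_single, newdata = test), test$y),
                rmse(predict(fit_full,   newdata = test), test$y)),
  adj_r2    = c(summary(fit_single)$adj.r.squared,
                summary(fit_full)$adj.r.squared)
)
print(comparison)

# F-test: does adding x2 significantly reduce the residual sum of squares?
print(anova(fit_single, fit_full))
```

The same pattern would apply to the two models in this report by fitting `ERP ~ PRP` and `ERP ~ .` on `training_set` under different object names, so that neither fit overwrites the other.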

Including Plots

#Visualizing initial data

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.1.3

Box plot

ggplot(data, aes(x=factor(CACH), y=PRP)) + geom_boxplot(aes(fill = CACH))+labs(title = "20MID0034")

Histogram

ggplot(data,aes(x=PRP)) + geom_histogram(bins=30) +facet_wrap(~MYCT)+labs(title = "20MID0034")

#Scatterplot using ggplot2

ggplot(data, aes(x=MYCT, y=MMIN, col=PRP))+geom_point()+labs(title = "20MID0034")


COMPUTER HARDWARE DATASET

GIRIDHAR20MID0034

2022-11-11