Roll No.: C91/STS/191016
MSc. Statistics Sem IV
We are given data on various features of cars from different manufacturers.
Objective: To apply Partial Least Squares (PLS) regression to the data and interpret the results.
Loading data:
data1=read.csv("~/CarData.csv")
head(data1)
The data contain 30 observations in total.
We take 75% of the observations (22.5, rounded up to 23) as the training data and the remaining 7 observations as the test data.
The training set is chosen by drawing a random sample of 23 numbers from 1 to 30 without replacement; the remaining observations form the test set.
set.seed(12)
obsIndex=sample(1:30,23,replace=FALSE)
trainData=data1[obsIndex,]
testData=data1[-obsIndex,]
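The same split can be written so that the training size follows from the number of rows instead of being typed in by hand. The sketch below is illustrative only: it uses the first 30 rows of R's built-in mtcars data as a stand-in for the car data, and the names carData, nTrain, trainIdx, trainSet and testSet are assumptions that do not appear in the analysis above.

```r
# Stand-in data (assumption): first 30 rows of the built-in mtcars dataset,
# used only so that the example runs without CarData.csv.
carData = mtcars[1:30, ]

n = nrow(carData)
# 0.75 * 30 = 22.5; ceiling() rounds up to 23.
# (round(22.5) would give 22 in R, because R rounds halves to the even digit.)
nTrain = ceiling(0.75 * n)

set.seed(12)
trainIdx = sample(seq_len(n), nTrain, replace = FALSE)

trainSet = carData[trainIdx, ]
testSet = carData[-trainIdx, ]

# The two sets should be disjoint and together cover every row.
stopifnot(nrow(trainSet) + nrow(testSet) == n)
stopifnot(length(intersect(rownames(trainSet), rownames(testSet))) == 0)
```

Negative indexing with -trainIdx keeps exactly the rows that were not sampled, which is why no second call to sample is needed for the test set.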
We choose price and city.mpg as our response variables,
where price = price of the car in dollars and city.mpg = fuel economy in city driving, in miles per gallon.
The remaining variables, apart from the first column (which the code below also drops), are used as predictors.
library(chillR)
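# Response matrix: column 1 is price and column 2 is city.mpg (reported as Y1 and Y2 by plsr)
Y=cbind(trainData$price,trainData$city.mpg)
# Predictor matrix: every column except columns 1, 14 and 16
X=as.matrix(trainData[,c(-1,-14,-16)])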
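Dropping columns by position, as above, depends on the exact column order of the file. A possible alternative is to select the responses and predictors by name. The sketch below uses the built-in mtcars data as a stand-in, with "mpg" and "hp" playing the role of the two responses purely for illustration; carData, responseCols, predictorCols, Yresp and Xpred are hypothetical names.

```r
# Stand-in data (assumption): built-in mtcars; "mpg" and "hp" stand in for
# the two response variables purely for illustration.
carData = mtcars
responseCols = c("mpg", "hp")

# Response matrix that keeps its column names.
Yresp = as.matrix(carData[, responseCols])

# Predictors: every column that is not a response.
predictorCols = setdiff(names(carData), responseCols)
Xpred = as.matrix(carData[, predictorCols])

dim(Yresp)   # 32 rows, 2 columns
dim(Xpred)   # 32 rows, 9 columns
```

Because Yresp keeps its column names, model output built from it should be labelled by variable name instead of the generic Y1 and Y2.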
library(pls)
pls1=plsr(Y~.,ncomp=5,validation="LOO",data=trainData[,c(-1,-14,-16)])
summary(pls1)
## Data:    X dimension: 23 13
##          Y dimension: 23 2
## Fit method: kernelpls
## Number of components considered: 5
##
## VALIDATION: RMSEP
## Cross-validated using 23 leave-one-out segments.
##
## Response: Y1
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps
## CV            8539     4104     3057     3217     4326     4690
## adjCV         8539     4090     3043     3190     4267     4629
##
## Response: Y2
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps
## CV           6.518    4.820    3.996    4.381    3.927    4.024
## adjCV        6.518    4.812    3.986    4.380    3.895    4.011
##
## TRAINING: % variance explained
##     1 comps  2 comps  3 comps  4 comps  5 comps
## X     75.80    99.91    99.95   100.00   100.00
## Y1    82.60    91.26    92.91    93.24    93.88
## Y2    49.89    67.33    70.66    79.54    79.87
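One way to read this summary is to look for the number of components with the smallest cross-validated RMSEP for each response. The sketch below copies the CV values from the output above into plain vectors (Y1 is price and Y2 is city.mpg, following the column order in cbind); the variable names are illustrative. The same values could presumably be obtained from the fitted object with RMSEP(pls1, estimate="CV") from the pls package.

```r
# Cross-validated RMSEP values copied from the summary output above
# (position 1 is the intercept-only model, i.e. 0 components).
cvY1 = c(8539, 4104, 3057, 3217, 4326, 4690)        # Y1 = price
cvY2 = c(6.518, 4.820, 3.996, 4.381, 3.927, 4.024)  # Y2 = city.mpg

# Number of components with the smallest CV error for each response
# (subtract 1 because position 1 corresponds to 0 components).
bestY1 = which.min(cvY1) - 1
bestY2 = which.min(cvY2) - 1
bestY1   # 2
bestY2   # 4

# Cumulative % variance explained in X, and the gain from each component.
cumX = c(75.80, 99.91, 99.95, 100.00, 100.00)
gainX = diff(c(0, cumX))
gainX    # the first two components account for almost all of X
```

For price, the CV error is lowest at 2 components and rises afterwards, which suggests that the later components mainly fit noise in the training data. For city.mpg the minimum is at 4 components, although 2 components give nearly the same error (3.996 against 3.927). Together with the fact that 2 components already explain 99.91% of the variance in X, a 2-component model appears to be a reasonable choice.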
We evaluate the fitted model on the test data, predicting with all 5 components, and compare the predictions with the observed values:
preddat=predict(pls1,ncomp=5,newdata=testData)
results=data.frame(preddat,testData$price,testData$city.mpg)
results
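To summarise the comparison in a single number per response, one option is the root mean squared error on the test set, which is on the same scale as the RMSEP values in the summary. The helper rmse below is a hypothetical function written for this sketch and is not part of the pls package. The toy vectors are made up for illustration and are not results from the car data.

```r
# Hypothetical helper: root mean squared error of a vector of predictions.
rmse = function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Toy numbers for illustration only; these are NOT results from the car data.
toyActual = c(10, 20, 30, 40)
toyPredicted = c(12, 18, 33, 39)

# Errors are -2, 2, -3, 1, so the mean squared error is 18 / 4 = 4.5.
rmse(toyActual, toyPredicted)   # sqrt(4.5), about 2.12
```

With the objects above, and assuming preddat is the three-dimensional array (observations by responses by component counts) that predict returns for a plsr fit, the test error for price could be computed as rmse(testData$price, preddat[,1,1]) and for city.mpg as rmse(testData$city.mpg, preddat[,2,1]).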