Roll No.: C91/STS/191016
MSc. Statistics Sem IV

We are given data on various features of cars from different manufacturers.

Objective: To apply Partial Least Squares Regression (PLSR) to the data and interpret the results.

Loading data:

# Assumes CarData.csv is in the home directory; adjust the path if it is stored elsewhere
data1=read.csv("~/CarData.csv")
head(data1)

Dividing the Dataset into Training and Test Data:

The dataset contains 30 observations.

We take 75% of the observations (22.5, rounded up to 23) as the training data and the remaining 7 observations as the test data.

We choose the training observations by drawing a random sample of 23 indices from 1 to 30 without replacement; the remaining observations form the test data.

set.seed(12)
obsIndex=sample(1:30,23,replace=FALSE)
trainData=data1[obsIndex,]
testData=data1[-obsIndex,]
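
The same split can be written without hard-coding 30 and 23. The sketch below is a minimal illustration on a stand-in data frame (`toyData` and the other names are hypothetical, not the car data). It assumes the 75% share is rounded up, which matches the 23/7 split used above.

```r
# Stand-in data frame with 30 rows (hypothetical; replace with data1)
toyData=data.frame(id=1:30,value=(1:30)^2)

n=nrow(toyData)
nTrain=ceiling(0.75*n)      # 0.75*30 = 22.5, rounded up to 23

set.seed(12)
trainIdx=sample(seq_len(n),nTrain,replace=FALSE)
toyTrain=toyData[trainIdx,]
toyTest=toyData[-trainIdx,]

# Sanity checks: set sizes, and the number of rows shared by both sets
c(train=nrow(toyTrain),test=nrow(toyTest))
length(intersect(toyTrain$id,toyTest$id))
```

Deriving the sizes from `nrow()` keeps the split valid if the number of rows in the file changes.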

Partial Least Squares Regression:

We choose price and city.mpg as our response variables,

where price is the price of the car in dollars and city.mpg is the fuel economy in city driving (miles per gallon).

We take the remaining variables as our predictors.

library(chillR)
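# Response matrix: column 1 is price (Y1), column 2 is city.mpg (Y2)
Y=cbind(trainData$price,trainData$city.mpg)
# Predictor matrix: all columns except 1, 14 and 16 (kept for reference; plsr below takes its predictors from `data`)
X=as.matrix(trainData[,c(-1,-14,-16)])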
library(pls)
## 
## Attaching package: 'pls'
## The following object is masked from 'package:chillR':
## 
##     RMSEP
## The following object is masked from 'package:stats':
## 
##     loadings
# Fit PLSR with up to 5 components, validated by leave-one-out cross-validation
pls1=plsr(Y~.,ncomp=5,validation="LOO",data=trainData[,c(-1,-14,-16)])
summary(pls1)
## Data:    X dimension: 23 13 
##  Y dimension: 23 2
## Fit method: kernelpls
## Number of components considered: 5
## 
## VALIDATION: RMSEP
## Cross-validated using 23 leave-one-out segments.
## 
## Response: Y1 
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps
## CV            8539     4104     3057     3217     4326     4690
## adjCV         8539     4090     3043     3190     4267     4629
## 
## Response: Y2 
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps
## CV           6.518    4.820    3.996    4.381    3.927    4.024
## adjCV        6.518    4.812    3.986    4.380    3.895    4.011
## 
## TRAINING: % variance explained
##     1 comps  2 comps  3 comps  4 comps  5 comps
## X     75.80    99.91    99.95   100.00   100.00
## Y1    82.60    91.26    92.91    93.24    93.88
## Y2    49.89    67.33    70.66    79.54    79.87
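
In the summary above, Y1 is price and Y2 is city.mpg (the column order used in cbind). One common way to read the validation table is to pick, for each response, the number of components with the smallest cross-validated RMSEP. The sketch below does this by hand with the CV values copied from the output above; the variable names are illustrative only.

```r
# CV RMSEP values copied from the summary above;
# the first entry is the intercept-only model (0 components)
cvPrice=c(8539,4104,3057,3217,4326,4690)
cvMpg=c(6.518,4.820,3.996,4.381,3.927,4.024)
nComps=0:5

# Number of components with the smallest CV error for each response
bestPrice=nComps[which.min(cvPrice)]
bestMpg=nComps[which.min(cvMpg)]
c(price=bestPrice,city.mpg=bestMpg)
```

On these figures the CV error for price is lowest at 2 components, and for city.mpg at 4 components with 2 components close behind (3.996 against 3.927). The training fit keeps improving up to 5 components, but the cross-validated error does not, which suggests that a model with fewer than 5 components may predict at least as well. The pls package also provides RMSEP() and validationplot() for inspecting the same quantities on the fitted object.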

Prediction using our PLSR Model:

We apply the fitted model to the test data and compare the predictions with the observed values:

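# Predictions from the 5-component model for both responses
preddat=predict(pls1,ncomp=5,newdata=testData)

# Predicted values side by side with the observed price and city.mpg
results=data.frame(preddat,testData$price,testData$city.mpg)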
results
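
To summarise prediction accuracy in a single number per response, one option is the root mean squared error (RMSE) on the test set. The sketch below defines a small helper, `rmse` (a hypothetical name, not part of pls), and runs it on made-up numbers purely to show the calculation. With the real data it would be applied to the predicted and observed columns of results, once for price and once for city.mpg.

```r
# Root mean squared error between observed and predicted values
rmse=function(observed,predicted){
  sqrt(mean((observed-predicted)^2))
}

# Made-up values for illustration only (not from the car data)
obsToy=c(10,20,30)
predToy=c(12,18,33)
rmse(obsToy,predToy)
```

Because price and city.mpg are on very different scales, their RMSE values are not comparable with each other; each is best judged against the spread of its own response.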