Question 2.1.1

Describe a situation or problem from your job, everyday life, current events, etc., for which a classification model would be appropriate. List some (up to 5) predictors that you might use.

Solution

Detailed case

Job:

Problem Description:

Imagine a data scientist working for an automotive service company who is assigned to determine whether the customers in the company's database will churn (i.e., no longer come back). The database is exhaustive: it contains information about the customers who terminated their contracts with the company, along with the factors (predictor variables) that appear to affect each customer's decision. The data scientist decides to use machine learning tools to model the situation.

The model chosen: a classification model. (A minimal illustrative sketch follows the predictor list below.)

The predictor variables are as follows:

  1. Customer review score (on a 1-10 point scale)
  2. Service time (problem resolution time, in minutes)
  3. Waiting time (time between filing the complaint and the start of resolution)
  4. Service charges (cost of the service offered)
  5. Reward points (points/credits given to the customer for signing up for service at the dealer)
  6. Discount offered (discount on the service, in %)
  7. Annual maintenance contract (whether the customer signed up for a next-year contract)
  8. Number of referrals (given by a customer to friends or family members)
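As a quick illustration, here is a minimal ksvm sketch on hypothetical churn data. Every column name and all the synthetic values below are illustrative assumptions, not real dealer data; they exist only to make the sketch runnable.

library(kernlab)

# Hypothetical churn data, generated only to make the sketch self-contained
set.seed(1)
churn_data <- data.frame(
  review       = sample(1:10, 200, replace = TRUE),        # 1-10 review score
  service_time = rexp(200, rate = 1/60),                   # resolution time, minutes
  wait_time    = rexp(200, rate = 1/30),                   # waiting time, minutes
  discount     = runif(200, 0, 30),                        # discount, %
  churned      = factor(sample(0:1, 200, replace = TRUE))  # 1 = churned
)

# Fit a classifier of the same kind used for the credit data below
churn_model <- ksvm(churned ~ ., data = churn_data, type = "C-svc",
                    kernel = "vanilladot", C = 100, scaled = TRUE)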

Additional scenarios

Everyday life:

A classification model is well suited to deciding whether a vegetable is still fit to eat, based on predictor variables such as its color, smell, size, the number of days it has spent in the refrigerator, and its freshness by appearance.

Current Events:

A classification model is well suited to predicting whether it will rain today, based on predictor variables such as humidity, wind speed, yesterday's weather conditions, air temperature, and air pressure.

Question 2.2.1

The files credit_card_data.txt (without headers) and credit_card_data-headers.txt (with headers) contain a dataset with 654 data points, 6 continuous and 4 binary predictor variables. It has anonymized credit card applications with a binary response variable (last column) indicating if the application was positive or negative. The dataset is the “Credit Approval Data Set” from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Credit+Approval) without the categorical variables and without data points that have missing values.

  1. Using the support vector machine function ksvm contained in the R package kernlab, find a good classifier for this data. Show the equation of your classifier, and how well it classifies the data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover that topic soon.)

a) R Code

## install.packages("kernlab")  # uncomment once to install the kernlab package

library(kernlab)  # load kernlab, which provides the ksvm function

data <- read.table("C:/Users/amirt/Desktop/New folder/credit_card_data.txt", stringsAsFactors = FALSE, header = FALSE)  # read the data into a data frame

head(data,10)  # inspect the first 10 observations to check the variable types
##    V1    V2     V3    V4 V5 V6 V7 V8  V9   V10 V11
## 1   1 30.83  0.000 1.250  1  0  1  1 202     0   1
## 2   0 58.67  4.460 3.040  1  0  6  1  43   560   1
## 3   0 24.50  0.500 1.500  1  1  0  1 280   824   1
## 4   1 27.83  1.540 3.750  1  0  5  0 100     3   1
## 5   1 20.17  5.625 1.710  1  1  0  1 120     0   1
## 6   1 32.08  4.000 2.500  1  1  0  0 360     0   1
## 7   1 33.17  1.040 6.500  1  1  0  0 164 31285   1
## 8   0 22.92 11.585 0.040  1  1  0  1  80  1349   1
## 9   1 54.42  0.500 3.960  1  1  0  1 180   314   1
## 10  1 42.50  4.915 3.165  1  1  0  0  52  1442   1
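Optionally, str() gives a quick structural check that all ten predictors and the response were read in as numeric (a small sketch; output omitted):

str(data)  # expect 654 obs. of 11 numeric variables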
## 1) MODEL: support vector machine model with scaling

model <- ksvm(V11 ~ ., data = data, type = "C-svc", kernel = "vanilladot", C = 100, scaled = TRUE)  # fit the model
##  Setting default kernel parameters
model  # display model characteristics
## Support Vector Machine object of class "ksvm" 
## 
## SV type: C-svc  (classification) 
##  parameter : cost C = 100 
## 
## Linear (vanilla) kernel function. 
## 
## Number of Support Vectors : 189 
## 
## Objective Function Value : -17887.92 
## Training error : 0.136086
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])  # coefficients a1..a10 of the hyperplane

# a0 is just -model@b

a0 <- -model@b  # the intercept a0

# display the values of a1..a10 and a0

a
##            V1            V2            V3            V4            V5 
## -0.0010065348 -0.0011729048 -0.0016261967  0.0030064203  1.0049405641 
##            V6            V7            V8            V9           V10 
## -0.0028259432  0.0002600295 -0.0005349551 -0.0012283758  0.1063633995
a0
## [1] 0.08158492
# see what the model predicts
pred <- predict(model,data[,1:10])
pred
##   [1] 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [260] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [297] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## [556] 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# see what fraction of the model’s predictions match the actual classification
sum(pred == data[,11]) / nrow(data)
## [1] 0.8639144
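As a sanity check, the classifier equation can be applied by hand and compared with the accuracy above. This is a sketch: it assumes the scaled model just fitted, and that kernlab's scaled = TRUE standardizes each predictor to zero mean and unit variance, which base R's scale() reproduces. If that assumption holds, the manual accuracy should be approximately 0.8639.

# apply the classifier equation manually: classify by the sign of a'z + a0
z <- scale(data[, 1:10])                     # standardize the predictors
manual_pred <- as.integer(z %*% a + a0 > 0)  # 1 if above the hyperplane, else 0
sum(manual_pred == data[, 11]) / nrow(data)  # should reproduce ~0.8639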
## 2) MODEL: support vector machine model without scaling

model <- ksvm(V11 ~ ., data = data, type = "C-svc", kernel = "vanilladot", C = 100, scaled = FALSE)  # fit the model without scaling
##  Setting default kernel parameters
model  # display model characteristics
## Support Vector Machine object of class "ksvm" 
## 
## SV type: C-svc  (classification) 
##  parameter : cost C = 100 
## 
## Linear (vanilla) kernel function. 
## 
## Number of Support Vectors : 186 
## 
## Objective Function Value : -2213.731 
## Training error : 0.278287
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])  # coefficients a1..a10 of the hyperplane

# a0 is just -model@b

a0 <- -model@b  # the intercept a0

# display the values of a1..a10 and a0

a
##            V1            V2            V3            V4            V5 
## -0.0483050561 -0.0083148473 -0.0836550114  0.1751121271  1.8254844547 
##            V6            V7            V8            V9           V10 
##  0.2763673361  0.0654782414 -0.1108211169 -0.0047229653 -0.0007764962
a0
## [1] 0.5255393
# see what the model predicts
pred <- predict(model,data[,1:10])
pred
##   [1] 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 0
##  [38] 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 0 1 0 0 0 0 0 1 0
##  [75] 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 0 1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1
## [186] 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1
## [223] 1 0 1 1 0 0 0 1 0 1 0 1 0 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1
## [260] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0
## [297] 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
## [334] 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
## [371] 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0
## [408] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0
## [445] 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 1 0 1
## [482] 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1
## [519] 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1
## [556] 1 0 1 1 1 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0
## [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0
## [630] 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
# see what fraction of the model’s predictions match the actual classification
sum(pred == data[,11]) / nrow(data)
## [1] 0.7217125

b) Solutions:

The separating hyperplane of the classifier, in the scaled coordinates z1, ..., z10, is given by:

-0.0010065348z1 - 0.0011729048z2 - 0.0016261967z3 + 0.0030064203z4 + 1.0049405641z5 - 0.0028259432z6 + 0.0002600295z7 - 0.0005349551z8 - 0.0012283758z9 + 0.1063633995z10 + 0.08158492 = 0

The training performance of the SVM is evaluated as follows:

Total number of support vectors = 189

Training error = 0.136086

The accuracy of the model = 0.8639144

However, for the unscaled model the accuracy dropped to 0.7217125.

Scaling therefore clearly improves the performance of this SVM model.
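A single accuracy number also hides where the errors fall; a confusion matrix shows them directly. A minimal sketch, refitting the scaled model so that the predictions refer to it:

# confusion matrix for the scaled model (rows: predicted, columns: actual)
model_scaled <- ksvm(V11 ~ ., data = data, type = "C-svc",
                     kernel = "vanilladot", C = 100, scaled = TRUE)
pred_scaled <- predict(model_scaled, data[, 1:10])
table(predicted = pred_scaled, actual = data[, 11])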

Question 2.2.2

  1. You are welcome, but not required, to try other (nonlinear) kernels as well; we’re not covering them in this course, but they can sometimes be useful and might provide better predictions than vanilladot

a) R Code

## MODEL: support vector machine with a non-linear radial-basis (RBF) kernel

model <- ksvm(V11 ~ ., data = data, type = "C-svc", kernel = "rbfdot", C = 100, scaled = TRUE)  # fit the model

model  # display model characteristics
## Support Vector Machine object of class "ksvm" 
## 
## SV type: C-svc  (classification) 
##  parameter : cost C = 100 
## 
## Gaussian Radial Basis kernel function. 
##  Hyperparameter : sigma =  0.097273682447156 
## 
## Number of Support Vectors : 242 
## 
## Objective Function Value : -8796.874 
## Training error : 0.045872

b) Solutions:

A linear SVM applies when the data can be separated by a straight line (a hyperplane). When it cannot, a non-linear SVM is used: a kernel function implicitly maps the data into a higher-dimensional space in which a linear separation may exist. Here the Gaussian (RBF) kernel lowers the training error from 0.136086 to 0.045872; however, because this error is measured on the same data the model was trained on, the improvement may partly reflect overfitting rather than better generalization.
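To explore further, a short loop (a sketch; the kernel names are standard kernlab kernels) compares several kernels on the same data. All accuracies here are computed on the training set, so a higher number may simply indicate more overfitting:

# compare a few kernlab kernels on the full data set (training accuracy only)
for (k in c("vanilladot", "rbfdot", "polydot", "tanhdot")) {
  m <- ksvm(V11 ~ ., data = data, type = "C-svc", kernel = k,
            C = 100, scaled = TRUE)
  p <- predict(m, data[, 1:10])
  cat(k, ":", round(sum(p == data[, 11]) / nrow(data), 4), "\n")
}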

Question 2.2.3

  1. Using the k-nearest-neighbors classification function kknn contained in the R kknn package, suggest a good value of k, and show how well it classifies the data points in the full data set. Don’t forget to scale the data (scale=TRUE in kknn).

a) R Code

# install.packages("kknn")  # uncomment once to install the kknn package

library(kknn)  # load kknn, which provides the kknn function

# read in the data

data <- read.table("C:/Users/amirt/Desktop/New folder/credit_card_data.txt", stringsAsFactors = FALSE, header = FALSE) 

# inspect the first 10 observations

head(data,10)
##    V1    V2     V3    V4 V5 V6 V7 V8  V9   V10 V11
## 1   1 30.83  0.000 1.250  1  0  1  1 202     0   1
## 2   0 58.67  4.460 3.040  1  0  6  1  43   560   1
## 3   0 24.50  0.500 1.500  1  1  0  1 280   824   1
## 4   1 27.83  1.540 3.750  1  0  5  0 100     3   1
## 5   1 20.17  5.625 1.710  1  1  0  1 120     0   1
## 6   1 32.08  4.000 2.500  1  1  0  0 360     0   1
## 7   1 33.17  1.040 6.500  1  1  0  0 164 31285   1
## 8   0 22.92 11.585 0.040  1  1  0  1  80  1349   1
## 9   1 54.42  0.500 3.960  1  1  0  1 180   314   1
## 10  1 42.50  4.915 3.165  1  1  0  0  52  1442   1
check_accuracy <- function(X){

    predicted <- rep(0, nrow(data))

    for (i in 1:nrow(data)){

    # leave-one-out: train on every row except i, then predict row i (scaled data)
    model <- kknn(V11~V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, data[-i,], data[i,], k=X, scale = TRUE)

    # round the continuous kknn fit to a 0/1 prediction
    predicted[i] <- as.integer(fitted(model) + 0.5)

    }

    # fraction of predictions that match the actual class
    accuracy <- sum(predicted == data[,11]) / nrow(data)
    return(accuracy)
}

acc <- rep(0,20) # set up a vector of 20 zeros to start
for (X in 1:20){
  acc[X] = check_accuracy(X) # test knn with X neighbors
}

#
# report accuracies
#

acc
##  [1] 0.8149847 0.8149847 0.8149847 0.8149847 0.8516820 0.8455657 0.8470948
##  [8] 0.8486239 0.8470948 0.8501529 0.8516820 0.8532110 0.8516820 0.8516820
## [15] 0.8532110 0.8516820 0.8516820 0.8516820 0.8501529 0.8501529
## 
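As a cross-check, kknn's own train.kknn performs the same leave-one-out scheme internally. A sketch: note that with a numeric response it selects k by minimizing leave-one-out mean squared error, so it need not agree exactly with the accuracy-based search above.

# "rectangular" = unweighted neighbors, i.e. plain k-nearest-neighbors
loo <- train.kknn(V11 ~ ., data = data, kmax = 20,
                  kernel = "rectangular", scale = TRUE)
loo$best.parameters  # suggested k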

check_accuracy_unscaled <- function(X){

    predicted <- rep(0, nrow(data))

    for (i in 1:nrow(data)){

    # leave-one-out: train on every row except i, then predict row i (unscaled data)
    model <- kknn(V11~V1+V2+V3+V4+V5+V6+V7+V8+V9+V10, data[-i,], data[i,], k=X, scale = FALSE)

    # round the continuous kknn fit to a 0/1 prediction
    predicted[i] <- as.integer(fitted(model) + 0.5)

    }

    # fraction of predictions that match the actual class
    accuracy <- sum(predicted == data[,11]) / nrow(data)
    return(accuracy)
}

acc_unscaled <- rep(0,20) # set up a vector of 20 zeros to start
for (X in 1:20){
  acc_unscaled[X] = check_accuracy_unscaled(X) # test knn with X neighbors
}

#
# report accuracies
#

acc_unscaled
##  [1] 0.6636086 0.6636086 0.6636086 0.6636086 0.6911315 0.6957187 0.6926606
##  [8] 0.6926606 0.6865443 0.6773700 0.6804281 0.6834862 0.6865443 0.6880734
## [15] 0.6880734 0.6926606 0.6926606 0.6926606 0.6926606 0.6911315

b) Solutions:

Case 1: Scaled

Accuracy values for each k from 1 to 20 (scaled data):

 k   accuracy
 1   0.8149847
 2   0.8149847
 3   0.8149847
 4   0.8149847
 5   0.8516820
 6   0.8455657
 7   0.8470948
 8   0.8486239
 9   0.8470948
10   0.8501529
11   0.8516820
12   0.8532110  ** best accuracy (tied), ~558 correct predictions
13   0.8516820
14   0.8516820
15   0.8532110  ** best accuracy (tied), ~558 correct predictions
16   0.8516820
17   0.8516820
18   0.8516820
19   0.8501529
20   0.8501529
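The best k can also be picked programmatically instead of by inspection (a small sketch using acc from the code above):

which(acc == max(acc))         # k = 12 and k = 15 tie for the best accuracy
round(max(acc) * nrow(data))   # ~558 correctly classified points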

Case 2: Unscaled

Accuracy values for each k from 1 to 20 (unscaled data):

 k   accuracy
 1   0.6636
 2   0.6636
 3   0.6636
 4   0.6636
 5   0.6911
 6   0.6957  ** best accuracy, ~455 correct predictions
 7   0.6927
 8   0.6927
 9   0.6865
10   0.6774
11   0.6804
12   0.6835
13   0.6865
14   0.6881
15   0.6881
16   0.6927
17   0.6927
18   0.6927
19   0.6927
20   0.6911

With scaling, kknn reaches about 85.3% accuracy (at k = 12 or k = 15), versus about 69.6% without scaling, so scaling matters for k-nearest-neighbors just as it did for the SVM.
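Finally, a quick base-R plot (a sketch, assuming acc and acc_unscaled from the code above) makes the gap between the scaled and unscaled runs visible across all twenty values of k:

# accuracy vs. k for the scaled and unscaled kknn runs
plot(1:20, acc, type = "b", ylim = c(0.6, 0.9),
     xlab = "k", ylab = "accuracy", main = "kknn: scaled vs. unscaled")
lines(1:20, acc_unscaled, type = "b", lty = 2)
legend("bottomright", legend = c("scaled", "unscaled"), lty = c(1, 2))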