Loading the necessary packages and the dataset. The dataset is GermanCredit, which ships with the caret package. Its Class column records credit worthiness with two levels, “Good” and “Bad”. This will be our response variable.

library(caret)
## Warning: package 'caret' was built under R version 3.3.2
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.2
library(e1071)
## Warning: package 'e1071' was built under R version 3.3.2
data(GermanCredit)
dataset = GermanCredit

Exploring the data by checking the structure of the dataset, such as the data types of the variables. We then scale the first 7 columns of the GermanCredit dataset, which are the integer-valued columns; the remaining predictors are 0/1 indicators. The scale function standardizes each of those 7 columns to zero mean and unit standard deviation, which is a data transformation step. The standardized values are then written back into the dataset. The plot and the second str output show that the values are scaled.

str(dataset)
## 'data.frame':    1000 obs. of  62 variables:
##  $ Duration                              : int  6 48 12 42 24 36 24 36 12 30 ...
##  $ Amount                                : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
##  $ InstallmentRatePercentage             : int  4 2 2 2 3 2 3 2 2 4 ...
##  $ ResidenceDuration                     : int  4 2 3 4 4 4 4 2 4 2 ...
##  $ Age                                   : int  67 22 49 45 53 35 53 35 61 28 ...
##  $ NumberExistingCredits                 : int  2 1 1 1 2 1 1 1 1 2 ...
##  $ NumberPeopleMaintenance               : int  1 1 2 2 2 2 1 1 1 1 ...
##  $ Telephone                             : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                                 : Factor w/ 2 levels "Bad","Good": 2 1 2 2 1 2 2 2 2 1 ...
##  $ CheckingAccountStatus.lt.0            : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200        : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CheckingAccountStatus.none            : num  0 0 1 0 0 1 1 0 1 0 ...
##  $ CreditHistory.NoCredit.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly                : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay                   : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ CreditHistory.Critical                : num  1 0 1 0 0 0 0 0 0 1 ...
##  $ Purpose.NewCar                        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                       : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment           : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television              : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                     : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Vacation                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Retraining                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Other                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100            : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000       : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000           : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ SavingsAccountBonds.Unknown           : num  1 0 0 0 0 1 0 0 0 0 ...
##  $ EmploymentDuration.lt.1               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4             : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7             : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7               : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ EmploymentDuration.Unemployed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Male.Divorced.Seperated      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle             : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single                  : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ Personal.Male.Married.Widowed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Female.Single                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.None           : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.Guarantor      : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...
##  $ OtherInstallmentPlans.Bank            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.None            : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Housing.Rent                          : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                           : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Housing.ForFree                       : num  0 0 0 1 1 1 0 0 0 0 ...
##  $ Job.UnemployedUnskilled               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident                 : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee                   : num  1 1 0 1 1 0 1 0 0 0 ...
##  $ Job.Management.SelfEmp.HighlyQualified: num  0 0 0 0 0 0 0 1 0 1 ...
dataset[,1:7] = as.data.frame(lapply(dataset[,1:7], scale))
plot(dataset[,1:7])

str(dataset)
## 'data.frame':    1000 obs. of  62 variables:
##  $ Duration                              : num  -1.236 2.247 -0.738 1.75 0.257 ...
##  $ Amount                                : num  -0.745 0.949 -0.416 1.633 0.566 ...
##  $ InstallmentRatePercentage             : num  0.918 -0.8697 -0.8697 -0.8697 0.0241 ...
##  $ ResidenceDuration                     : num  1.046 -0.766 0.14 1.046 1.046 ...
##  $ Age                                   : num  2.765 -1.191 1.183 0.831 1.534 ...
##  $ NumberExistingCredits                 : num  1.027 -0.705 -0.705 -0.705 1.027 ...
##  $ NumberPeopleMaintenance               : num  -0.428 -0.428 2.334 2.334 2.334 ...
##  $ Telephone                             : num  0 1 1 1 1 0 1 0 1 1 ...
##  $ ForeignWorker                         : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Class                                 : Factor w/ 2 levels "Bad","Good": 2 1 2 2 1 2 2 2 2 1 ...
##  $ CheckingAccountStatus.lt.0            : num  1 0 0 1 1 0 0 0 0 0 ...
##  $ CheckingAccountStatus.0.to.200        : num  0 1 0 0 0 0 0 1 0 1 ...
##  $ CheckingAccountStatus.gt.200          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CheckingAccountStatus.none            : num  0 0 1 0 0 1 1 0 1 0 ...
##  $ CreditHistory.NoCredit.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.ThisBank.AllPaid        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CreditHistory.PaidDuly                : num  0 1 0 1 0 1 1 1 1 0 ...
##  $ CreditHistory.Delay                   : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ CreditHistory.Critical                : num  1 0 1 0 0 0 0 0 0 1 ...
##  $ Purpose.NewCar                        : num  0 0 0 0 1 0 0 0 0 1 ...
##  $ Purpose.UsedCar                       : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Purpose.Furniture.Equipment           : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Purpose.Radio.Television              : num  1 1 0 0 0 0 0 0 1 0 ...
##  $ Purpose.DomesticAppliance             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Repairs                       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Education                     : num  0 0 1 0 0 1 0 0 0 0 ...
##  $ Purpose.Vacation                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Retraining                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Business                      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Purpose.Other                         : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.lt.100            : num  0 1 1 1 1 0 0 1 0 1 ...
##  $ SavingsAccountBonds.100.to.500        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SavingsAccountBonds.500.to.1000       : num  0 0 0 0 0 0 1 0 0 0 ...
##  $ SavingsAccountBonds.gt.1000           : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ SavingsAccountBonds.Unknown           : num  1 0 0 0 0 1 0 0 0 0 ...
##  $ EmploymentDuration.lt.1               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EmploymentDuration.1.to.4             : num  0 1 0 0 1 1 0 1 0 0 ...
##  $ EmploymentDuration.4.to.7             : num  0 0 1 1 0 0 0 0 1 0 ...
##  $ EmploymentDuration.gt.7               : num  1 0 0 0 0 0 1 0 0 0 ...
##  $ EmploymentDuration.Unemployed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Male.Divorced.Seperated      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ Personal.Female.NotSingle             : num  0 1 0 0 0 0 0 0 0 0 ...
##  $ Personal.Male.Single                  : num  1 0 1 1 1 1 1 1 0 0 ...
##  $ Personal.Male.Married.Widowed         : num  0 0 0 0 0 0 0 0 0 1 ...
##  $ Personal.Female.Single                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.None           : num  1 1 1 0 1 1 1 1 1 1 ...
##  $ OtherDebtorsGuarantors.CoApplicant    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherDebtorsGuarantors.Guarantor      : num  0 0 0 1 0 0 0 0 0 0 ...
##  $ Property.RealEstate                   : num  1 1 1 0 0 0 0 0 1 0 ...
##  $ Property.Insurance                    : num  0 0 0 1 0 0 1 0 0 0 ...
##  $ Property.CarOther                     : num  0 0 0 0 0 0 0 1 0 1 ...
##  $ Property.Unknown                      : num  0 0 0 0 1 1 0 0 0 0 ...
##  $ OtherInstallmentPlans.Bank            : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.Stores          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ OtherInstallmentPlans.None            : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Housing.Rent                          : num  0 0 0 0 0 0 0 1 0 0 ...
##  $ Housing.Own                           : num  1 1 1 0 0 0 1 0 1 1 ...
##  $ Housing.ForFree                       : num  0 0 0 1 1 1 0 0 0 0 ...
##  $ Job.UnemployedUnskilled               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Job.UnskilledResident                 : num  0 0 1 0 0 1 0 0 1 0 ...
##  $ Job.SkilledEmployee                   : num  1 1 0 1 1 0 1 0 0 0 ...
##  $ Job.Management.SelfEmp.HighlyQualified: num  0 0 0 0 0 0 0 1 0 1 ...
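
As a small check on what scale does to each column, the sketch below standardizes a toy vector (illustrative values, not the full dataset) and compares the result with the manual formula (x - mean) / sd. The variable names are assumptions made for this example only.

```r
# Toy vector standing in for one numeric column (illustrative values only).
x <- c(6, 48, 12, 42, 24)

# scale() centers on the mean and divides by the sample standard deviation.
# It returns a one-column matrix, so as.numeric() flattens it to a vector.
scaled <- as.numeric(scale(x))

# The same transformation written out by hand.
manual <- (x - mean(x)) / sd(x)

print(all.equal(scaled, manual))  # both approaches should agree
print(sd(scaled))                 # standardized values have unit standard deviation
```

After standardization the column has mean 0 and standard deviation 1, which keeps features measured on large scales (such as Amount) from dominating distance-based kernels.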

After the data transformation in the previous section, we split the 1000 rows of the dataset into training and test sets. We draw a random sample of 200 row indices and use them to extract the test dataset; the remaining 800 rows form the training dataset. This gives an 80% / 20% split: 800 rows for training and 200 rows for testing.

dim(dataset)
## [1] 1000   62
set.seed(1234)
sample_index = sample(1000, 200)
test_dataset = dataset[sample_index,]
train_dataset = dataset[-sample_index,]
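
The same index-based split can be sketched on a small, hypothetical data frame to show that the two subsets are disjoint and together cover every row. The data frame `toy` and its columns are invented for illustration.

```r
# Hypothetical toy data frame with 10 rows (not the GermanCredit data).
toy <- data.frame(id = 1:10, x = (1:10)^2)

set.seed(1234)                    # makes the random draw reproducible
test_idx <- sample(nrow(toy), 2)  # 20% of the row indices, drawn without replacement

toy_test  <- toy[test_idx, ]      # rows selected for the test set
toy_train <- toy[-test_idx, ]     # negative indexing keeps every other row

print(dim(toy_test))
print(dim(toy_train))
print(length(intersect(toy_test$id, toy_train$id)))  # no row appears in both sets
```

Because sample draws without replacement by default, the negative index removes exactly the sampled rows and the training set keeps the rest.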

Tuning the SVM models. For each kernel, tune searches a grid of cost and gamma values (cost only for the linear kernel) and reports the best combination, which we will use in the next section to create the SVM models. We tune the linear, polynomial, sigmoid and radial kernels; later we will check the accuracy of each kernel and pick the most accurate one.

svm_tune_radial <- tune(svm, Class ~ .,
                        data = train_dataset,
                        kernel="radial",
                        ranges=list(cost=10^(-1:2),
                                    gamma=c(.5,1,2),
                                    scale=F
                        ))

sumry_radial <- summary(svm_tune_radial)
print(sumry_radial$best.parameters)
##   cost gamma scale
## 3   10   0.5 FALSE
svm_tune_poly <- tune(svm, Class ~ .,
                      data = train_dataset,
                      kernel="polynomial",
                      ranges=list(cost=10^(-1:2),
                                  gamma=c(.5,1,2),
                                  scale=F
                      ))

sumry_poly <- summary(svm_tune_poly)
print(sumry_poly$best.parameters)
##   cost gamma scale
## 1  0.1   0.5 FALSE
svm_tune_sigm <- tune(svm, Class ~ .,
                      data = train_dataset,
                      kernel="sigmoid",
                      ranges=list(cost=10^(-1:2),
                                  gamma=c(.5,1,2),
                                  scale=F
                      ))

sumry_sigm <- summary(svm_tune_sigm)
print(sumry_sigm$best.parameters)
##   cost gamma scale
## 5  0.1     1 FALSE
svm_tune_linear <- tune(svm, Class ~ .,
                        data = train_dataset,
                        kernel="linear",
                        ranges=list(cost=10^(-1:2),
                                    scale=F                        
                        ))

sumry_linear <- summary(svm_tune_linear)
print(sumry_linear$best.parameters)
##   cost scale
## 2    1 FALSE
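
To make the search space concrete, the sketch below builds the cost and gamma grid with expand.grid from base R. It assumes that tune enumerates the combinations in the same order, which is consistent with the row labels printed above (row 3 for radial, row 5 for sigmoid) but is not something this example verifies against e1071 itself.

```r
# Candidate values taken from the ranges argument used in the tuning calls.
grid <- expand.grid(cost = 10^(-1:2), gamma = c(0.5, 1, 2))

print(nrow(grid))  # number of (cost, gamma) combinations tried per kernel
print(grid[3, ])   # the combination reported as best for the radial kernel
print(grid[5, ])   # the combination reported as best for the sigmoid kernel
```

With 4 cost values and 3 gamma values, each kernel that uses gamma is evaluated on 12 combinations; the linear kernel, which has no gamma in its grid, is evaluated on 4.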

Now that we have the training and test datasets, we will fit the SVM models. Class is our response variable; it is categorical and represents “good” or “bad” credit worthiness of a person, so we are classifying each observation as “good” or “bad”. Cost is the general penalty parameter, the cost of misclassification; the tuning grid above tried values from 0.1 to 100. If C is large, the bias will be low and the variance high. Gamma is the parameter of the Gaussian (radial) kernel, used for nonlinear structure; the polynomial and sigmoid kernels take a gamma as well. In our case, the optimal values are cost 1 for linear; cost 10 and gamma 0.5 for radial; cost 0.1 and gamma 0.5 for polynomial; and cost 0.1 and gamma 1 for sigmoid. We will use these values to build our SVM models.

svm_fit_radial <- svm(Class ~ ., kernel="radial", cost = 10, gamma=0.5,data = train_dataset)
## Warning in svm.default(x, y, scale = scale, ..., na.action = na.action):
## Variable(s) 'Purpose.Vacation' and 'Personal.Female.Single' constant.
## Cannot scale data.
print(svm_fit_radial)
## 
## Call:
## svm(formula = Class ~ ., data = train_dataset, kernel = "radial", 
##     cost = 10, gamma = 0.5)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  10 
##       gamma:  0.5 
## 
## Number of Support Vectors:  800
svm_fit_poly <- svm(Class ~ ., kernel="polynomial", cost = 0.1, gamma=0.5,data = train_dataset)
## Warning in svm.default(x, y, scale = scale, ..., na.action = na.action):
## Variable(s) 'Purpose.Vacation' and 'Personal.Female.Single' constant.
## Cannot scale data.
print(svm_fit_poly)
## 
## Call:
## svm(formula = Class ~ ., data = train_dataset, kernel = "polynomial", 
##     cost = 0.1, gamma = 0.5)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.1 
##      degree:  3 
##       gamma:  0.5 
##      coef.0:  0 
## 
## Number of Support Vectors:  494
svm_fit_sigm <- svm(Class ~ ., kernel="sigmoid", cost = 0.1, gamma=1,data = train_dataset)
## Warning in svm.default(x, y, scale = scale, ..., na.action = na.action):
## Variable(s) 'Purpose.Vacation' and 'Personal.Female.Single' constant.
## Cannot scale data.
print(svm_fit_sigm)
## 
## Call:
## svm(formula = Class ~ ., data = train_dataset, kernel = "sigmoid", 
##     cost = 0.1, gamma = 1)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  sigmoid 
##        cost:  0.1 
##       gamma:  1 
##      coef.0:  0 
## 
## Number of Support Vectors:  434
svm_fit_linear <- svm(Class ~ ., kernel="linear", cost = 1, data = train_dataset, scale = F)
print(svm_fit_linear)
## 
## Call:
## svm(formula = Class ~ ., data = train_dataset, kernel = "linear", 
##     cost = 1, scale = F)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
##       gamma:  0.01639344 
## 
## Number of Support Vectors:  421
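
To illustrate the role of gamma in the radial kernel, the sketch below implements the Gaussian kernel exp(-gamma * |u - v|^2) by hand and evaluates it for the three gamma values from the tuning grid. The function name `rbf_kernel` and the two points are assumptions made for this example; it does not call e1071.

```r
# Gaussian (radial basis function) kernel between two numeric vectors.
rbf_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

# Two hypothetical points with squared distance 2.
u <- c(0, 0)
v <- c(1, 1)

# Larger gamma makes the similarity fall off faster with distance.
for (g in c(0.5, 1, 2)) {
  cat("gamma =", g, " k(u, v) =", rbf_kernel(u, v, g), "\n")
}
```

A point is always maximally similar to itself (the kernel value is 1), and a larger gamma shrinks the neighborhood in which other points count as similar, which tends to make the decision boundary more flexible.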

Now we will predict the credit worthiness (Class) with each of the SVM models, one per kernel. In the next section we will check the accuracy of the predictions.

predictions <-  predict(svm_fit_linear, test_dataset[-10])
table(test_dataset[,10], predictions)
##       predictions
##        Bad Good
##   Bad   30   33
##   Good  14  123
predictions <-  predict(svm_fit_radial, test_dataset[-10])
table(test_dataset[,10], predictions)
##       predictions
##        Bad Good
##   Bad    1   62
##   Good   0  137
predictions <-  predict(svm_fit_poly, test_dataset[-10])
table(test_dataset[,10], predictions)
##       predictions
##        Bad Good
##   Bad   29   34
##   Good  19  118
predictions <-  predict(svm_fit_sigm, test_dataset[-10])
table(test_dataset[,10], predictions)
##       predictions
##        Bad Good
##   Bad   11   52
##   Good  25  112
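
How to read these tables can be sketched with a handful of made-up labels: table(actual, predicted) puts the true classes in rows and the predicted classes in columns, so correct predictions land on the diagonal. The vectors below are hypothetical and unrelated to the model output above.

```r
# Hypothetical true and predicted labels for five cases.
actual    <- factor(c("Bad", "Bad", "Good", "Good", "Good"), levels = c("Bad", "Good"))
predicted <- factor(c("Bad", "Good", "Good", "Good", "Bad"), levels = c("Bad", "Good"))

# Rows are the true class, columns are the predicted class.
cm <- table(actual, predicted)
print(cm)

# Diagonal cells count the correct predictions for each class.
print(diag(cm))
```

Fixing the factor levels on both vectors keeps the table square even if a model never predicts one of the classes.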

Calculating the accuracy of each kernel from its confusion matrix: the correctly classified cases (the diagonal) divided by all 200 test cases. From the accuracy values we conclude that the linear kernel, with 76.5% accuracy, is best for the given sample.

Linear:

    print((123+30)/(123+30+33+14)*100)
## [1] 76.5

Radial:

    print((137+1)/(137+62+1)*100)
## [1] 69

Poly:

    print((118+29)/(118+34+19+29)*100)
## [1] 73.5

Sigmoid:

    print((112+11)/(112+11+25+52)*100)
## [1] 61.5
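
The four calculations above can also be written once as a function of the confusion matrix. The sketch below re-enters the counts from the tables reported earlier; the helper names `accuracy` and `make_cm` are assumptions introduced for this example.

```r
# Accuracy in percent: correct predictions (diagonal) over all predictions.
accuracy <- function(cm) sum(diag(cm)) / sum(cm) * 100

# Build a 2x2 confusion matrix from counts given row by row
# (actual Bad first, then actual Good).
make_cm <- function(counts) {
  matrix(counts, nrow = 2, byrow = TRUE,
         dimnames = list(actual = c("Bad", "Good"),
                         predicted = c("Bad", "Good")))
}

# Counts copied from the confusion matrices reported above.
results <- c(
  linear     = accuracy(make_cm(c(30, 33, 14, 123))),
  radial     = accuracy(make_cm(c(1, 62, 0, 137))),
  polynomial = accuracy(make_cm(c(29, 34, 19, 118))),
  sigmoid    = accuracy(make_cm(c(11, 52, 25, 112)))
)

print(results)                    # 76.5, 69, 73.5, 61.5
print(names(which.max(results)))  # kernel with the highest accuracy
```

For context, the test set contains 137 “Good” cases out of 200, so a model that always predicts “Good” would reach 68.5% accuracy; the radial model's 69% is only slightly above that baseline.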