Using Data
We will go through the following steps in our machine learning process:
0. Prepare Environment
1. Import data
2. Data cleansing
3. Train Model
4. Predict
5. Analyse performance
0) Prepare Environment
To do the analysis, we first load the required packages.
rm(list = ls(all = TRUE)) # Remove any existing variables
require(dplyr)
require(tidyr)
require(caret)
require(nnet)
require(pROC)
require(e1071)
require(rpart)
## For reproducible results we need to start with the same seed
set.seed(1333)
if(.Platform$OS.type =="windows")
{
setwd("D:/gitRepos/PerfAnalysis/")
}else {
setwd("~/gitRepos/PerfAnalysis/")
##library(doMC)
##registerDoMC(cores = 4)
}
source('myMultiClassSummary.R')
1) Import data
We first read the CSV files ML-MUTAG.csv and ML-GV-MUTAG.csv. Here GV is short for GraphVector and GS is short for GraphletSearch.
df.input.GS<-read.csv2(file="./input/ML-MUTAG.csv",
sep=";",
header = TRUE,
dec = ".")
df.input.GV<-read.csv2(file="./input/ML-GV-MUTAG.csv",
sep=";",
header = TRUE,
dec = ".")
Let's look at the dimensions of each data frame:
Rows Columns
GS 188 47
GV 188 21
2) Data cleansing
- First we need to correct the column data types so that our classification algorithms work: GraphID and Class are nominal variables (called factors in R).
- Then we find the columns with zero or near-zero variance and remove them from the dataset.
df.input.GV$GraphID <-as.factor(df.input.GV$GraphID );
df.input.GV$Class<-as.factor(make.names(df.input.GV$Class));
df.input.GS$GraphID <-as.factor(df.input.GS$GraphID );
df.input.GS$Class<-as.factor(make.names(df.input.GS$Class));
## Find columns with zero or near-zero variance (they are removed afterwards)
nzv.GS<-nearZeroVar(df.input.GS)
nzv.GV<-nearZeroVar(df.input.GV)
The following columns/features have zero or near-zero variance and can be removed:
Columns for GS to be removed: Components; d0; d1; d2; d3; d5; d6; d7; d8; d9; d10; d13; d14; d15; d16; d17; d20; d21; d22; d23; d25; d26; d27; d28; d29; d30; d31; d32; d33; d34; d35; d36; d37; d38; X
Columns for GV to be removed: Components; t.0; t.11; t.12; t.13; t.16; X
Now let's look at the shape of our datasets after removing these columns:
Rows Columns
GS 188 12
GV 188 14
Columns for GS: GraphID; Class; Verticies; Edges; AvgDegree; Density; d4; d11; d12; d18; d19; d24
Columns for GV: GraphID; Class; t.1; t.2; t.3; t.4; t.5; t.6; t.7; t.8; t.9; t.10; t.14; t.15
The dimensions correspond to the following graphlet types:
[Figure: All type 3,4,5 Graphlets.]
- Partition each dataset into a training set (70%) and a hold-out test set (30%). Cross-validation is then performed within the training set only.
## Randomly split the data into training and testing sets (sampling within each Class level)
#GraphletSearch
inTrain.GS<-createDataPartition(y=df.input.GS$Class,p = .7,list=FALSE)
train.GS<-df.input.GS[inTrain.GS,]
test.GS<- df.input.GS[-inTrain.GS,]
#GraphVector
inTrain.GV<-createDataPartition(y=df.input.GV$Class,p = .7,list=FALSE)
train.GV<-df.input.GV[inTrain.GV,]
test.GV<- df.input.GV[-inTrain.GV,]
3) Train Model
Now we train the following models:
* SVM with linear kernel
* SVM with Polynomial kernel
* SVM with Radial kernel
* Random forest
## Create a data frame from all combinations of the supplied parameters for the radial kernel (sigma and C)
grid <- expand.grid(sigma = c(.01, .015, 0.2),
C = c(0.75, 0.9, 1, 1.1, 1.25))
##Create a data frame from all combinations of the supplied parameters for Polynomial kernel
grid.poly <- expand.grid(C = c(0.75, 0.9, 1, 1.1, 1.25),
scale=c(.0001),
degree=1:2)
# Check whether this is a binary or a multiclass classification problem,
# because each requires a different summary function.
if(length(unique(df.input.GS$Class)) > 2)
{
ctrl <-trainControl(method="repeatedcv",
repeats=10,
classProbs=TRUE,
summaryFunction=myMultiClassSummary)
} else
{
ctrl <- trainControl(method="repeatedcv",
repeats=10,
classProbs=TRUE,
summaryFunction=twoClassSummary)
}
## Parameters for Random Forest algorithm.
RF.control <- trainControl(method="repeatedcv", number=10, repeats=3)
Before training, we center and scale the features, since SVMs are sensitive to the scale of their inputs.
SVM linear
svmModel.linear.GS <- train(x=train.GS[,-c(1,2)],
y= train.GS[,2],
method = "svmLinear",
preProc = c("center","scale"),
trControl=ctrl)
svmModel.linear.GV <- train(x=train.GV[,-c(1,2)],
y= train.GV[,2],
method = "svmLinear",
preProc = c("center","scale"),
trControl=ctrl)
SVM with polynomial kernel
svmModel.Poly.GS <- train(x=train.GS[,-c(1,2)],
y= train.GS[,2],
method = "svmPoly",
preProc = c("center","scale"),
tuneGrid = grid.poly,trControl=ctrl)
svmModel.Poly.GV <- train(x=train.GV[,-c(1,2)],
y= train.GV[,2],
method = "svmPoly",
preProc = c("center","scale"),
tuneGrid = grid.poly,
trControl=ctrl)
SVM with radial kernel
svmModel.Radial.GS <- train(x=train.GS[,-c(1,2)],
y= train.GS[,2],
method = "svmRadial",
preProc = c("center","scale"),
tuneGrid = grid,
trControl=ctrl)
svmModel.Radial.GV <- train(x=train.GV[,-c(1,2)],
y= train.GV[,2],
method = "svmRadial",
preProc = c("center","scale"),
tuneGrid = grid,
trControl=ctrl)
Random forest
# The formula interface (y ~ .) needs the outcome as a named column, so Class is bound to the predictors as y
#GraphletSearch dataset
RfTrainGS<-cbind(y=train.GS[,2],train.GS[,-c(1,2)])
RfModelGS <- train(y~.,data=RfTrainGS,method = "rf",
preProcess=c("center","scale"),
trControl=RF.control,
prox=TRUE,
tuneGrid=expand.grid(mtry = 5),
number=10,
ntree=500)
#GraphVector dataset
RfTrainGV<-cbind(y=train.GV[,2],train.GV[,-c(1,2)])
RfModelGV <- train(y~.,data=RfTrainGV,method = "rf",
preProcess=c("center","scale"),
trControl=RF.control,
prox=TRUE,
tuneGrid=expand.grid(mtry = 5),
number=10,
ntree=500)
4) Predict
We predict the Class of each graph in the test dataset, so that we can compare the predicted values with the actual Class and calculate the performance.
## Predicting for different models for GraphletSearch test dataset
## -c(1,2) means removing GraphID,Class
prediction.y.linear.GS<-predict(svmModel.linear.GS,test.GS[,-c(1,2)])
prediction.y.Radial.GS<-predict(svmModel.Radial.GS,test.GS[,-c(1,2)])
prediction.y.Poly.GS <-predict(svmModel.Poly.GS,test.GS[,-c(1,2)])
prediction.y.RF.GS<-predict(RfModelGS,test.GS[,-c(1,2)])
## Predicting for different models for GraphVector test dataset
prediction.y.linear.GV<-predict(svmModel.linear.GV,test.GV[,-c(1,2)])
prediction.y.Radial.GV<-predict(svmModel.Radial.GV,test.GV[,-c(1,2)])
prediction.y.Poly.GV <-predict(svmModel.Poly.GV,test.GV[,-c(1,2)])
prediction.y.RF.GV<-predict(RfModelGV,test.GV[,-c(1,2)])
5) Analyse performance
First we calculate the performance of each model, then we collect the results in one table per dataset:
confMatrix.linear.GS<-confusionMatrix(test.GS[,2],prediction.y.linear.GS)
confMatrix.linear.GV<-confusionMatrix(test.GV[,2],prediction.y.linear.GV)
confMatrix.radial.GS<-confusionMatrix(test.GS[,2],prediction.y.Radial.GS)
confMatrix.radial.GV<-confusionMatrix(test.GV[,2],prediction.y.Radial.GV)
confMatrix.poly.GS<-confusionMatrix(test.GS[,2],prediction.y.Poly.GS)
confMatrix.poly.GV<-confusionMatrix(test.GV[,2],prediction.y.Poly.GV)
confMatrix.rf.GS<-confusionMatrix(test.GS[,2],prediction.y.RF.GS)
confMatrix.rf.GV<-confusionMatrix(test.GV[,2],prediction.y.RF.GV)
acc.GS<-cbind(
Statistic='Accuracy',
SVM_Linear_GS=confMatrix.linear.GS$overall['Accuracy'],
SVM_Radial_GS=confMatrix.radial.GS$overall['Accuracy'],
SVM_Poly_GS=confMatrix.poly.GS$overall['Accuracy'],
RandForest_GS=confMatrix.rf.GS$overall['Accuracy'])
kappa.GS<-cbind(
Statistic='Kappa',
SVM_Linear_GS=confMatrix.linear.GS$overall['Kappa'],
SVM_Radial_GS=confMatrix.radial.GS$overall['Kappa'],
SVM_Poly_GS=confMatrix.poly.GS$overall['Kappa'],
RandForest_GS=confMatrix.rf.GS$overall['Kappa'])
acc.GV<-cbind(
Statistic='Accuracy',
SVM_Linear_GV=confMatrix.linear.GV$overall['Accuracy'],
SVM_Radial_GV=confMatrix.radial.GV$overall['Accuracy'],
SVM_Poly_GV=confMatrix.poly.GV$overall['Accuracy'],
RandForest_GV=confMatrix.rf.GV$overall['Accuracy'])
kappa.GV<-cbind(
Statistic='Kappa',
SVM_Linear_GV=confMatrix.linear.GV$overall['Kappa'],
SVM_Radial_GV=confMatrix.radial.GV$overall['Kappa'],
SVM_Poly_GV=confMatrix.poly.GV$overall['Kappa'],
RandForest_GV=confMatrix.rf.GV$overall['Kappa'])
comparison.GS<-as.data.frame(rbind(acc.GS,kappa.GS))
comparison.GV<-as.data.frame(rbind(acc.GV,kappa.GV))
The confusion matrix provides several measures of a model's performance. As an example, here is the matrix for the GraphletSearch dataset using random forest:
Note: X1 means Class label “1” and X.1 refers to the Class label “-1” of the graph.
Confusion Matrix and Statistics
Reference
Prediction X1 X.1
X1 30 7
X.1 3 15
Accuracy : 0.8182
95% CI : (0.691, 0.9092)
No Information Rate : 0.6
P-Value [Acc > NIR] : 0.000462
Kappa : 0.6094
Mcnemar's Test P-Value : 0.342782
Sensitivity : 0.9091
Specificity : 0.6818
Pos Pred Value : 0.8108
Neg Pred Value : 0.8333
Prevalence : 0.6000
Detection Rate : 0.5455
Detection Prevalence : 0.6727
Balanced Accuracy : 0.7955
'Positive' Class : X1
Generally we compare the models on each dataset by their accuracy and kappa. Kappa corrects accuracy for the agreement expected by chance, which matters when the classes are imbalanced; among models with similar accuracy, we therefore prefer the one with the higher kappa.
The performance on the GraphletSearch and GraphVector datasets is shown below:
For GraphletSearch:
SVM_Linear_GS SVM_Radial_GS SVM_Poly_GS RandForest_GS
Accuracy "0.854545454545454" "0.872727272727273" "0.872727272727273" "0.818181818181818"
Kappa "0.66966966966967" "0.715025906735751" "0.715025906735751" "0.609375"
For GraphVector:
SVM_Linear_GV SVM_Radial_GV SVM_Poly_GV RandForest_GV
Accuracy "0.890909090909091" "0.836363636363636" "0.836363636363636" "0.890909090909091"
Kappa "0.759124087591241" "0.611764705882353" "0.63360473723168" "0.752252252252252"


---
title: 'Classifying Graph datasets using Machine learning algorithms : SVM and Random
  Forest'
author: "Reza Nirumand"
output:
  html_notebook: default
  pdf_document: default
---

# Creating Data
The datasets for this machine learning task are created with the Java program below, which produces two files:  

* Using all k-graphlets with k = 3, 4, 5, named according to the convention ML-<fileName>.csv, for example ML-MUTAG.csv.  
   Column names: GraphID; Class; Components; AvgDegree; Density; Vertices; Edges; d0; d1; ...; d37; d38;  

* Using SimpleGraphVector, named according to the convention ML-GV-<fileName>.csv, for example ML-GV-MUTAG.csv.  
   Column names: GraphID; Class; Components; t-0; t-1; ...; t-15; t-16;  


```java
public class mlCreateFeatures {

	private static String filePath = "Input/";

	public static void main(String[] args) {
		String fileName = "MUTAG"; // other datasets: IMDB-BINARY, COLLAB, ENZYMES
		outPutFeatures(fileName);
		outPutFeaturesGraphVector(fileName);
		// The generated files are available in the project folder /Output
	}

	// outPutFeatures(...) and outPutFeaturesGraphVector(...) are not shown here
}
```

## Description
* GraphID: identifies the graph and matches it to the input file. This column is ignored during the machine learning task.
* Class: the class label of the graph; our task is to predict it using the training data.
* Components: integer giving the number of connected components in the graph.
* AvgDegree: average outgoing degree over all nodes.
* Density: density of the graph (see https://en.wikipedia.org/wiki/Dense_graph).
* d0...d38: correspond to the array output of the graphlet search (in the ML-<fileName>.csv datasets).
* t-0...t-16: correspond to the array output of SimpleGraphVector.toArray (in the ML-GV-<fileName>.csv datasets).
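To make the Components, AvgDegree and Density columns concrete, here is a small base-R sketch that computes them for a single graph. It assumes simple undirected graphs given as an edge list over the vertices 1..n. The Java program's exact definitions are not shown in this notebook (for example, how it counts the "outgoing" degree of an undirected edge), so `graph_features()` is a hypothetical helper for illustration, not the code that produced the CSV files.

```r
# Hypothetical helper: summary features of a simple undirected graph whose
# edges are given as a two-column matrix over the vertices 1..n_vertices.
graph_features <- function(edges, n_vertices) {
  n_edges <- nrow(edges)

  # Union-find: every vertex starts as its own root
  parent <- seq_len(n_vertices)
  find_root <- function(v) {
    while (parent[v] != v) v <- parent[v]
    v
  }
  for (k in seq_len(n_edges)) {
    a <- find_root(edges[k, 1])
    b <- find_root(edges[k, 2])
    if (a != b) parent[a] <- b  # merge the two components
  }

  list(
    Components = sum(parent == seq_len(n_vertices)),        # one root per component
    AvgDegree  = 2 * n_edges / n_vertices,                  # each edge adds 1 to two degrees
    Density    = 2 * n_edges / (n_vertices * (n_vertices - 1))
  )
}

# A triangle (1-2-3) plus a separate edge (4-5): 5 vertices, 4 edges
edges <- matrix(c(1, 2,
                  2, 3,
                  3, 1,
                  4, 5), ncol = 2, byrow = TRUE)
features <- graph_features(edges, n_vertices = 5)
str(features)  # Components = 2, AvgDegree = 1.6, Density = 0.4
```

The density formula is the one for undirected graphs, 2|E| / (|V| (|V| - 1)), so it lies between 0 (no edges) and 1 (complete graph).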


# Using Data
We will go through the following steps in our machine learning process:  
0. Prepare Environment  
1. Import data   
2. Data cleansing  
3. Train Model   
4. Predict   
5. Analyse performance   

## 0) Prepare Environment
To do the analysis, we first load the required packages.  
```{r env,message=FALSE,echo=TRUE,results='hide',error=FALSE}
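rm(list = ls(all = TRUE)) # Remove any existing variables

# library() stops with an error if a package is missing, whereas require() only warns
library(dplyr)
library(tidyr)
library(ggplot2) # used for the performance charts
library(caret)
library(nnet)
library(pROC)
library(e1071)
library(rpart)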

## For reproducible results we need to start with the same seed
set.seed(1333) 
```

```{r message=FALSE,echo=TRUE,results='hide',error=FALSE}
if(.Platform$OS.type =="windows")
{
  setwd("D:/gitRepos/PerfAnalysis/")
  
}else {
  setwd("~/gitRepos/PerfAnalysis/")
  ##library(doMC)
  ##registerDoMC(cores = 4)
}
source('myMultiClassSummary.R')
```
## 1) Import data
We first read the CSV files ML-MUTAG.csv and ML-GV-MUTAG.csv.
Here GV is short for GraphVector and GS is short for GraphletSearch.

```{r}
df.input.GS<-read.csv2(file="./input/ML-MUTAG.csv",
                       sep=";",
                       header = TRUE,
                       dec = ".")

df.input.GV<-read.csv2(file="./input/ML-GV-MUTAG.csv",
                       sep=";",
                       header = TRUE,
                       dec = ".")
```
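`read.csv2()` defaults to `sep = ";"` and `dec = ","`, which is why the chunk above overrides `dec = "."`. The self-contained sketch below parses two made-up rows in the assumed file format. It also shows two side effects of R's name checking that appear later in this notebook: a header such as `t-0` becomes `t.0`, and a trailing `;` at the end of each line (assumed here, matching the column lists under "Creating Data") produces an extra, all-NA column named `X`.

```r
# Two made-up rows in the assumed format: ';' as separator, '.' as decimal
# mark and a trailing ';' at the end of every line
csv_text <- "GraphID;Class;Density;t-0;
1;1;0.5;3;
2;-1;0.25;0;"

demo <- read.csv2(text = csv_text, sep = ";", header = TRUE, dec = ".")
names(demo)         # "GraphID" "Class" "Density" "t.0" "X"
demo$Density        # 0.50 0.25, parsed as numbers because dec = "."
all(is.na(demo$X))  # TRUE: the empty trailing field becomes an all-NA column
```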
Let's look at the dimensions of each data frame:

```{r echo=FALSE }
compare<-rbind(GS=dim(df.input.GS),GV=dim(df.input.GV))
colnames(compare)<-c("Rows","Columns")

print(compare)
```

## 2) Data cleansing  
1. First we need to correct the column data types so that our classification algorithms work: GraphID and Class are nominal variables (called factors in R).  
2. Then we find the columns with zero or near-zero variance and remove them from the dataset.  

```{r tidy=FALSE,warning=FALSE}
df.input.GV$GraphID <- as.factor(df.input.GV$GraphID)
df.input.GV$Class <- as.factor(make.names(df.input.GV$Class))

df.input.GS$GraphID <- as.factor(df.input.GS$GraphID)
df.input.GS$Class <- as.factor(make.names(df.input.GS$Class))

## Find columns with zero or near-zero variance (removed in the next chunk)
nzv.GS <- nearZeroVar(df.input.GS)
nzv.GV <- nearZeroVar(df.input.GV)

```
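Class labels such as `1` and `-1` are not valid R variable names, and caret needs valid names for the class levels when `classProbs = TRUE`, because it uses them as column names for the predicted class probabilities. The chunk above therefore passes the labels through `make.names()`. A minimal check of what that does:

```r
raw_labels <- c(1, -1, 1, -1)
valid_labels <- make.names(raw_labels)
valid_labels  # "X1" "X.1" "X1" "X.1"

# Every factor level is now a syntactically valid R name
y <- factor(valid_labels)
all(levels(y) == make.names(levels(y)))  # TRUE
```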

The following columns/features have zero or near-zero variance and can be removed:  

```{r echo=FALSE,tidy=FALSE,warning=FALSE}
cat("Columns for GS to be removed:  ")
cat(paste(names(df.input.GS)[nzv.GS],collapse="; "))
cat("\nColumns for GV to be removed:  ")
cat(paste(names(df.input.GV)[nzv.GV],collapse="; "))

## Guard against an empty index: df[,-integer(0)] would drop every column
if (length(nzv.GS) > 0) df.input.GS<-df.input.GS[,-nzv.GS]
if (length(nzv.GV) > 0) df.input.GV<-df.input.GV[,-nzv.GV]

```
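`nearZeroVar()` flags more than strictly constant columns. According to caret's documentation, a column is also flagged when its most common value is much more frequent than the second most common one (ratio above `freqCut = 95/5`) and it has few distinct values (percentage of unique values below `uniqueCut = 10`). The following is a simplified base-R re-implementation of that rule for illustration only; `near_zero_var_cols()` is a hypothetical helper, not part of caret. It also shows what negative indexing does when the index is empty.

```r
# Simplified sketch of the near-zero-variance rule described for caret::nearZeroVar()
near_zero_var_cols <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  flagged <- vapply(df, function(col) {
    counts <- sort(table(col[!is.na(col)]), decreasing = TRUE)
    if (length(counts) <= 1) return(TRUE)             # constant or all-NA column
    freq_ratio <- counts[[1]] / counts[[2]]           # most common vs. runner-up
    pct_unique <- 100 * length(counts) / length(col)  # distinct values, in percent
    freq_ratio > freq_cut && pct_unique < unique_cut
  }, logical(1))
  which(flagged)
}

demo <- data.frame(
  id       = 1:100,                # all values distinct   -> kept
  constant = rep(0, 100),          # zero variance         -> flagged
  rare     = c(rep(0, 98), 1, 2),  # ratio 98:1, 3% unique -> flagged
  all_na   = rep(NA, 100),         # empty column          -> flagged
  balanced = rep(c(0, 1), 50)      # ratio 50:50           -> kept
)
nzv <- near_zero_var_cols(demo)
names(demo)[nzv]  # "constant" "rare" "all_na"

# An empty index silently drops every column: df[, -integer(0)] has 0 columns
ncol(demo[, -integer(0)])  # 0
if (length(nzv) > 0) demo <- demo[, -nzv, drop = FALSE]
names(demo)  # "id" "balanced"
```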
Now let's look at the shape of our datasets after removing these columns:

```{r echo=FALSE,tidy=FALSE,warning=FALSE }

compare<-rbind(GS=dim(df.input.GS),GV=dim(df.input.GV))
colnames(compare)<-c("Rows","Columns")

print(compare)
cat("\nColumns for GS:  ")
cat(paste(names(df.input.GS),collapse="; "))
cat("\nColumns for GV:  ")
cat(paste(names(df.input.GV),collapse="; "))
```

The dimensions correspond to the following graphlet types:  

![All type 3,4,5 Graphlets.](graphletsRef.PNG)

3. Partition each dataset into a training set (70%) and a hold-out test set (30%). Cross-validation is then performed within the training set only.

```{r }
## Randomly split the data into training and testing sets (sampling within each Class level)
#GraphletSearch
inTrain.GS<-createDataPartition(y=df.input.GS$Class,p = .7,list=FALSE)
train.GS<-df.input.GS[inTrain.GS,]   
test.GS<- df.input.GS[-inTrain.GS,] 

#GraphVector
inTrain.GV<-createDataPartition(y=df.input.GV$Class,p = .7,list=FALSE)
train.GV<-df.input.GV[inTrain.GV,]   
test.GV<- df.input.GV[-inTrain.GV,] 

```
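`createDataPartition()` samples within each level of the outcome when `y` is a factor, so the training and test sets keep roughly the same class proportions. A base-R sketch of such a stratified split follows. `stratified_split()` is a hypothetical helper, the class counts 125/63 are an assumption made for illustration, and caret's exact rounding may differ from the `ceiling()` used here.

```r
# Hypothetical helper: stratified split that samples within each class
stratified_split <- function(y, p = 0.7) {
  rows_by_class <- split(seq_along(y), y)
  train_rows <- unlist(lapply(rows_by_class, function(rows) {
    # keep ceiling(p * class size) randomly chosen rows of this class
    rows[sample.int(length(rows), size = ceiling(p * length(rows)))]
  }), use.names = FALSE)
  sort(train_rows)
}

set.seed(1333)
# Assumed class counts for illustration: 125 graphs labelled X1, 63 labelled X.1
y <- factor(rep(c("X1", "X.1"), times = c(125, 63)))
in_train <- stratified_split(y, p = 0.7)

table(y[in_train])   # 88 X1 and 45 X.1 in the training set
table(y[-in_train])  # 37 X1 and 18 X.1 in the test set
```

A plain `sample()` over all rows could, by chance, put noticeably more of one class into the test set; sampling per class rules that out.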

## 3) Train Model  
Now we train the following models:  
* SVM with linear kernel  
* SVM with Polynomial kernel  
* SVM with Radial kernel  
* Random forest  

```{r parameters,tidy=FALSE}

## Create a data frame from all combinations of the supplied parameters for the radial kernel (sigma and C)
grid <- expand.grid(sigma = c(.01, .015, 0.2),
                    C = c(0.75, 0.9, 1, 1.1, 1.25))  

##Create a data frame from all combinations of the supplied parameters for Polynomial kernel
grid.poly <- expand.grid(C = c(0.75, 0.9, 1, 1.1, 1.25),
                         scale=c(.0001),
                         degree=1:2) 

# Check whether this is a binary or a multiclass classification problem,
# because each requires a different summary function.
if(length(unique(df.input.GS$Class)) > 2) 
{
  ctrl <-trainControl(method="repeatedcv",
                      repeats=10,
                      classProbs=TRUE,
                      summaryFunction=myMultiClassSummary)
} else
{
  ctrl <- trainControl(method="repeatedcv",
                       repeats=10,
                       classProbs=TRUE,
                       summaryFunction=twoClassSummary)
}

## Parameters for Random Forest algorithm.
RF.control <- trainControl(method="repeatedcv", number=10, repeats=3)

```
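The tuning grids and the resampling scheme together determine how many models are fitted. The following is a rough count, assuming `trainControl(method = "repeatedcv")` uses its default of 10 folds (the chunk above only sets `repeats = 10`) and that each grid candidate is fitted once per resample, plus one final fit on the whole training set for the selected parameters.

```r
radial_grid <- expand.grid(sigma = c(.01, .015, 0.2),
                           C = c(0.75, 0.9, 1, 1.1, 1.25))
poly_grid <- expand.grid(C = c(0.75, 0.9, 1, 1.1, 1.25),
                         scale = c(.0001),
                         degree = 1:2)

n_resamples <- 10 * 10  # 10 folds x 10 repeats
fits_radial <- nrow(radial_grid) * n_resamples + 1
fits_poly   <- nrow(poly_grid) * n_resamples + 1

c(radial = nrow(radial_grid), poly = nrow(poly_grid))  # 15 and 10 candidates
c(radial = fits_radial, poly = fits_poly)              # 1501 and 1001 fits per dataset
```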

Before training, we center and scale the features, since SVMs are sensitive to the scale of their inputs.
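Centering and scaling turn each feature x into (x - mean) / sd. The important detail is that the means and standard deviations are estimated on the training data and then reused for new data; caret's `preProc = c("center", "scale")` applies this within each resample and again when `predict()` is called on the test set. A base-R sketch with made-up numbers:

```r
train_x <- data.frame(a = c(1, 2, 3, 4), b = c(10, 10, 20, 40))
test_x  <- data.frame(a = 5, b = 30)

# Statistics come from the training data only
centers <- colMeans(train_x)
scales  <- apply(train_x, 2, sd)

train_z <- scale(train_x, center = centers, scale = scales)
test_z  <- scale(test_x,  center = centers, scale = scales)  # reuse training statistics

round(colMeans(train_z), 10)  # 0 0
apply(train_z, 2, sd)         # 1 1
test_z[1, "a"]                # (5 - 2.5) / sd(1:4), about 1.936
```

Estimating the statistics on the test set as well would leak information from the data used for evaluation.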

### SVM linear  
```{r svmLinear, message=FALSE,error=FALSE,results='hide',cache=TRUE,tidy=FALSE,warning=FALSE}
svmModel.linear.GS <- train(x=train.GS[,-c(1,2)],
                            y= train.GS[,2],
                            method = "svmLinear",
                            preProc = c("center","scale"),
                            trControl=ctrl)  

svmModel.linear.GV <- train(x=train.GV[,-c(1,2)],
                            y= train.GV[,2],
                            method = "svmLinear",
                            preProc = c("center","scale"),
                            trControl=ctrl)  
```

### SVM with polynomial kernel  
```{r svmPoly,message=FALSE,error=FALSE,results='hide',cache=TRUE,tidy=FALSE,warning=FALSE}
svmModel.Poly.GS <- train(x=train.GS[,-c(1,2)],
                          y= train.GS[,2], 
                          method = "svmPoly",
                          preProc = c("center","scale"),
                          tuneGrid = grid.poly,trControl=ctrl)  

svmModel.Poly.GV <- train(x=train.GV[,-c(1,2)],
                          y= train.GV[,2],
                          method = "svmPoly",
                          preProc = c("center","scale"),
                          tuneGrid = grid.poly,
                          trControl=ctrl)  
```
### SVM with radial kernel  
```{r svmRadial,message=FALSE,error=FALSE,results='hide',cache=TRUE,tidy=FALSE,warning=FALSE}
svmModel.Radial.GS <- train(x=train.GS[,-c(1,2)],
                            y= train.GS[,2],
                            method = "svmRadial",
                            preProc = c("center","scale"),
                            tuneGrid = grid,
                            trControl=ctrl)  

svmModel.Radial.GV <- train(x=train.GV[,-c(1,2)],
                            y= train.GV[,2],
                            method = "svmRadial",
                            preProc = c("center","scale"),
                            tuneGrid = grid,
                            trControl=ctrl)  
```
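The tuning parameters in the grids are kernel parameters. caret's `svmRadial` and `svmPoly` methods are built on the kernlab package, whose documentation defines the radial basis kernel as exp(-sigma * ||x - y||^2) and the polynomial kernel as (scale * <x, y> + offset)^degree. The sketch below writes both out in base R, assuming `offset = 1` (our understanding of what caret passes to kernlab; treat it as an assumption).

```r
# Radial basis kernel, parameterised by sigma
rbf_kernel <- function(x, y, sigma) exp(-sigma * sum((x - y)^2))

# Polynomial kernel, parameterised by degree, scale and (assumed) offset
poly_kernel <- function(x, y, degree, scale, offset = 1) (scale * sum(x * y) + offset)^degree

x <- c(1, 2)
y <- c(3, 4)
rbf_kernel(x, y, sigma = 0.01)                 # exp(-0.01 * 8), about 0.923
poly_kernel(x, y, degree = 2, scale = 1)       # (11 + 1)^2 = 144
poly_kernel(x, y, degree = 2, scale = 0.0001)  # (0.0011 + 1)^2, about 1.0022
```

With `scale = 0.0001`, (1 + scale * <x, y>)^degree stays close to 1 + degree * scale * <x, y> for moderate inputs, so both degrees in `grid.poly` give kernels that are nearly linear in the dot product.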

### Random forest  
```{r RandomForest,message=FALSE,error=FALSE,results='hide',cache=TRUE,tidy=FALSE,warning=FALSE}
# The formula interface (y ~ .) needs the outcome as a named column,
# so Class is bound to the predictors under the name y.
# Resampling (10-fold CV, 3 repeats) is configured in RF.control.
#GraphletSearch dataset
RfTrainGS<-cbind(y=train.GS[,2],train.GS[,-c(1,2)]) 

RfModelGS <- train(y~.,data=RfTrainGS,method = "rf",
                   preProcess=c("center","scale"),
                   trControl=RF.control,
                   prox=TRUE, 
                   tuneGrid=expand.grid(mtry = 5),
                   ntree=500) 

#GraphVector dataset
RfTrainGV<-cbind(y=train.GV[,2],train.GV[,-c(1,2)]) 
RfModelGV <- train(y~.,data=RfTrainGV,method = "rf",
                   preProcess=c("center","scale"),
                   trControl=RF.control,
                   prox=TRUE,
                   tuneGrid=expand.grid(mtry = 5),
                   ntree=500) 

```
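The random forest grid fixes `mtry = 5`, the number of predictors sampled as split candidates at each node. For comparison, the randomForest package documents floor(sqrt(p)) as its default for classification with p predictors. A quick calculation using the column counts reported after cleansing (12 columns for GS and 14 for GV, minus GraphID and Class):

```r
# Predictors left after cleansing: 12 (GS) and 14 (GV) columns minus GraphID and Class
n_predictors <- c(GS = 12 - 2, GV = 14 - 2)

# randomForest's documented default for classification: floor(sqrt(p))
default_mtry <- floor(sqrt(n_predictors))
default_mtry  # GS 3, GV 3

# Share of the predictors tried at each split with the fixed mtry = 5
round(5 / n_predictors, 2)  # GS 0.50, GV 0.42
```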

## 4) Predict  
We predict the Class of each graph in the test dataset, so that we can compare the predicted values with the actual Class and calculate the performance.
```{r tidy=FALSE,message=FALSE,error=FALSE,warning=FALSE}
## Predicting for different models for GraphletSearch test dataset
## -c(1,2) means removing GraphID,Class
prediction.y.linear.GS<-predict(svmModel.linear.GS,test.GS[,-c(1,2)]) 
prediction.y.Radial.GS<-predict(svmModel.Radial.GS,test.GS[,-c(1,2)])
prediction.y.Poly.GS  <-predict(svmModel.Poly.GS,test.GS[,-c(1,2)])
prediction.y.RF.GS<-predict(RfModelGS,test.GS[,-c(1,2)])

## Predicting for different models for GraphVector test dataset
prediction.y.linear.GV<-predict(svmModel.linear.GV,test.GV[,-c(1,2)])
prediction.y.Radial.GV<-predict(svmModel.Radial.GV,test.GV[,-c(1,2)])
prediction.y.Poly.GV  <-predict(svmModel.Poly.GV,test.GV[,-c(1,2)])
prediction.y.RF.GV<-predict(RfModelGV,test.GV[,-c(1,2)])
```

## 5) Analyse performance  
First we calculate the performance of each model, then we collect the results in one table per dataset:
```{r}
## confusionMatrix() takes the predictions first (data) and the true classes second (reference)
confMatrix.linear.GS<-confusionMatrix(data=prediction.y.linear.GS,reference=test.GS[,2])
confMatrix.linear.GV<-confusionMatrix(data=prediction.y.linear.GV,reference=test.GV[,2])

confMatrix.radial.GS<-confusionMatrix(data=prediction.y.Radial.GS,reference=test.GS[,2])
confMatrix.radial.GV<-confusionMatrix(data=prediction.y.Radial.GV,reference=test.GV[,2])

confMatrix.poly.GS<-confusionMatrix(data=prediction.y.Poly.GS,reference=test.GS[,2])
confMatrix.poly.GV<-confusionMatrix(data=prediction.y.Poly.GV,reference=test.GV[,2])

confMatrix.rf.GS<-confusionMatrix(data=prediction.y.RF.GS,reference=test.GS[,2])
confMatrix.rf.GV<-confusionMatrix(data=prediction.y.RF.GV,reference=test.GV[,2])

acc.GS<-cbind(
        Statistic='Accuracy',
        SVM_Linear_GS=confMatrix.linear.GS$overall['Accuracy'],
        SVM_Radial_GS=confMatrix.radial.GS$overall['Accuracy'],
        SVM_Poly_GS=confMatrix.poly.GS$overall['Accuracy'],
        RandForest_GS=confMatrix.rf.GS$overall['Accuracy'])

kappa.GS<-cbind(
        Statistic='Kappa',
        SVM_Linear_GS=confMatrix.linear.GS$overall['Kappa'],
        SVM_Radial_GS=confMatrix.radial.GS$overall['Kappa'],
        SVM_Poly_GS=confMatrix.poly.GS$overall['Kappa'],
        RandForest_GS=confMatrix.rf.GS$overall['Kappa'])


acc.GV<-cbind(
        Statistic='Accuracy',
        SVM_Linear_GV=confMatrix.linear.GV$overall['Accuracy'],
        SVM_Radial_GV=confMatrix.radial.GV$overall['Accuracy'],
        SVM_Poly_GV=confMatrix.poly.GV$overall['Accuracy'],
        RandForest_GV=confMatrix.rf.GV$overall['Accuracy'])

kappa.GV<-cbind(
        Statistic='Kappa',
        SVM_Linear_GV=confMatrix.linear.GV$overall['Kappa'],
        SVM_Radial_GV=confMatrix.radial.GV$overall['Kappa'],
        SVM_Poly_GV=confMatrix.poly.GV$overall['Kappa'],
        RandForest_GV=confMatrix.rf.GV$overall['Kappa'])

comparison.GS<-as.data.frame(rbind(acc.GS,kappa.GS))
comparison.GV<-as.data.frame(rbind(acc.GV,kappa.GV))
```
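`caret::confusionMatrix(data, reference)` expects the predicted classes as `data` and the true classes as `reference`; the table it prints has predictions in the rows and true classes in the columns. Accuracy and kappa do not depend on this orientation, but the class-specific statistics do. A base-R sketch with made-up labels shows that swapping the two arguments turns sensitivity into the positive predictive value:

```r
lv <- c("X1", "X.1")  # X1 is treated as the positive class
actual    <- factor(c("X1", "X1", "X1", "X1", "X.1", "X.1", "X.1", "X.1"), levels = lv)
predicted <- factor(c("X1", "X1", "X1", "X.1", "X1", "X1", "X.1", "X.1"), levels = lv)

# Rows = predicted class, columns = true class (the layout caret prints)
cm <- table(Prediction = predicted, Reference = actual)
sens <- cm["X1", "X1"] / sum(cm[, "X1"])  # TP / actual positives    = 3 / 4
ppv  <- cm["X1", "X1"] / sum(cm["X1", ])  # TP / predicted positives = 3 / 5
acc  <- sum(diag(cm)) / sum(cm)           # 5 / 8, same in either orientation

# Passing the true classes as "data" transposes the table
cm_swapped   <- table(Prediction = actual, Reference = predicted)
sens_swapped <- cm_swapped["X1", "X1"] / sum(cm_swapped[, "X1"])  # 3 / 5, i.e. the PPV
c(sens = sens, ppv = ppv, sens_swapped = sens_swapped)
```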

The confusion matrix provides several measures of a model's performance. As an example, here is the matrix for the GraphletSearch dataset using random forest:

Note: X1 means Class label "1" and X.1 refers to the Class label "-1" of the graph.

```{r echo=FALSE}
confMatrix.rf.GS
```

Generally we compare the models on each dataset by their accuracy and kappa. Kappa corrects accuracy for the agreement expected by chance, which matters when the classes are imbalanced; among models with similar accuracy, we therefore prefer the one with the higher kappa.
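Cohen's kappa compares the observed agreement p_o (the accuracy) with the agreement p_e expected by chance from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). As a check, the sketch below recomputes the kappa of the random forest on GraphletSearch from the four counts in its printed confusion matrix (30, 7, 3, 15); `cohens_kappa()` is a hypothetical helper.

```r
cohens_kappa <- function(cm) {
  n   <- sum(cm)
  p_o <- sum(diag(cm)) / n                     # observed agreement = accuracy
  p_e <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

# Counts from the printed confusion matrix (rows and columns as printed)
cm_rf_gs <- matrix(c(30, 7,
                     3, 15), nrow = 2, byrow = TRUE)
sum(diag(cm_rf_gs)) / sum(cm_rf_gs)  # accuracy: 45 / 55, about 0.8182
cohens_kappa(cm_rf_gs)               # 0.609375
```

Here p_e = (37 * 33 + 18 * 22) / 55^2, about 0.535, so an accuracy of about 0.818 corresponds to a kappa of only about 0.61.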

The performance on the GraphletSearch and GraphVector datasets is shown below:

```{r error=FALSE,message=FALSE,results='hide',include=FALSE,tidy=FALSE}
## Reshaping data from wide to long format for plotting, and correcting data type for value column
##GraphletSearch
gsLongFormat<-gather(data = comparison.GS,key =  ModelName,value = 'value',SVM_Linear_GS:RandForest_GS,factor_key =TRUE )
gsLongFormat$value<-as.numeric(gsLongFormat$value)

##GraphVector
gvLongFormat<-gather(data = comparison.GV,key =  ModelName,value = 'value',SVM_Linear_GV:RandForest_GV,factor_key =TRUE )
gvLongFormat$value<-as.numeric(gvLongFormat$value)


```

```{r echo=FALSE,message=FALSE,error=FALSE,tidy=FALSE}

## Creating the charts after converting data from wide to long format
gs<-ggplot(data=gsLongFormat,aes(x=ModelName,y=value,fill=Statistic))+
        geom_bar(stat="identity",position=position_dodge())+ # Accuracy and Kappa side by side
        ggtitle("Performance comparison for dataset GraphletSearch[Mutag]")


gv<-ggplot(data=gvLongFormat,aes(x=ModelName,y=value,fill=Statistic))+
        geom_bar(stat="identity",position=position_dodge())+
        ggtitle("Performance comparison for dataset GraphVector[Mutag]")

cat("For GraphletSearch:\n\n")
print(as.matrix(comparison.GS[,-1]))
cat("\n\nFor GraphVector:\n\n")
print(as.matrix(comparison.GV[,-1]))

print(gs)
cat("\n\n\n\n\n")
print(gv)

```