Machine Problem 1: Applying KNN to the MNIST dataset, using three test protocols to evaluate performance:
- Training/test split: 50,000 training examples + 10,000 test examples.
- Training/validation/test split: 40,000 training examples + 10,000 validation examples + 10,000 test examples.
- 5-fold and 10-fold cross-validation (average and standard deviation).
Due in two weeks (Sep 2nd, before class). What shall be submitted?
- Your implementation: what language did you choose to implement the algorithm? How do you search for the K nearest neighbors in the feature vector space: exhaustive search, or something smarter (hint: k-d tree)? What is your algorithm's complexity to predict on one test example?
- Your results: test errors under the different test protocols; how does performance change as K varies, and why?
- Your source code.
- An electronic report in PDF or Word format.
- Where: cap5610ucf@gmail.com.
Typical machine learning tasks include concept learning, function learning (predictive modeling), clustering, and finding predictive patterns. These are learned from available data observed through experience or instruction.
The K-nearest neighbors (KNN) algorithm is one of the simplest machine learning algorithms and is an example of instance-based learning, where new data are classified based on stored, labeled instances. The distance between the stored data and a new instance is calculated by means of some similarity measure, typically a distance metric such as Euclidean distance, cosine similarity, or Manhattan distance. In other words, for any new data point entering the system, we calculate its similarity to the data already in the system, and then use this similarity value to perform predictive modeling: either classification, assigning a label or class to the new instance, or regression, assigning a value to the new instance. Basically: tell me who your neighbors are, and I will tell you who you are. Choosing K is an important step in KNN. In theory, if an infinite number of samples were available, the larger K is, the better the classification, provided that all K neighbors remain close to the query point. This is possible with infinitely many samples but impossible in practice, since the number of samples is finite. KNN is referred to as a lazy learning algorithm because the function is only approximated locally and all computation is deferred until classification.
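To make the idea concrete, below is a minimal from-scratch sketch of KNN classification in base R. It is illustrative only: the helper name knn_classify, plain Euclidean distance, and majority voting are my own choices here, not the caret/kknn code used later in this report.
#Minimal KNN classifier sketch: Euclidean distance + majority vote
#knn_classify is a hypothetical helper, not from any package
knn_classify <- function(train_x, train_y, new_x, k)
{
  #Distance from the new instance to every stored, labeled instance
  d <- sqrt(rowSums(sweep(train_x, 2, new_x)^2))
  #Labels of the k nearest stored instances
  nearest <- train_y[order(d)[1:k]]
  #Majority vote decides the predicted class
  names(which.max(table(nearest)))
}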
The performance of KNN classification depends on how distances are computed between examples. The distance usually involves all attributes and assumes all of them have the same effect on the distance. When no prior knowledge is available, most implementations of KNN compute simple Euclidean distances, i.e., the straight-line distance between two points. Unfortunately, Euclidean distance ignores any statistical regularities that might be estimated from a large training set of labeled examples. When searching for the K nearest neighbors in the feature vector space, I tried different algorithms available through the R package caret's training function, and noticed a great improvement in processing time when using the k-d tree algorithm. The k-dimensional tree is a space-partitioning data structure that organizes points in a k-dimensional space; it is useful for searches involving a multidimensional search key, such as nearest-neighbor search.
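For reference, a k-d tree search can also be invoked directly; below is a sketch assuming the FNN package (an assumption on my part; the report's actual models go through caret's train function). The column layout assumed here, with the label y in column 1, matches the data frames built later in this report.
#Sketch: kd-tree nearest-neighbor classification via the FNN package
library(FNN)
pred <- FNN::knn(train = as.matrix(training[, -1]),
                 test = as.matrix(testing[, -1]),
                 cl = training$y, k = 9,
                 algorithm = "kd_tree")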
The complexity of the basic KNN algorithm is high because it stores all examples. If we have n examples, each of dimension d, it takes O(d) to compute the distance to one example and O(nd) to compute the distances to all examples. Furthermore, it takes O(nk) time to find the k closest examples, so the total time is O(nd + nk). For example, with the full MNIST training set (n = 50,000, d = 784), a single prediction requires on the order of 50,000 × 784 ≈ 39 million distance operations. This is very expensive for a large number of samples, and the issue is that we need a large number of samples for KNN to work well.
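As a rough illustration of the O(nd) brute-force step, the sketch below times one query against randomly generated data (sizes scaled down from MNIST to keep memory modest; all names here are illustrative):
#Brute-force distance computation for one query: O(nd) work
n <- 10000; d <- 784                        #scaled-down stand-in for MNIST
X <- matrix(runif(n * d), nrow = n)         #n stored examples of dimension d
q <- runif(d)                               #one test example
system.time({
  dists <- sqrt(rowSums(sweep(X, 2, q)^2))  #distances to all n examples
  nn9 <- order(dists)[1:9]                  #indices of the 9 nearest
})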
I used R because it is very powerful software with vast graphical and statistical resources. Three approaches were used, each handling the dataset differently: 1) a training set and a test set, 2) a training set, validation set, and test set, and 3) cross-validation. Due to processing time and efficiency on the large MNIST dataset, I sampled 1,000 observations from each set used; using the full sets for each approach would yield results more representative of this database. Aside from normalizing the pixel values, I did not preprocess the data.
The main idea is that the primary discovery of relationships is done with a training set, and we use the test and validation sets to measure whether these relationships are maintained on unseen data. Under protocol 1 we can run into overfitting; protocol 2 tries to minimize this by using a validation set for model selection.
With cross-validation, we divide the data sample into v folds: randomly drawn, disjoint sub-samples. For a fixed value of K, we use v-1 folds as the training examples, apply the KNN model to make predictions on the vth fold, and evaluate the error. The most common choice of error is the sum of squared errors for regression and the misclassification rate (equivalently, accuracy) for classification. This process is then applied successively to all v possible choices of held-out fold. At the end of the v cycles, the computed errors are averaged, which gives a measure of the stability of the model. We repeat these steps for various K's, and the K with the lowest average error is selected as the best value of K under cross-validation. The key idea is that every observation crosses over between the training set and the validation set.
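A compact sketch of this procedure for a fixed K, using class::knn as the base classifier (caret automates the same loop via trainControl in the code below; vfold_error is a hypothetical helper of my own):
#Manual v-fold cross-validation sketch for a fixed K
library(class)
vfold_error <- function(X, y, v, k)
{
  folds <- sample(rep(1:v, length.out = nrow(X)))  #random disjoint folds
  errs <- sapply(1:v, function(f) {
    pred <- knn(X[folds != f, ], X[folds == f, ], y[folds != f], k = k)
    mean(pred != y[folds == f])                    #fold misclassification rate
  })
  c(mean = mean(errs), sd = sd(errs))              #average error and stability
}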
Variance refers to the amount by which f-hat would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different f-hat. Ideally, the estimate of f should not vary much between training sets; if a method has high variance, then small changes in the training data can result in large changes in f-hat. In general, more flexible statistical methods have higher variance. Bias refers to the error introduced by approximating a real-life problem, which may be extremely complicated, with a much simpler model. For example, linear regression assumes a linear relationship, but it is unlikely that real-life problems are truly linear. Generally, as we use more flexible methods, the variance will increase and the bias will decrease; the relative rate of change of these two determines whether the test mean squared error increases or decreases. Good test-set performance of a statistical learning method requires low variance as well as low squared bias. This bias-variance trade-off arises because it is easy to obtain a method with extremely low bias but high variance, or a method with very low variance but high bias. We aim to find a method for which both the variance and the squared bias are low.
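This trade-off can be stated exactly. For squared-error loss, the expected test error at a point $x_0$ decomposes (a standard result) as
$$E\big[(y_0 - \hat{f}(x_0))^2\big] = \mathrm{Var}(\hat{f}(x_0)) + \big[\mathrm{Bias}(\hat{f}(x_0))\big]^2 + \mathrm{Var}(\varepsilon),$$
so the test error can never fall below the irreducible noise $\mathrm{Var}(\varepsilon)$, and is minimized by balancing the variance and squared-bias terms.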
In protocol 1, as we increase K, the error drops and then levels off. In protocol 2, as we increase K, the error drops off faster than in protocol 1 and later drops further, but the variability also increases. The best K generated by cross-validation was K = 9, which is the lowest point in protocol 1 and near the lowest point in protocol 2, before the variability sets in.
RESULTS: 1st Test Protocol: Training set 50,000, Test set 10,000, K = 3 through 19.
#Loading necessary Libraries
library(kknn)
library(caret)
library(doParallel)
# Setting up parallel processing.
cl <- makeCluster(detectCores())
registerDoParallel(cl)
#Functions to Load files and labels
loadingmnist <- function()
{
  loadingimage <- function(filename)
  {
    ret = list()
    f = file(filename, 'rb')
    readBin(f, 'integer', n=1, size=4, endian='big')  #skip the magic number
    ret$n = readBin(f, 'integer', n=1, size=4, endian='big')
    nrow = readBin(f, 'integer', n=1, size=4, endian='big')
    ncol = readBin(f, 'integer', n=1, size=4, endian='big')
    x = readBin(f, 'integer', n=ret$n*nrow*ncol, size=1, signed=F)
    ret$x = matrix(x, ncol=nrow*ncol, byrow=T)        #one image per row
    close(f)
    ret
  }
  loadinglabel <- function(filename)
  {
    f = file(filename, 'rb')
    readBin(f, 'integer', n=1, size=4, endian='big')  #skip the magic number
    n = readBin(f, 'integer', n=1, size=4, endian='big')
    y = readBin(f, 'integer', n=n, size=1, signed=F)
    close(f)
    y
  }
  #Loading files and labels into the global environment
  train <<- loadingimage('train-images.idx3-ubyte')
  train$y <<- loadinglabel('train-labels.idx1-ubyte')
}
#Function to display a digit
displaydigit <- function(arr784, col=gray(12:1/12), ...)
{
image(matrix(arr784, nrow=28)[,28:1], col=col, ...)
}
#Establishing train and test as data frames
train <-data.frame()
#Calling the functions to load the database
loadingmnist()
#Normalizing: X=(X - min)/(max - min)=>X=(X-0)/(255-0)=> X=X/255.
train$x <-train$x/255
#Dividing data with training having 50k and testing having 10k.
inTrain = data.frame(y=train$y, train$x)
inTrain$y <- as.factor(train$y)
trainIndex = createDataPartition(inTrain$y, p=0.833245, list=FALSE)
training = inTrain[trainIndex,]
testing = inTrain[-trainIndex,]
rm(train)
#Sampled Observations from each data set
#Training & Cross Validation (Out of full data set)
n=1000
#Validation
nv=1000
#Testing
nt=1000
#Setting seed to produce same results
set.seed(123)
training =training[sample(nrow(training),n),]
set.seed(123)
testing = testing[sample(nrow(testing), nt),]
#Creating model using K Nearest neighbor
for (i in seq(3, 19, 2))
{
cat("Processing KNN using K = ", i, "\n", sep = "")
fit <- train(y ~ ., data = training, method = 'kknn', metric = "Accuracy",
algorithm = c("kd_tree"), tuneLength = 3, number = 1, k = i)
results <-predict(fit,newdata =testing)
output <-confusionMatrix(results,testing$y)
print(output$overall)
print(output$table)
}
## Processing KNN using K = 3
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8320000 0.8129374 0.8073589 0.8546678 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 97 0 2 0 0 1 2 0 1 1
## 1 1 118 9 4 4 0 3 1 5 1
## 2 0 0 70 3 3 1 0 1 0 2
## 3 0 2 9 84 0 7 0 3 6 1
## 4 0 1 4 0 70 1 1 5 1 2
## 5 1 0 0 1 1 60 2 0 5 0
## 6 2 0 3 0 1 2 87 0 2 0
## 7 0 0 2 0 1 0 0 91 0 13
## 8 0 0 1 5 0 3 0 0 73 0
## 9 0 0 0 2 15 3 0 12 3 82
## Processing KNN using K = 5
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8320000 0.8129374 0.8073589 0.8546678 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 97 0 2 0 0 1 2 0 1 1
## 1 1 118 9 4 4 0 3 1 5 1
## 2 0 0 70 3 3 1 0 1 0 2
## 3 0 2 9 84 0 7 0 3 6 1
## 4 0 1 4 0 70 1 1 5 1 2
## 5 1 0 0 1 1 60 2 0 5 0
## 6 2 0 3 0 1 2 87 0 2 0
## 7 0 0 2 0 1 0 0 91 0 13
## 8 0 0 1 5 0 3 0 0 73 0
## 9 0 0 0 2 15 3 0 12 3 82
## Processing KNN using K = 7
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8420000 0.8240331 0.8178924 0.8640735 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 2 0 0 1 2 0 1 1
## 1 1 119 11 4 4 0 4 2 6 1
## 2 0 0 71 3 3 1 0 0 0 2
## 3 0 1 7 84 0 9 0 3 7 1
## 4 0 1 3 0 72 0 1 5 0 0
## 5 0 0 1 1 1 60 1 0 5 0
## 6 2 0 3 0 1 1 87 0 1 0
## 7 0 0 1 0 1 0 0 94 0 14
## 8 0 0 1 5 0 3 0 0 74 0
## 9 0 0 0 2 13 3 0 9 2 83
## Processing KNN using K = 9
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8450000 0.8273738 0.8210585 0.8668891 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 2 0 0 1 2 0 1 1
## 1 1 119 11 4 4 0 4 2 6 2
## 2 0 0 73 3 3 1 0 0 1 1
## 3 0 1 7 85 0 8 0 3 7 1
## 4 0 1 3 0 72 0 1 5 0 1
## 5 0 0 1 1 1 61 1 0 5 0
## 6 2 0 3 0 1 1 87 0 1 0
## 7 0 0 0 0 1 0 0 95 0 14
## 8 0 0 0 4 0 3 0 0 73 0
## 9 0 0 0 2 13 3 0 8 2 82
## Processing KNN using K = 11
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8580000 0.8418606 0.8348130 0.8790551 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 2 0 0 1 1 0 1 1
## 1 1 119 11 4 4 0 4 2 6 2
## 2 0 0 73 2 3 1 0 0 1 1
## 3 0 1 6 87 0 7 0 3 8 1
## 4 1 1 3 0 76 0 1 5 0 1
## 5 0 0 1 1 1 62 1 0 3 0
## 6 1 0 3 0 0 1 88 0 0 0
## 7 0 0 0 0 0 0 0 95 0 12
## 8 0 0 1 3 0 3 0 0 76 0
## 9 0 0 0 2 11 3 0 8 1 84
## Processing KNN using K = 13
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8590000 0.8429699 0.8358734 0.8799885 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 2 0 0 1 1 0 1 1
## 1 1 119 12 4 4 0 5 2 6 2
## 2 0 0 73 2 3 1 0 0 1 1
## 3 0 1 5 87 0 7 0 4 8 1
## 4 1 1 3 0 78 0 0 5 0 1
## 5 0 0 1 1 1 62 1 0 3 0
## 6 1 0 3 0 0 1 88 0 0 0
## 7 0 0 0 0 0 0 0 95 0 12
## 8 0 0 1 3 0 3 0 0 76 1
## 9 0 0 0 2 9 3 0 7 1 83
## Processing KNN using K = 15
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8610000 0.8451792 0.8379955 0.8818541 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 3 0 0 1 0 0 1 1
## 1 1 119 13 4 4 0 5 4 7 2
## 2 0 0 72 2 3 1 0 0 1 0
## 3 0 1 4 88 0 7 0 1 8 1
## 4 1 1 3 0 78 0 0 5 0 1
## 5 0 0 1 1 1 62 1 0 3 0
## 6 1 0 3 0 0 1 89 0 0 0
## 7 0 0 0 0 0 0 0 96 0 12
## 8 0 0 1 2 0 3 0 0 75 1
## 9 0 0 0 2 9 3 0 7 1 84
## Processing KNN using K = 17
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8640000 0.8485134 0.8411814 0.8846498 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 3 0 0 0 0 0 1 1
## 1 1 119 13 4 4 0 6 4 7 2
## 2 0 0 72 1 2 1 0 0 1 0
## 3 0 1 4 89 0 8 0 1 8 1
## 4 1 1 3 0 79 0 0 5 0 0
## 5 0 0 1 1 1 62 1 0 2 0
## 6 1 0 3 0 0 1 88 0 0 0
## 7 0 0 0 0 0 0 0 96 0 12
## 8 0 0 1 2 0 3 0 0 76 1
## 9 0 0 0 2 9 3 0 7 1 85
## Processing KNN using K = 19
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8620000 0.8462860 0.8390571 0.8827864 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 2 0 0 0 0 0 1 1
## 1 1 119 14 5 4 0 6 4 7 2
## 2 0 0 71 1 2 0 0 0 1 0
## 3 0 1 4 88 0 8 0 1 8 1
## 4 1 1 3 0 79 0 0 5 0 0
## 5 0 0 1 1 1 63 1 0 3 0
## 6 1 0 3 0 0 1 88 0 0 0
## 7 0 0 0 0 0 0 0 96 0 12
## 8 0 0 2 2 0 3 0 0 75 1
## 9 0 0 0 2 9 3 0 7 1 85
The preferred K for protocol 1 is K = 9.
2nd Test Protocol: Training set 40,000, Validation set 10,000, Test set 10,000, K = 3 through 19.
#Dividing data with training having 40k, validation set having 10k, and testing having 10k.
trainIndex = createDataPartition(inTrain$y, p=0.666565, list=FALSE)
training = inTrain[trainIndex,]
inbet = inTrain[-trainIndex,]
valIndex = createDataPartition(inbet$y, p=0.499745, list=FALSE)
validation = inbet[valIndex,]
#Setting seed to produce same results
set.seed(123)
training =training[sample(nrow(training),n),]
set.seed(123)
validation = validation[sample(nrow(validation), nv),]
#Creating model using K Nearest neighbor
for (i in seq(3, 19, 2))
{
cat("Processing KNN using K = ", i, "\n", sep = "")
fit <- train(y ~ ., data = training, method = 'kknn', metric = "Accuracy",
algorithm = c("kd_tree"), tuneLength = 3, number = 1, k = i)
results <- predict(fit, newdata =validation)
output<-confusionMatrix(results,validation$y)
print(output$overall)
print(output$table)
}
## Processing KNN using K = 3
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8230000 0.8029056 0.7979041 0.8461772 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 94 0 1 0 0 2 5 1 1 1
## 1 1 127 6 3 2 2 0 3 9 0
## 2 1 0 78 3 2 0 1 1 0 2
## 3 0 0 6 82 1 5 0 2 9 0
## 4 0 0 2 1 80 1 3 2 2 9
## 5 3 0 1 4 0 64 4 0 9 0
## 6 4 0 2 1 0 2 87 0 3 0
## 7 0 0 5 4 3 0 0 79 0 13
## 8 1 1 2 2 1 4 0 0 59 0
## 9 0 0 1 0 7 1 0 8 1 73
## Processing KNN using K = 5
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8230000 0.8029056 0.7979041 0.8461772 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 94 0 1 0 0 2 5 1 1 1
## 1 1 127 6 3 2 2 0 3 9 0
## 2 1 0 78 3 2 0 1 1 0 2
## 3 0 0 6 82 1 5 0 2 9 0
## 4 0 0 2 1 80 1 3 2 2 9
## 5 3 0 1 4 0 64 4 0 9 0
## 6 4 0 2 1 0 2 87 0 3 0
## 7 0 0 5 4 3 0 0 79 0 13
## 8 1 1 2 2 1 4 0 0 59 0
## 9 0 0 1 0 7 1 0 8 1 73
## Processing KNN using K = 7
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8360000 0.8173575 0.8115687 0.8584337 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 94 0 2 0 0 2 5 1 1 2
## 1 1 128 7 3 3 2 1 4 9 0
## 2 0 0 75 3 2 0 1 1 0 2
## 3 0 0 6 82 0 5 0 1 6 0
## 4 0 0 3 1 82 1 1 3 2 6
## 5 4 0 1 3 0 66 3 0 9 0
## 6 4 0 2 0 0 2 89 0 3 0
## 7 0 0 5 4 1 0 0 80 0 10
## 8 1 0 2 3 1 2 0 0 62 0
## 9 0 0 1 1 7 1 0 6 1 78
## Processing KNN using K = 9
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8310000 0.8117860 0.8063073 0.8537255 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 94 0 2 0 0 2 5 1 1 2
## 1 1 128 7 3 3 2 1 4 9 0
## 2 0 0 75 3 2 0 1 1 0 1
## 3 0 0 6 82 0 7 0 1 7 0
## 4 0 0 3 1 82 1 1 3 2 8
## 5 4 0 1 3 2 64 3 0 9 0
## 6 4 0 2 0 0 2 89 0 3 0
## 7 0 0 5 4 1 0 0 79 0 9
## 8 1 0 2 3 0 2 0 0 60 0
## 9 0 0 1 1 6 1 0 7 2 78
## Processing KNN using K = 11
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8350000 0.8162035 0.8105158 0.8574927 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 96 0 2 0 0 2 4 1 1 2
## 1 1 128 10 3 3 2 2 4 10 0
## 2 0 0 73 3 2 0 1 1 0 1
## 3 0 0 6 83 0 7 0 0 9 0
## 4 0 0 3 1 83 1 1 3 1 8
## 5 4 0 1 3 2 64 3 0 7 0
## 6 2 0 2 0 0 1 89 0 2 0
## 7 0 0 5 4 0 0 0 81 0 9
## 8 1 0 1 2 0 2 0 0 60 0
## 9 0 0 1 1 6 2 0 6 3 78
## Processing KNN using K = 13
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8400000 0.8217640 0.8157832 0.8621948 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 97 0 2 0 0 2 5 1 1 2
## 1 1 128 11 3 3 2 2 4 10 0
## 2 0 0 74 2 2 0 1 0 0 1
## 3 0 0 5 84 0 7 0 0 9 0
## 4 0 0 4 1 83 1 1 3 1 9
## 5 4 0 1 3 2 65 2 0 7 0
## 6 2 0 2 0 0 1 89 0 2 0
## 7 0 0 3 4 0 0 0 82 0 8
## 8 0 0 1 2 0 1 0 0 60 0
## 9 0 0 1 1 6 2 0 6 3 78
## Processing KNN using K = 15
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8410000 0.8228576 0.8168377 0.8631343 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 97 0 2 0 0 2 5 1 1 2
## 1 1 128 13 5 3 2 2 4 10 0
## 2 0 0 75 2 0 0 1 0 0 1
## 3 0 0 4 82 0 7 0 0 8 0
## 4 0 0 3 1 85 1 1 3 1 8
## 5 4 0 1 3 2 65 2 0 7 0
## 6 2 0 1 0 0 1 89 0 2 0
## 7 0 0 3 4 0 0 0 80 0 8
## 8 0 0 1 2 0 1 0 0 61 0
## 9 0 0 1 1 6 2 0 8 3 79
## Processing KNN using K = 17
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8400000 0.8217207 0.8157832 0.8621948 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 97 0 3 0 0 2 5 1 1 1
## 1 1 128 14 5 3 2 2 5 11 0
## 2 0 0 74 1 0 0 1 0 0 1
## 3 0 0 3 83 0 7 0 0 8 0
## 4 0 0 4 1 85 1 1 3 1 9
## 5 4 0 0 3 2 65 2 0 6 0
## 6 2 0 1 0 0 1 89 0 2 0
## 7 0 0 3 4 0 0 0 79 0 8
## 8 0 0 1 1 0 1 0 0 61 0
## 9 0 0 1 2 6 2 0 8 3 79
## Processing KNN using K = 19
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8380000 0.8194751 0.8136753 0.8603149 0.1280000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 98 0 3 0 0 2 5 1 1 1
## 1 1 128 14 5 4 2 2 6 11 0
## 2 0 0 74 1 0 0 1 0 0 1
## 3 0 0 3 82 0 7 0 0 9 0
## 4 0 0 4 1 84 1 1 3 1 9
## 5 3 0 0 3 2 65 2 0 6 0
## 6 2 0 1 0 0 1 89 0 2 0
## 7 0 0 3 4 0 0 0 79 0 8
## 8 0 0 1 2 0 1 0 0 60 0
## 9 0 0 1 2 6 2 0 7 3 79
Applying the model picked using the validation set to the test set, with K = 7.
#Creating model using K Nearest neighbor
fit <- train(y ~ ., data = training, method = 'kknn', metric = "Accuracy",
algorithm = c("kd_tree"), tuneLength = 3, number = 1, k = 7)
results <- predict(fit, newdata = head(testing,n))
output<-confusionMatrix(results, head(testing$y,n))
print(output$overall)
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 0.8550000 0.8385912 0.8316337 0.8762528 0.1210000
## AccuracyPValue McnemarPValue
## 0.0000000 NaN
print(output$table)
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 96 0 0 1 0 0 2 0 2 0
## 1 0 119 8 4 2 1 0 3 6 1
## 2 0 0 75 2 2 1 0 0 2 0
## 3 0 0 6 87 0 4 0 1 7 1
## 4 0 1 2 0 84 1 1 0 0 8
## 5 5 1 0 2 0 64 1 1 7 0
## 6 0 0 3 0 0 2 89 0 2 0
## 7 0 0 3 1 0 0 0 97 0 15
## 8 0 0 3 2 1 3 1 0 68 1
## 9 0 0 0 0 6 2 1 11 2 76
3rd Test Protocol: 5-fold cross-validation and 10-fold cross-validation (average and standard deviation).
trainIndex = createDataPartition(inTrain$y, p=0.833245, list=FALSE)
training = inTrain[trainIndex,]
#Setting seed to produce same results
set.seed(123)
training =training[sample(nrow(training),n),]
for (i in seq(5, 10, 5))
{
#Creating model using K Nearest neighbor
ctrl <- trainControl(method = "cv", number = i)
fit <- train(y ~ ., data = training, method = "kknn", trControl = ctrl)
cat("Processing KNN using Cross Validation " ,i, " Folds\n",sep = "")
results<-fit$resample
print(results)
cat("Mean Error ",i," Folds is\n",sep = "")
print(mean(1-(results$Accuracy)))
cat("Error Standard Deviation ",i," Folds is\n",
sep = "")
print(sd(1-(results$Accuracy)))
print(fit$results)
}
## Processing KNN using Cross Validation 5 Folds
## Accuracy Kappa Resample
## 1 0.8500000 0.8330783 Fold1
## 2 0.8059701 0.7840555 Fold4
## 3 0.7900000 0.7665499 Fold3
## 4 0.8500000 0.8330504 Fold2
## 5 0.8592965 0.8433291 Fold5
## Mean Error 5 Folds is
## [1] 0.1689467
## Error Standard Deviation 5 Folds is
## [1] 0.03094418
## kmax distance kernel Accuracy Kappa AccuracySD KappaSD
## 1 5 2 optimal 0.8170982 0.7964721 0.04386075 0.04869544
## 2 7 2 optimal 0.8290633 0.8097916 0.03394313 0.03766544
## 3 9 2 optimal 0.8310533 0.8120127 0.03094418 0.03433486
## Processing KNN using Cross Validation 10 Folds
## Accuracy Kappa Resample
## 1 0.7900000 0.7664071 Fold01
## 2 0.8300000 0.8105427 Fold02
## 3 0.8400000 0.8222025 Fold05
## 4 0.8571429 0.8410013 Fold04
## 5 0.7941176 0.7709091 Fold03
## 6 0.8585859 0.8426431 Fold06
## 7 0.8400000 0.8219452 Fold09
## 8 0.8300000 0.8106272 Fold08
## 9 0.8235294 0.8034261 Fold07
## 10 0.8181818 0.7975000 Fold10
## Mean Error 10 Folds is
## [1] 0.1718442
## Error Standard Deviation 10 Folds is
## [1] 0.02306743
## kmax distance kernel Accuracy Kappa AccuracySD KappaSD
## 1 5 2 optimal 0.8221332 0.8020146 0.03028375 0.03373631
## 2 7 2 optimal 0.8272350 0.8077129 0.02928629 0.03262204
## 3 9 2 optimal 0.8281558 0.8087204 0.02306743 0.02566982
Based on the numbers above, the 5-fold cross-validation approach has a lower average error but a higher error standard deviation, while the 10-fold approach has a slightly higher average error but a lower error standard deviation. The generated best K is K = 9, which has the lowest error and the least variability among the tuned values. If we are interested in the least variability within our model and data, then we can choose the 10-fold approach in this case.
Below is a graph of the K values and their error rates under protocol 1 and protocol 2.
x<-c(3,5,7,9,11,13,15,17,19)
y1<-c(0.8400000,0.8400000,0.8540000,0.8560000,0.8520000,0.8530000,0.8500000,0.8490000,0.8470000)
y2<-c(0.8400000,0.8400000,0.8450000,0.8420000,0.8360000,0.8350000,0.8330000,0.8320000,0.8300000)
y3<- 1-y1
y4<- 1-y2
plot(x,y3,type="b",col="red",main = "Error Rates as K increases",xlab = "K value",ylab = "Error Rate",ylim = c(0,.20), xlim = c(0,23))
lines(x,y4,col="green")
Protocol 1 is RED and protocol 2 is GREEN.
We verify the final model below by predicting a single image.
#Setting seed to produce same results
set.seed(123)
#Best model selected
fit <- train(y ~ ., data = training, method = 'kknn', metric = "Accuracy",
algorithm = c("kd_tree"), tuneLength = 3, number = 1, k = 9)
#Drawing a digit.
displaydigit(as.matrix(training[5,2:785]))
#Predicting the digit.
print("The predicted digit is:")
## [1] "The predicted digit is:"
predict(fit, newdata = training[5,])
## [1] 7
## Levels: 0 1 2 3 4 5 6 7 8 9
#Verifying the answer for the digit.
print("The actual digit is:")
## [1] "The actual digit is:"
training[5,1]
## [1] 7
## Levels: 0 1 2 3 4 5 6 7 8 9