Pre-rotated Multi-scaling Digit Recognition: MATLAB Scripts
The main program of Pre-rotated Multi-scaling Digit Recognition is divided into four parts (cases). It relies on two calibrated and trained versions of a CNN:
- CNNDigitTrainedNet (DigitCNNLayer); and
- CNNAugmented180Net (DigitCNNLayer + rotation augmentation)
The CNN ends with a 10-neuron fully connected layer for label classification. Both versions of the CNN are calibrated on the MNIST database. Details of the network architecture can be found in my report.
Both CNNDigitTrainedNet and CNNAugmented180Net were trained for more than 30 minutes on a single GPU. To save time and resources, you only need to load the pre-calibrated "CNNDigitTrainedNet.mat" and "CNNAugmented180Net.mat" files for label classification.
To search for multi-scale digit images, I use a Gaussian filter with an automatic 9-square-box search method.
Four input image files: computer_generated.png, computer_generated_rotated.png, handwritten.png, handwritten_rotated.png
Main.m file
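Each of the four cases in Main.m ends with the same evaluation step: a confusion matrix and an accuracy value computed from the predicted and the true labels. The sketch below reproduces that arithmetic in base MATLAB on made-up labels, so it runs without any toolbox. The label vectors and the variable names are illustrative assumptions, not data from the project.

```matlab
% Toy labels for illustration only (not taken from the project data).
trueLabels = [3 4 3 8 2 0];
predLabels = [3 4 5 8 2 0];   % one mistake: a 3 predicted as 5

% Accuracy: fraction of positions where the prediction equals the truth.
toyAccuracy = sum(predLabels == trueLabels) / numel(trueLabels);

% Confusion matrix over digits 0..9: rows are true labels, columns are
% predicted labels (+1 because MATLAB indices start at 1).
toyConfusion = accumarray([trueLabels(:) + 1, predLabels(:) + 1], 1, [10 10]);

fprintf('accuracy = %.4f\n', toyAccuracy);
disp(toyConfusion(4, :));     % row for true digit 3
```

Off-diagonal entries in a row show which digits a given true digit was mistaken for.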
Case 1: Computer-Generated Digit Images
clear all; clc; close all;
% Load the input image and convert it to double precision
imagepaper = imread('computer_generated.png') ;
imagepaper = im2double(imagepaper);
figure ; imshow(imagepaper) ; axis on
% Separate the digits from the paper background
[binaryImageWithBackground, binaryImageWithoutBackground] = dk_BackgroundRemover(imagepaper);
% Locate each digit; the fitted square is enlarged by SquareBoxScaleUp
SquareBoxScaleUp = 2.5
[searchDigitLocation, numbOfDigitFound] = dk_search_digit(binaryImageWithoutBackground, SquareBoxScaleUp) ;
dk_fit_digit_9squareboxCase01(imagepaper, searchDigitLocation) ;
% Load the pre-calibrated network and the normalized test images
load CNNDigitTrainedNet.mat
load TestImageNormalizedCase01.mat
% True labels, in the order in which the digits are stored
computergeneratedLabelsInOrder = [3 4 3 8 2 0 7 2 1 0 9 6 5 0 1 0 7]
TestLabels= computergeneratedLabelsInOrder ;
% Keep a single channel so that each test image is 28x28x1
TestImage28X28X1 = TestImageNormalizedCase01(:,:,2,:);
% Classify, then report the confusion matrix and the accuracy
[TestPred, Testscores] = classify(CNNDigitTrainedNet, TestImage28X28X1 ) ;
confusionmat( categorical(TestLabels), TestPred )
accuracy = sum(TestPred == categorical(TestLabels) ) /numel(TestLabels)

Case 2: Computer-Generated Pre-Rotated Digit Images
clear all; clc; close all;
% Load the input image and convert it to double precision
imagepaper = imread('computer_generated_rotated.png') ;
imagepaper = im2double(imagepaper);
figure ; imshow(imagepaper) ; axis on
% Separate the digits from the paper background
[binaryImageWithBackground, binaryImageWithoutBackground] = dk_BackgroundRemover(imagepaper);
% Locate each digit; the fitted square is enlarged by SquareBoxScaleUp
SquareBoxScaleUp = 2.0
[searchDigitLocation, numbOfDigitFound] = dk_search_digit(binaryImageWithoutBackground, SquareBoxScaleUp) ;
% Rotation angle (in degrees) passed to the fitting step
rotatedAngle = -15
dk_fit_digit_9squarebox_rotatedCase02(imagepaper,searchDigitLocation, rotatedAngle ) ;
% Load the rotation-augmented network and the normalized test images
load CNNAugmented180Net.mat
load TestImageNormalizedRotatedCase02.mat
% True labels, in the order in which the digits are stored
computergeneratedrotatedLabelsInOrder = [3 4 3 8 2 1 0 7 2 0 9 6 5 7 1 0 0]
TestLabels = computergeneratedrotatedLabelsInOrder ;
% Keep a single channel so that each test image is 28x28x1
TestImage28X28X1 = TestImageNormalizedRotatedCase02(:,:,2,:);
% Classify, then report the confusion matrix and the accuracy
[TestPred, Testscores] = classify(CNNAugmented180Net, TestImage28X28X1 ) ;
confusionmat( categorical(TestLabels), TestPred )
accuracy = sum(TestPred == categorical(TestLabels) ) /numel(TestLabels)

Case 3: Handwritten Digit Images
clear all; clc; close all;
% Load the input image and convert it to double precision
imagepaper = imread('handwritten.png');
imagepaper = im2double(imagepaper);
figure ; imshow(imagepaper) ; axis on
% Separate the digits from the paper background
[binaryImageWithBackground, binaryImageWithoutBackground] = dk_BackgroundRemover(imagepaper);
dk_search_digit(binaryImageWithoutBackground, 2.5) ;
% The digit positions used from here on are loaded from a saved file,
% not taken from the search output above
load finalpositionCase03.mat
searchDigitLocation = finalpositionCase03 ;
dk_fit_digit_9squareboxHandwrittenCase03(imagepaper,searchDigitLocation ) ;
% Load the pre-calibrated network and the normalized test images
load CNNDigitTrainedNet.mat
load TestImageNormalizedHandwrittenCase03
% True labels, in the order in which the digits are stored
HandwrittenLabelsInOrder = [7 7 0 0 3 8 2 4 0 2 1 8 4 9 6 8 6 1 3 1 8 4 1 1 5 8 2 3 4 2 ]
TestLabels = HandwrittenLabelsInOrder ;
% Keep a single channel so that each test image is 28x28x1
TestImage28X28X1 = TestImageNormalizedHandwrittenCase03(:,:,1,:);
% Classify, then report the confusion matrix and the accuracy
[TestPred, Testscores] = classify(CNNDigitTrainedNet, TestImage28X28X1 ) ;
confusionmat( categorical(TestLabels), TestPred )
accuracy = sum(TestPred == categorical(TestLabels) ) /numel(TestLabels)

Case 4: Handwritten Pre-Rotated Digit Images
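clear all; clc; close all;
% Load the input image and convert it to double precision
imagepaper = imread('handwritten_rotated.png');
imagepaper = im2double(imagepaper);
figure ; imshow(imagepaper) ; axis on
% Separate the digits from the paper background
[binaryImageWithBackground, binaryImageWithoutBackground] = dk_BackgroundRemover(imagepaper);
dk_search_digit(binaryImageWithoutBackground, 1.59) ;
% The digit positions used from here on are loaded from a saved file,
% not taken from the search output above
load finalpositionCase04.mat
searchDigitLocation = finalpositionCase04 ;
% Rotation angle (in degrees) passed to the fitting step
rotatedAngle = +15
dk_fit_digit_9squarebox_handwrittenrotatedCase04(imagepaper, searchDigitLocation, rotatedAngle)
% Load the pre-calibrated network and the normalized test images
load CNNDigitTrainedNet.mat
load TestImageNormalizedHandwrittenRotatedCase04
% True labels, in the order in which the digits are stored
HandwrittenRotatedLabelsInOrder = [0 3 0 8 7 6 4 4 1 2 1 2 2 8 5 7 7 1 6 6 4 ]
TestLabels = categorical(HandwrittenRotatedLabelsInOrder);
% Keep a single channel so that each test image is 28x28x1
TestImage28X28X1 = TestImageNormalizedHandwrittenRotatedCase04(:,:,1,:);
% Classify, then report the confusion matrix and the accuracy
[TestPred, Testscores] = classify(CNNDigitTrainedNet, TestImage28X28X1 ) ;
confusionmat( categorical(TestLabels), TestPred )
accuracy = sum(TestPred == categorical(TestLabels) ) /numel(TestLabels)

Deep Net Calibration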
Load training digit data
clear all;
memory
filenameImagesTrain = 'MNISTDB/train-images.idx3-ubyte';
filenameLabelsTrain = 'MNISTDB/train-labels.idx1-ubyte';
filenameImagesTest = 'MNISTDB/t10k-images.idx3-ubyte';
filenameLabelsTest = 'MNISTDB/t10k-labels.idx1-ubyte';
TrainImages = reshape((loadMNISTImages( filenameImagesTrain )),[28,28,1,60000]);
TrainLabels = ((loadMNISTLabels( filenameLabelsTrain )));
TestImages = reshape((loadMNISTImages( filenameImagesTest )),[28,28,1,10000]);
TestLabels = ((loadMNISTLabels(filenameLabelsTest )));

CNN Layer Configuration
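The fully connected layer in this configuration is initialized with a 10-by-784 weight matrix, so the feature map entering it must hold 784 values. The sketch below checks that number from the filter sizes, strides and paddings used in the layer definitions, assuming the usual output-size rule floor((in + 2*pad - filter)/stride) + 1 for convolution and pooling layers. The helper name outSize is introduced only for this illustration.

```matlab
% Output-size rule assumed for both convolution and pooling layers.
outSize = @(in, filt, stride, pad) floor((in + 2*pad - filt) / stride) + 1;

s0 = 28;                      % input image is 28x28x1
s1 = outSize(s0, 4, 1, 0);    % CNNLayer01: 4x4 filters, stride 1, no padding
s2 = outSize(s1, 3, 3, 1);    % MaxPooling01: 3x3 window, stride 3, padding 1
s3 = outSize(s2, 3, 1, 0);    % CNNLayer02: 3x3 filters, stride 1, no padding

numFeatures = s3 * s3 * 16;   % CNNLayer02 has 16 filters
fprintf('%d -> %d -> %d -> %d, features = %d\n', s0, s1, s2, s3, numFeatures);
```

The ReLU, normalization and dropout layers do not change the spatial size, so they are left out of the calculation.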
InputLayer = imageInputLayer([28 28 1],'DataAugmentation','none','Normalization','none','Name','InputLayer');
CNNLayer01 = convolution2dLayer(4,32,'Stride',1,'Padding',0,'BiasLearnRateFactor',2,'NumChannels',1,...
'WeightLearnRateFactor',2, 'WeightL2Factor',1,'BiasL2Factor',1,'Name','CNNLayer01');
CNNLayer01.Weights = randn([4 4 1 32])*0.1;
CNNLayer01.Bias = randn([1 1 32])*0.1;
ReluLayer01 = reluLayer('Name','ReluLayer01');
NormalizationLayer01 = crossChannelNormalizationLayer(3,'Name','NormalizationLayer01','Alpha',0.0001,'Beta',0.75,'K',2);
MaxPooling01 = maxPooling2dLayer(3,'Stride',3,'Name','MaxPooling01','Padding',1);
DropLayer01 = dropoutLayer(0.35, 'Name','DropLayer01');
CNNLayer02 = convolution2dLayer(3,16,'Stride',1, 'Padding',0,'BiasLearnRateFactor',1,'NumChannels',32,...
'WeightLearnRateFactor',1, 'WeightL2Factor',1,'BiasL2Factor',1,'Name','CNNLayer02');
CNNLayer02.Weights = randn([3 3 32 16])*0.0001;
CNNLayer02.Bias = randn([1 1 16])*0.00001;
ReluLayer02 = reluLayer('Name','ReluLayer02');
NormalizationLayer02 = crossChannelNormalizationLayer(3,'Name','NormalizationLayer02','Alpha',0.0001,'Beta',0.75,'K',2);
DropLayer02 = dropoutLayer(0.25, 'Name','DropLayer02');
FullyConnectedOutput = fullyConnectedLayer(10,'WeightLearnRateFactor',1,'BiasLearnRateFactor',1,...
'WeightL2Factor',1,'BiasL2Factor',1,'Name','FullyConnectedOutput');
FullyConnectedOutput.Weights = randn([10 784])*0.0001;
FullyConnectedOutput.Bias = randn([10 1])*0.0001+1;
SoftMaxL= softmaxLayer('Name','SoftMaxL');
classifyLabelOutput = classificationLayer('Name','classifyLabelOutput');
% options = trainingOptions('sgdm','LearnRateSchedule','piecewise','LearnRateDropFactor',0.75,...
% 'LearnRateDropPeriod',1,'L2Regularization',0.0001,'MaxEpochs',16,'Momentum',0.9,'Shuffle','once',...
% 'MiniBatchSize',15,'Verbose',1,'InitialLearnRate',0.043);
options = trainingOptions('sgdm','LearnRateSchedule','piecewise','LearnRateDropFactor',0.75,...
'LearnRateDropPeriod',1,'L2Regularization',0.0001,'MaxEpochs',16,'Momentum',0.9,'Shuffle','once',...
'MiniBatchSize',15,'Verbose',1,'InitialLearnRate',0.043,'Plots','training-progress');
DigitCNNLayer =[InputLayer, CNNLayer01, ReluLayer01,NormalizationLayer01, MaxPooling01, DropLayer01,...
CNNLayer02, ReluLayer02, NormalizationLayer02,DropLayer02, FullyConnectedOutput, SoftMaxL, classifyLabelOutput];

Calibrate CNNDigitTrainedNet
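The training options above use a piecewise schedule: starting from 0.043, the learning rate is multiplied by 0.75 after every epoch, for 16 epochs with mini-batches of 15 images. The sketch below works out what that implies for the 60000 MNIST training images. It is plain arithmetic, and the variable names are illustrative.

```matlab
initialLearnRate = 0.043;
dropFactor       = 0.75;    % LearnRateDropFactor
dropPeriod       = 1;       % LearnRateDropPeriod, in epochs
maxEpochs        = 16;
miniBatchSize    = 15;
numTrainImages   = 60000;

epochs = 1:maxEpochs;
% Learning rate in effect during each epoch.
learnRate = initialLearnRate * dropFactor .^ floor((epochs - 1) / dropPeriod);

iterationsPerEpoch = floor(numTrainImages / miniBatchSize);
totalIterations    = iterationsPerEpoch * maxEpochs;

fprintf('rate in epoch 1: %.4f, in epoch %d: %.6f\n', learnRate(1), maxEpochs, learnRate(end));
fprintf('%d iterations per epoch, %d in total\n', iterationsPerEpoch, totalIterations);
```

By the last epoch the rate has fallen to roughly 1.3% of its initial value, so most of the learning happens in the early epochs.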
CNNDigitTrainedNet = trainNetwork(TrainImages,categorical(TrainLabels),DigitCNNLayer,options) ;
[TestPred, Testscores] = classify(CNNDigitTrainedNet, TestImages ) ;
accuracy = sum(TestPred == categorical(TestLabels) ) /numel(TestLabels) ;
confusionmat(TestPred , categorical(TestLabels))

Calibrate CNNAugmented180Net
imageSize = [28 28 1];
imageAugmenter = imageDataAugmenter('RandRotation',[-180,180])
augimds = augmentedImageDatastore(imageSize,TrainImages,categorical(TrainLabels),'DataAugmentation',imageAugmenter);
options = trainingOptions('sgdm','LearnRateSchedule','piecewise','LearnRateDropFactor',0.75,...
'LearnRateDropPeriod',1,'L2Regularization',0.0001,'MaxEpochs',16,'Momentum',0.9,'Shuffle','once',...
'MiniBatchSize',15,'Verbose',1,'InitialLearnRate',0.043,'Plots','training-progress');
CNNAugmented180Net = trainNetwork(augimds,DigitCNNLayer,options) ;
[TestPred, Testscores] = classify(CNNAugmented180Net, TestImages ) ;
accuracy = sum(TestPred == categorical(TestLabels) ) /numel(TestLabels) ;
confusionmat(TestPred , categorical(TestLabels))

Functions:
dk_BackgroundRemover
function [binaryImageWithBackground, binaryImageWithoutBackground] = dk_BackgroundRemover(imagepaper)
    % Invert the grayscale image so that dark ink becomes bright, then rescale to [0, 1]
    invertedImage = 1 - rgb2gray(imagepaper);
    normalizedImage = (invertedImage - min(min(invertedImage))) / (max(max(invertedImage)) - min(min(invertedImage)));
    % Estimate the background with two 9x9 order-statistic filters
    % (the minimum, followed by the 5th smallest value in each window)
    squarebox = 9 ;
    backgroundNoiseImage = ordfilt2(normalizedImage, 1, ones(squarebox), 'symmetric');
    backgroundNoiseImage = ordfilt2(backgroundNoiseImage, 5, ones(squarebox), 'symmetric');
    % Subtract the background and threshold to obtain the digit mask
    substractNormalizedImagefrombackground = imsubtract(normalizedImage, backgroundNoiseImage);
    binaryImageWithoutBackground = imbinarize(substractNormalizedImagefrombackground, 0.5);
    %binaryImageWithoutBackground = imbinarize(substractNormalizedImagefrombackground, 0.7);
    % Keep gray levels only inside the mask, then invert back (dark digits on white)
    binaryImageWithBackground = 1 - (substractNormalizedImagefrombackground .* binaryImageWithoutBackground);
end

dk_classify_all_digits
function dk_classify_all_digits( digits_validation, digits_training , location, scaling)
    TestSetNumber = length(digits_validation);
    predictedCorrect = 0;
    for kk = 1 : TestSetNumber
        label = dk_classify_digit( digits_validation(kk).image, digits_training, location, scaling);
        if label == digits_validation(kk).label
            predictedCorrect = predictedCorrect + 1;
        end
    end
    accuracy = predictedCorrect / TestSetNumber;
    fprintf('Test Image Number is %d \n', TestSetNumber)
    fprintf('Digit Image Correct-Predicted is %.2f%% \n\n', accuracy *100)
end

dk_classifyAllDigitsFoundOnPapers
function dk_classifyAllDigitsFoundOnPapers( binaryImageWithBackground, TestDigitsFound, numbOfDigitFound, actualLabelsInOrder, SquareBoxScaleUp)
    scaling = 39; location = [20, 20]; load digits.mat
    digits_training = dk_prepare_digits(digits_training,location,scaling);
    predictionVector = zeros(1, numbOfDigitFound);
    for i = 1 : numbOfDigitFound
        predictionVector(i) = dk_classify_digit(binaryImageWithBackground, digits_training, TestDigitsFound(i).location, TestDigitsFound(i).scaling);
    end
    disp('Predicted Actual')
    disp([predictionVector(:) actualLabelsInOrder(:)])
    errorVector = find(predictionVector - actualLabelsInOrder ~= 0);
    fprintf('SquareBoxScaleUp = %.2f \n', SquareBoxScaleUp )
    fprintf('Digits Wrongly Predicted = %d \n', length(errorVector) )
    accuracy = (1- (length(errorVector)/numbOfDigitFound))*100;
    fprintf('Prediction Accuracy = %.2f%%.\n', accuracy )
end

dk_fit_digit_9squarebox
function dk_fit_digit_9squarebox(imagepaper,searchDigitLocation )
    figure(1); imagesc(imagepaper);
    axis image;colormap gray ; hold on ;
    for jj = 1 : size(searchDigitLocation, 1)
        [centres, radius] = dk_place_regions(searchDigitLocation(jj, 1:2), searchDigitLocation(jj, 3));
        plot_squares(imagepaper, centres, radius) ;
        % Semicolons added so that the loop does not print every intermediate value
        xlocationMin = min(centres(1,:)) ;
        xlocationMax = max(centres(1,:)) ;
        ylocationMin = min(centres(2,:)) ;
        ylocationMax = max(centres(2,:)) ;
        % Crop the region spanned by the box centres and resize it to 28x28
        I2ImRd = imcrop(imagepaper ,[ xlocationMin ylocationMin (xlocationMax-xlocationMin) (ylocationMax-ylocationMin )]);
        resizedImage = imresize(I2ImRd, [28, 28]) ;
        % Invert the intensities before saving
        b2wrResizedImage = (1 - resizedImage) ;
        name = sprintf('dkDataFolder/ComputerGeneratedFolder/fig0%s.png',num2str(jj) ) ;
        imwrite(b2wrResizedImage, name )
    end
    hold off
end

dk_fit_digit_9squarebox_rotated
function dk_fit_digit_9squarebox_rotated(imagepaper,searchDigitLocation, rotatedAngle )
    figure(1); imagesc(imagepaper);
    axis image;colormap gray ; hold on ;
    for jj = 1 : size(searchDigitLocation, 1)
        [centres, radius] = dk_place_regions(searchDigitLocation(jj, 1:2), searchDigitLocation(jj, 3));
        plot_squares(imagepaper, centres, radius) ;
        xlocationMin = min(centres(1,:)) ;
        xlocationMax = max(centres(1,:)) ;
        ylocationMin = min(centres(2,:)) ;
        ylocationMax = max(centres(2,:)) ;
        I2ImRd = imcrop(imagepaper ,[ xlocationMin ylocationMin (xlocationMax-xlocationMin) (ylocationMax-ylocationMin )]);
        resizedImage = imresize(I2ImRd, [28, 28]) ;
        b2wrResizedImage = (1 - resizedImage) ;
        b2wrResizedImageRotated = imrotate( b2wrResizedImage,rotatedAngle) ;
        name = sprintf('dkDataFolder/ComputerGeneratedPreRotatedFolder/fig45Rotated0%s.png',num2str(jj) );
        imwrite(b2wrResizedImageRotated, name ) ;
    end
    hold off
end

dk_gaussian_filter
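dk_gaussian_filter builds a square Gaussian kernel whose side length is twice Sigma and applies it with mirrored borders. To see what such a kernel looks like without the Image Processing Toolbox, the sketch below builds one by hand from the standard 2-D Gaussian formula and normalizes it to sum to 1. This is an illustrative reconstruction under that assumption, not a claim about the exact output of fspecial.

```matlab
Sigma = 2;
hsize = 2 * Sigma;                       % same rule as dk_gaussian_filter

% Sample positions centred on zero; for an even hsize there is no centre pixel.
half = (hsize - 1) / 2;
[gx, gy] = meshgrid(-half:half, -half:half);

kernel = exp(-(gx.^2 + gy.^2) / (2 * Sigma^2));
kernel = kernel / sum(kernel(:));        % weights sum to 1, so mean brightness is kept

disp(kernel);
```

With hsize = 2*Sigma the kernel reaches only about one standard deviation on each side, so much of the Gaussian tail is cut off. The commented-out 4*Sigma alternative in the function would reach about two.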
function filterOut = dk_gaussian_filter(image, Sigma )
    hsize = 2*Sigma ;
    % hsize = 4*Sigma ;
    Gau2DFilter = fspecial('gaussian', hsize, Sigma);
    filterOut = imfilter(image, Gau2DFilter, 'symmetric');
end

dk_gaussian_gradients
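dk_gaussian_gradients smooths the image and then filters it with [-0.5 0 0.5], which is a central difference: each output pixel is half the difference between its two neighbours along that axis. The sketch below checks this on a ramp image using conv2 from base MATLAB. Because conv2 flips its kernel while imfilter correlates by default, the kernels are reversed here. The ramp image is a made-up example and the Gaussian smoothing step is skipped.

```matlab
% Ramp that increases by 1 per column and is constant along each column.
ramp = repmat(1:6, 4, 1);

xh = [-0.5 0 0.5];            % kernel used by dk_gaussian_gradients
yh = xh';

% conv2 performs convolution, so flip the kernels to obtain correlation.
% 'valid' keeps only pixels where the kernel fits entirely inside the image.
gradX = conv2(ramp, fliplr(xh), 'valid');
gradY = conv2(ramp, flipud(yh), 'valid');

disp(gradX);   % horizontal slope of the ramp
disp(gradY);   % vertical slope of the ramp
```

The horizontal gradient equals the ramp's slope of 1 everywhere and the vertical gradient is 0, as expected for a central difference.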
function [ grad_x, grad_y ] = dk_gaussian_gradients(image, Sigma )
    GSFilterImg = dk_gaussian_filter(image, Sigma);
    yh = [-0.5 0 0.5]' ;
    xh = [-0.5 0 0.5] ;
    grad_y = imfilter(GSFilterImg, yh, 'symmetric');
    grad_x = imfilter(GSFilterImg, xh, 'symmetric');
end

dk_get_patch
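dk_get_patch cuts a square window of side 2*patch_radius + 1 around the point (x, y), where x indexes columns and y indexes rows, and raises an error if the window would leave the image. The sketch below repeats that indexing inline on a small made-up matrix.

```matlab
img = reshape(1:49, 7, 7);    % img(row, col) = row + 7*(col - 1)
x = 4; y = 3; patch_radius = 1;

rowN = (y - patch_radius) : (y + patch_radius);   % rows come from y
colN = (x - patch_radius) : (x + patch_radius);   % columns come from x
window = img(rowN, colN);

disp(window);
```

The centre element of the window is img(y, x), which is an easy way to confirm that the x and y arguments have not been swapped.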
function patch = dk_get_patch( image, x, y, patch_radius )
    [yLength, xLength, ~] = size(image);
    xPatchLeft = x - patch_radius ;
    xPatchRight = x + patch_radius ;
    yPatchLeft = y - patch_radius ;
    yPatchRight = y + patch_radius ;
    if xPatchLeft < 1 || xPatchRight > xLength
        error('X_Patch out of bound.')
    elseif yPatchLeft < 1 || yPatchRight > yLength
        error('Y_Patch out of bound.')
    else
        rowN = yPatchLeft : yPatchRight;
        colN = xPatchLeft : xPatchRight;
        patch = image(rowN,colN);
    end
end

dk_gradient_descriptor
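dk_gradient_descriptor concatenates one 8-bin gradient histogram per square box into a 72-element vector and then rescales that vector to the range [0, 1]. The sketch below shows only the index layout and the min-max rescaling, using random numbers as stand-ins for the real histograms (dk_gradient_histogram is not reproduced here).

```matlab
squarebox = 9;
binNum    = 8;
descriptor = zeros(squarebox * binNum, 1);

for kk = 1 : squarebox
    idx = (kk - 1) * binNum + 1 : kk * binNum;   % slots owned by box kk
    descriptor(idx) = rand(binNum, 1) * kk;      % placeholder histogram values
end

% Min-max rescaling, as at the end of dk_gradient_descriptor.
spread = max(descriptor) - min(descriptor);
descriptor = (descriptor - min(descriptor)) / spread;

fprintf('box %d uses entries %d to %d\n', squarebox, idx(1), idx(end));
```

If every entry were identical, the spread would be zero and the division would return NaN, so a guard for that case may be worth adding.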
function descriptor = dk_gradient_descriptor( patched_image, location, scaling )
    factor = 0.05 ;
    SigmaScaled = round(scaling * factor);
    imageGaussFilter = dk_gaussian_filter(patched_image, SigmaScaled);
    [GaussGradientScaled_X, GaussGradientScaled_Y] = dk_gaussian_gradients(imageGaussFilter, SigmaScaled);
    [localCentres, localRadius] = dk_place_regions(location,scaling );
    squarebox = 9 ;
    binNum = 8;
    descriptor = zeros(squarebox*binNum, 1); % 8 bins x 9 squareboxes = 72 descriptor entries
    for kk = 1 : squarebox
        PatchedGaussGradientScaled_X = dk_get_patch(GaussGradientScaled_X, localCentres(1, kk), localCentres(2, kk), localRadius);
        PatchedGaussGradientScaled_Y = dk_get_patch(GaussGradientScaled_Y, localCentres(1, kk), localCentres(2, kk), localRadius);
        descriptor((kk-1)*binNum+1 : kk*binNum) = dk_gradient_histogram(PatchedGaussGradientScaled_X, PatchedGaussGradientScaled_Y);
    end
    % Descriptor Normalization
    descriptor = (descriptor - min(descriptor)) / (max(descriptor) - min(descriptor));
end

dk_place_regions
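dk_place_regions lays out a 3-by-3 grid of overlapping square boxes around a digit and returns the nine box centres together with a common box radius. The sketch below repeats the function's arithmetic inline for location = [20, 20] and scaling = 39, the values used in dk_classifyAllDigitsFoundOnPapers, so that the numbers can be checked by hand. The intermediate variable names are illustrative.

```matlab
location = [20, 20];   % [x, y] centre of the digit
scaling  = 39;         % side length of the whole search square

r       = floor((scaling - 3) / 6);                          % 6
centre  = 3 * r + 2;                                         % 20
offsets = [centre - 2*r, centre, centre + 2*r] - centre;     % [-12 0 12]

[dx, dy] = meshgrid(offsets);
localCentres = [dx(:)'; dy(:)'] + repmat(location', 1, 9);   % 2x9, one column per box
localRadius  = r + 1;                                        % 7, so each box is 15x15 pixels

disp(localCentres);
fprintf('box radius = %d, box side = %d\n', localRadius, 2*localRadius + 1);
```

With these values the boxes span pixels 1 to 39 in both directions, so the nine boxes cover a 39-by-39 image with a 3-pixel overlap between neighbouring boxes.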
function [localCentres, localRadius] = dk_place_regions(location, scaling)
    % compute local centers, local radius for 9 squareboxes
    localRadius = floor((scaling - 3) / 6);
    relativeCenter = (3 * localRadius) + 2;
    relativeCenterXY = [relativeCenter, relativeCenter];
    relativeCenterLocation = [(relativeCenter - 2*localRadius), relativeCenter, (relativeCenter + 2*localRadius)];
    [grid_X, grid_Y] = meshgrid(relativeCenterLocation);
    distanceGridXY = bsxfun(@minus, [grid_X(:)'; grid_Y(:)'], relativeCenterXY');
    localCentres = bsxfun(@plus, distanceGridXY, location');
    localRadius = localRadius + 1;
end

dk_prepare_digits
function digits_training = dk_prepare_digits(digits_training,location, scaling )
    for i = 1 : length( digits_training)
        digits_training(i).descriptor = dk_gradient_descriptor( digits_training(i).image, location, scaling);
    end
end

dk_search_digit
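dk_search_digit labels the connected components of the binary image and, for each one, returns the centre of its bounding box plus a square size equal to the larger bounding-box extent times SquareBoxScaleUp. The sketch below applies the same arithmetic to a single hand-made blob, skipping the bwlabel step so that it runs in base MATLAB. The mask and the name blobLocation are invented for this example.

```matlab
SquareBoxScaleUp = 2.5;

% One rectangular blob: rows 3..7, columns 4..6.
mask = false(10, 12);
mask(3:7, 4:6) = true;

[rowVector, colVector] = find(mask);

% Larger of the two bounding-box extents (height and width).
fitWithSquareBox = max([max(rowVector) - min(rowVector), ...
                        max(colVector) - min(colVector)]);

% [x centre, y centre, scaled square size], rounded to whole pixels.
blobLocation = round([(max(colVector) + min(colVector)) / 2, ...
                      (max(rowVector) + min(rowVector)) / 2, ...
                      SquareBoxScaleUp * fitWithSquareBox]);
disp(blobLocation);
```

Note that the extent is max minus min, which is one less than the number of pixels along that side: the blob is 5 rows tall but its extent is 4.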
function [searchDigitLocation,numbOfDigitFound] = dk_search_digit(binaryImageWithoutBackground, SquareBoxScaleUp)
    connected = 4 ;
    [legnthPatchForEachDigit,numbOfDigitFound] = bwlabel(binaryImageWithoutBackground, connected);
    searchDigitLocation = zeros(numbOfDigitFound, 3);
    for kk = 1 : numbOfDigitFound
        [rowVector, colVector] = find(legnthPatchForEachDigit == kk);
        fitWithSquareBox = max([max(rowVector) - min(rowVector), max(colVector) - min(colVector)]);
        searchDigitLocation(kk, :) = round( [((max(colVector) + min(colVector))/2), ...
            ((max(rowVector) + min(rowVector))/2), ...
            (SquareBoxScaleUp * fitWithSquareBox)] ) ;
    end
end

dk_TestDigitsArray
function TestDigits = dk_TestDigitsArray(searchDigitLocation, numbOfDigitFound, actualLabelsInOrder)
    TestDigits = struct('location', [], 'scaling', [], 'label', []);
    for kk = 1 : numbOfDigitFound
        TestDigits(kk).location = searchDigitLocation(kk, 1:2);
        TestDigits(kk).scaling = searchDigitLocation(kk, 3);
        TestDigits(kk).label = actualLabelsInOrder(kk);
    end
end
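
dk_TestDigitsArray packs each row of searchDigitLocation into a struct with location, scaling and label fields. The sketch below builds such an array inline from two made-up rows and shows how a field can be read back across the whole array as a plain vector. The numbers are illustrative only.

```matlab
searchDigitLocation = [15 22 40; 60 25 38];   % rows of [x, y, scaling]
actualLabelsInOrder = [3 7];

TestDigits = struct('location', [], 'scaling', [], 'label', []);
for kk = 1 : size(searchDigitLocation, 1)
    TestDigits(kk).location = searchDigitLocation(kk, 1:2);
    TestDigits(kk).scaling  = searchDigitLocation(kk, 3);
    TestDigits(kk).label    = actualLabelsInOrder(kk);
end

% Collect a field across the whole struct array.
allLabels   = [TestDigits.label];
allScalings = [TestDigits.scaling];
disp([allLabels; allScalings]);
```

Reading fields this way makes it easy to compare the stored labels against a vector of predictions in one step.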