Deep Learning - Classifying Traffic Sign Boards Using Deep Neural Networks

Traffic signs are something we are all familiar with. Belgium is a country where traffic signs appear in both Dutch and French. The language, however, is not a concern for this project: we will classify 4,575 images of 62 different types of traffic sign boards found on Belgian roads.

Belgian traffic signs fall into 6 major categories, and as of 31 January 2017, some 30,000 traffic signs had reportedly been removed from Belgian roads. The sheer variety of signs on Belgian roads has been overwhelming and has led to discussions in Belgium as well as in the European Union.

To help address this issue, the goal of this project is to develop a deep neural network that classifies each image into its respective category. Such an image classification system could then be used in automobiles: combined with a front-facing camera, it can help alert the driver to the signs encountered on the road.

First, we use TensorFlow's built-in functions to develop a simple neural network, and we then test this model on new images.

Second, we hand-code a six-layer deep neural network (five hidden layers plus an output layer, about 3,700 hidden neurons in total) to obtain a model that fits the training data almost perfectly and performs considerably better on the test data.

The data for this project was downloaded from http://btsd.ethz.ch/shareddata

Importing the required libraries

In [1]:
import os
import random
import time
import numpy as np
import matplotlib.pyplot as plt
from skimage import io, transform
from skimage.color import rgb2gray
import tensorflow as tf
%matplotlib inline

Loading the data set

In [2]:
def load_data(data_directory):
    directories = [d for d in os.listdir(data_directory) 
                   if os.path.isdir(os.path.join(data_directory, d))]
    labels = []
    images = []
    for d in directories:
        label_directory = os.path.join(data_directory, d)
        file_names = [os.path.join(label_directory, f) 
                      for f in os.listdir(label_directory) 
                      if f.endswith(".ppm")]
        for f in file_names:
            images.append(io.imread(f))
            labels.append(int(d))
    return images, labels

ROOT_PATH = "F:/Belgian traffic - Deep learning/"
train_data_directory = os.path.join(ROOT_PATH, "TrafficSigns/Training")
test_data_directory = os.path.join(ROOT_PATH, "TrafficSigns/Testing")

images, labels = load_data(train_data_directory)
test_images, test_labels = load_data(test_data_directory)

Data Exploration

In [3]:
plt.figure(figsize = (10, 6.5))
plt.hist(labels, 62, histtype = 'step');

The plot above shows a histogram of the labels in our training data. Labels 22 and 32 clearly account for the largest numbers of images.

The grid below displays one sample image from each of the 62 classes (labelled 0 to 61), with each class's image count in the title. Again, classes 22 and 32 have the most images, matching the two large spikes in the histogram.
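As a quick sanity check on that reading (an optional sketch, not part of the original notebook run), the most frequent labels can be listed with collections.Counter:

from collections import Counter

label_counts = Counter(labels)          # class label -> number of training images
print(label_counts.most_common(5))      # the five best-represented classes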

In [4]:
uni_labels = set(labels)
i = 1
plt.figure(figsize = (25,25))
for label in uni_labels:
    image = images[labels.index(label)]
    plt.subplot(8, 8, i)
    plt.axis('off')
    i += 1
    plt.title('label {0} ({1})'.format(label, labels.count(label)))
    plt.imshow(image)

plt.show()

Now, let's explore some images from our training data. We pick a few images from the training stack and print the shape, minimum, and maximum pixel values of each.

In [5]:
plt.figure(figsize = (8,6))
ts = [321, 666, 808,3567]
for i in range(len(ts)):
    plt.subplot(4,1,i+1)
    plt.axis('off')
    plt.imshow(images[ts[i]])
    plt.show()
    print("shape: {0}, min: {1}, max: {2}".format(images[ts[i]].shape, 
                                                  images[ts[i]].min(), 
                                                  images[ts[i]].max()))
shape: (151, 175, 3), min: 0, max: 255
shape: (180, 58, 3), min: 6, max: 255
shape: (98, 109, 3), min: 7, max: 255
shape: (183, 98, 3), min: 9, max: 255

From the above four plots, we find that

  • The images are not all the same size.
  • The minimum and maximum pixel values differ from image to image.

These observations lead us to the conclusion that all images should be rescaled to one common size before we feed them into our neural network, as this makes the training process much easier and more accurate.

Also, the minimum and maximum pixel values need to be brought into the same range, i.e. the images need to be normalized.
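For illustration, here is a minimal per-image min-max normalization sketch (note that transform.resize, used in the next section, already returns float images scaled to [0, 1], so no separate step is needed in this notebook):

import numpy as np

def min_max_normalize(img):
    """Rescale an image's pixel values to the [0, 1] range."""
    img = np.asarray(img, dtype=np.float32)
    return (img - img.min()) / (img.max() - img.min())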

Feature Extraction

We will rescale our images to 50x50 pixels using the scikit-image package's transform.resize function, so that all images share one uniform dimension.

In [3]:
# Resize images
images50 = [transform.resize(image, (50, 50)) for image in images]
test_images50 = [transform.resize(image, (50, 50)) for image in test_images]
f:\python36\lib\site-packages\skimage\transform\_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
In [7]:
plt.subplot(1,2,1)
plt.imshow(images[256])
plt.title('Original Image')
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(images50[256])
plt.axis('off')
plt.title('Rescaled Image')
Out[7]:
Text(0.5,1,'Rescaled Image')

The plots below confirm that all the images have been rescaled to 50x50 pixels with 3 color channels (RGB). With uniform dimensions and pixel values in [0, 1], the features are ready to be fed into the neural network.

In [8]:
ts = [321, 666, 808,3567]
for i in range(len(ts)):
    plt.subplot(4,1,i+1)
    plt.axis('off')
    plt.imshow(images50[ts[i]])
    plt.show()
    print("shape: {0}, min: {1}, max: {2}".format(images50[ts[i]].shape, 
                                                  images50[ts[i]].min(), 
                                                  images50[ts[i]].max()))
shape: (50, 50, 3), min: 0.0029411764705885696, max: 1.0
shape: (50, 50, 3), min: 0.04669803921568689, max: 1.0
shape: (50, 50, 3), min: 0.03564705882352922, max: 0.9990980392156864
shape: (50, 50, 3), min: 0.04449411764705905, max: 1.0

Model Engineering - Building the Neural Network

After extracting our features and rescaling all images to one uniform dimension, we are ready to perform deep learning on our data set.

We will use TensorFlow's built-in functions to create a simple neural network.

The network's parameters and hyperparameters are defined in the cell below.

In [124]:
x = tf.placeholder(dtype = tf.float32, shape = [None, 50, 50, 3])
y = tf.placeholder(dtype = tf.int32, shape = [None])
images_flat = tf.contrib.layers.flatten(x)                         ## flattens each image into a vector, similar to np.array().reshape(-1)
logits = tf.contrib.layers.fully_connected(images_flat, 62, tf.nn.relu) ## one fully connected layer producing 62 class scores (weights and biases are created internally)
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y, logits = logits)) ## mean softmax cross-entropy; each image belongs to exactly one class, so this measures the probability error in discrete classification
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
correct_pred = tf.argmax(logits, 1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(correct_pred, tf.cast(y, tf.int64)), tf.float32)) ## fraction of predictions that match the labels

print("images_flat: ", images_flat)
print("logits: ", logits)
print("loss: ", loss)
print("predicted_labels: ", correct_pred)
images_flat:  Tensor("Flatten_16/flatten/Reshape:0", shape=(?, 7500), dtype=float32)
logits:  Tensor("fully_connected/Relu:0", shape=(?, 62), dtype=float32)
loss:  Tensor("Mean_33:0", shape=(), dtype=float32)
predicted_labels:  Tensor("ArgMax_222:0", shape=(?,), dtype=int64)
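For intuition, here is a minimal NumPy sketch of what sparse softmax cross-entropy computes for a single example (an illustration only, not TensorFlow's implementation):

import numpy as np

def sparse_softmax_xent(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    shifted = logits - logits.max()                     # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

print(sparse_softmax_xent(np.array([5.0, 0.0, 0.0]), 0))  # confident and correct -> ~0.013
print(sparse_softmax_xent(np.array([5.0, 0.0, 0.0]), 1))  # confident and wrong   -> ~5.013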

Training the neural network on our train data for 601 iterations

In [125]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
start = time.time()
print('Model Initialized')
print('-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-')
for i in range(601):
    _, accuracy_val = sess.run([train_op, accuracy], feed_dict={x: images50, y: labels})
    lossi = sess.run(loss, feed_dict={x: images50, y: labels})
    if i % 100 == 0:
        print("Cost after Epoch {:d}       : {:g}".format(i, lossi))
        predicted = np.array(sess.run(correct_pred, feed_dict={x: images50}))
        train_accuracy = np.equal(predicted, labels).mean()
        print('Accuracy on TRAINING data:', round(train_accuracy * 100, 3), "%")
        predicted_test = sess.run([correct_pred], feed_dict={x: test_images50})
        print('Accuracy on TEST     data:', round(np.equal(predicted_test[0], test_labels).mean() * 100, 3), "%")
        print('-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-')
    end = time.time()
print('\nTotal time Taken: {:.3f} mins'.format(round((end - start) / 60, 2)))
print('-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-')
Model Initialized
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after Epoch 0       : 3.78652
Accuracy on TRAINING data: 7.6 %
Accuracy on TEST     data: 5.952 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after Epoch 100       : 1.72731
Accuracy on TRAINING data: 66.2 %
Accuracy on TEST     data: 59.206 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after Epoch 200       : 1.57305
Accuracy on TRAINING data: 66.9 %
Accuracy on TEST     data: 59.643 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after Epoch 300       : 1.50365
Accuracy on TRAINING data: 67.1 %
Accuracy on TEST     data: 59.802 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after Epoch 400       : 1.46397
Accuracy on TRAINING data: 67.2 %
Accuracy on TEST     data: 59.881 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after Epoch 500       : 1.43835
Accuracy on TRAINING data: 67.2 %
Accuracy on TEST     data: 59.802 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after Epoch 600       : 1.42047
Accuracy on TRAINING data: 67.2 %
Accuracy on TEST     data: 59.762 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Total time Taken: 4.760 mins
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

We can see that the neural network is not performing well: it yields only 59.76% accuracy on our test data. This model cannot be taken forward, as the accuracy is far too low to draw any conclusions for further research.
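Before moving on, one way to probe where the simple model fails is a per-class breakdown. The sketch below is a hypothetical diagnostic (it assumes scikit-learn is installed, that the predicted_test array from the training loop above is still in scope, and that every class appears in the test set):

from sklearn.metrics import confusion_matrix
import numpy as np

cm = confusion_matrix(test_labels, predicted_test[0])   # rows: true class, columns: predicted class
per_class_acc = cm.diagonal() / cm.sum(axis=1)          # fraction correct per true class
worst = np.argsort(per_class_acc)[:5]                   # the five weakest classes
print("Weakest classes:", worst, per_class_acc[worst])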

Visualizing the Predictions

In [11]:
plt.figure(figsize = (15,15))
sample_test = random.sample(range(len(test_images50)), 10)
sample_image = [test_images50[i] for i in sample_test]
sample_label = [test_labels[i] for i in sample_test]
sample_predicted = [predicted_test[0][i] for i in sample_test]
for i in range(len(sample_image)):
    act = sample_label[i]
    pred = sample_predicted[i]
    plt.subplot(5,2,i+1)
    plt.axis('off')
    plt.imshow(sample_image[i])
    color = 'green' if act == pred else 'red'
    plt.text(60,10,"Actual      : {0} \nPredicted : {1}".format(act, pred), color = color)

The plot above illustrates that, out of 10 images randomly sampled from the test data, we were not able to correctly classify more than 4 of them.

This motivates us to develop a deep neural network with hidden layers.

Let's hand-code the model with TensorFlow.

Creating a 6-Layer Neural Network

For faster execution of the model, we convert the RGB images to grayscale before feeding them into our neural network.

In [4]:
images_grey = rgb2gray(np.array(images50))
test_images_grey = rgb2gray(np.array(test_images50))
## Data augmentation (left-right flips) - experimented with, but not taken forward (see Conclusion)
##sess = tf.Session()
##sess.run(tf.global_variables_initializer())
##extra_train_images = sess.run(tf.image.flip_left_right(images_grey))
##new_train_images = np.vstack((images_grey, extra_train_images))
##new_train_labels = list(np.hstack((labels, labels)))
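If you do revisit augmentation, note that left-right flips change the meaning of directional signs (a keep-right arrow becomes keep-left), which may be one reason flipping did not help here. A label-preserving alternative is sketched below, under the assumption that small rotations do not change a sign's class (augment_rotate is a hypothetical helper, not part of the original run):

import random
import numpy as np
from skimage.transform import rotate

def augment_rotate(imgs, lbls, max_deg=10):
    """Append one slightly rotated copy of every training image."""
    extra = [rotate(img, random.uniform(-max_deg, max_deg), mode='edge')
             for img in imgs]
    return np.vstack((imgs, extra)), list(lbls) + list(lbls)

## e.g. images_aug, labels_aug = augment_rotate(images_grey, labels)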

Define the parameters and hyperparameters of our neural network

We will use 1,280 neurons in the first hidden layer and halve the count in each of the next three layers (640, 320, and 160 neurons); the fifth hidden layer is then widened back out to 1,280 neurons before the 62-class output layer.

In [113]:
n_neurons_in_h1 = 1280
n_neurons_in_h2 = 640
n_neurons_in_h3 = 320
n_neurons_in_h4 = 160
n_neurons_in_h5 = 1280   ## fifth hidden layer widened back out rather than halved again
learning_rate = 0.001
n_features = 2500
n_classes = 62
In [114]:
X = tf.placeholder(dtype = tf.float32, shape = [None, 50, 50])
Y = tf.placeholder(dtype = tf.int32, shape = [None])
images_flat_a = tf.contrib.layers.flatten(X)
test_eval = tf.constant(test_images_grey, dtype='float32')
images_flat_test = tf.contrib.layers.flatten(test_eval)
In [115]:
W1 = tf.Variable(tf.truncated_normal([n_features, n_neurons_in_h1], stddev=0.1), name='weights1')
b1 = tf.Variable(tf.random_normal([n_neurons_in_h1]), name='biases1')
y1 = tf.nn.relu(tf.add(tf.matmul(images_flat_a, W1), b1), name='activationLayer1')

W2 = tf.Variable(tf.truncated_normal([n_neurons_in_h1, n_neurons_in_h2], stddev=0.1),name='weights2')
b2 = tf.Variable(tf.random_normal([n_neurons_in_h2]),name='biases2')
y2 = tf.nn.relu(tf.add(tf.matmul(y1, W2), b2),name='activationLayer2')

W3 = tf.Variable(tf.truncated_normal([n_neurons_in_h2, n_neurons_in_h3], stddev=0.1),name='weights3')
b3 = tf.Variable(tf.random_normal([n_neurons_in_h3]),name='biases3')
y3 = tf.nn.relu(tf.add(tf.matmul(y2, W3), b3),name='activationLayer3')

W4 = tf.Variable(tf.truncated_normal([n_neurons_in_h3, n_neurons_in_h4], stddev=0.1),name='weights4')
b4 = tf.Variable(tf.random_normal([n_neurons_in_h4]),name='biases4')
y4 = tf.nn.relu(tf.add(tf.matmul(y3, W4), b4),name='activationLayer4')

W5 = tf.Variable(tf.truncated_normal([n_neurons_in_h4, n_neurons_in_h5], stddev=0.1),name='weights5')
b5 = tf.Variable(tf.random_normal([n_neurons_in_h5]),name='biases5')
y5 = tf.nn.relu(tf.add(tf.matmul(y4, W5), b5),name='activationLayer5')

Wo = tf.Variable(tf.random_normal([n_neurons_in_h5, n_classes], stddev=0.1), name='weightsOut')
bo = tf.Variable(tf.random_normal([n_classes]), name='biasesOut')
y_ = tf.nn.log_softmax(tf.add(tf.matmul(y5, Wo), bo))   ## log-probabilities over the 62 classes
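The five nearly identical weight/bias/ReLU blocks above could be factored into a small helper. Here is a minimal sketch (dense_relu is a hypothetical name; this is a refactoring suggestion, not the code that produced the results below):

def dense_relu(inputs, n_in, n_out, name):
    """One fully connected layer: truncated-normal weights, ReLU activation."""
    W = tf.Variable(tf.truncated_normal([n_in, n_out], stddev=0.1), name='weights' + name)
    b = tf.Variable(tf.random_normal([n_out]), name='biases' + name)
    return tf.nn.relu(tf.add(tf.matmul(inputs, W), b), name='activationLayer' + name)

## e.g. y1 = dense_relu(images_flat_a, n_features, n_neurons_in_h1, '1')

In this notebook the weight matrices are also needed for the L2 penalty and for the separate test-time graph below, so in practice the helper would have to return W (and b) as well.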

Regularization

In [116]:
reg = tf.nn.l2_loss(W1) +tf.nn.l2_loss(W2) + tf.nn.l2_loss(W3) + tf.nn.l2_loss(W4) + tf.nn.l2_loss(W5) + tf.nn.l2_loss(Wo)
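For reference, tf.nn.l2_loss(t) returns sum(t ** 2) / 2 (half the squared Euclidean norm, with no square root); a quick NumPy check:

import numpy as np

w = np.array([3.0, 4.0])
print((w ** 2).sum() / 2)   # 12.5 - what tf.nn.l2_loss would return for these values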

Define Cross entropy and Optimization function

In [117]:
## Note: tf.nn.sparse_softmax_cross_entropy_with_logits expects raw (pre-softmax) logits,
## while y_ above already holds log-softmax outputs. The model still trains, because the
## class ordering is preserved, but the canonical approach is to pass the unscaled scores.
cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = Y, logits = y_))
cross_entropy = tf.add(cross_entropy, 0.01 * reg) ## add the L2 penalty with beta = 0.01
train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)

Predicting on the train set

In [118]:
correct_prediction = tf.argmax(y_, 1)
accuracy_a = tf.reduce_mean(tf.cast(tf.equal(correct_prediction, tf.cast(Y, tf.int64)), tf.float32)) ## fraction of predictions that match the labels

Predicting on the test set

In [119]:
y1_test = tf.nn.relu(tf.add(tf.matmul(images_flat_test, W1), b1), name='activationLayer1')
y2_test = tf.nn.relu(tf.add(tf.matmul(y1_test, W2), b2),name='activationLayer2')
y3_test = tf.nn.relu(tf.add(tf.matmul(y2_test, W3), b3),name='activationLayer3')
y4_test = tf.nn.relu(tf.add(tf.matmul(y3_test, W4), b4),name='activationLayer4')
y5_test = tf.nn.relu(tf.add(tf.matmul(y4_test, W5), b5),name='activationLayer5')
y_test = tf.nn.log_softmax(tf.add(tf.matmul(y5_test, Wo), bo))
In [120]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
test_pred_op = tf.argmax(y_test, 1)   ## build the test-prediction op once, outside the loop
start = time.time()
print('Model Initialized')
print('-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-')
for i in range(601):
    _, accuracy_val_a = sess.run([train_step, accuracy_a], feed_dict={X: images_grey, Y: labels})
    lossi_a = sess.run(cross_entropy, feed_dict={X: images_grey, Y: labels})
    if i % 100 == 0:
        print("Cost after  Epoch {:d}    : {:g}".format(i, lossi_a))
        train_pred = sess.run(correct_prediction, feed_dict={X: images_grey})
        print('Accuracy on TRAINING data:', round(np.equal(train_pred, labels).mean() * 100, 3), "%")
        test_pred = sess.run(test_pred_op)
        print('Accuracy on TEST     data:', round(np.equal(test_pred, test_labels).mean() * 100, 3), "%")
        print('-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-')
    end = time.time()
print('\nTotal time Taken: {:.3f} mins'.format(round((end - start) / 60, 2)), '\n')
print('-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-')
print('-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-')
Model Initialized
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after  Epoch 0    : 205.509
Accuracy on TRAINING data: 0.568 %
Accuracy on TEST     data: 0.198 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after  Epoch 100    : 109.38
Accuracy on TRAINING data: 92.962 %
Accuracy on TEST     data: 80.317 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after  Epoch 200    : 74.4544
Accuracy on TRAINING data: 98.295 %
Accuracy on TEST     data: 82.698 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after  Epoch 300    : 52.9307
Accuracy on TRAINING data: 99.519 %
Accuracy on TEST     data: 84.048 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after  Epoch 400    : 38.1862
Accuracy on TRAINING data: 99.628 %
Accuracy on TEST     data: 84.603 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after  Epoch 500    : 28.8228
Accuracy on TRAINING data: 87.65 %
Accuracy on TEST     data: 81.111 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
Cost after  Epoch 600    : 21.4783
Accuracy on TRAINING data: 97.967 %
Accuracy on TEST     data: 85.198 %
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Total time Taken: 22.920 mins 

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
In [122]:
plt.figure(figsize = (15,15))
sample_test = random.sample(range(len(test_images50)), 10)
sample_image = [test_images50[i] for i in sample_test]
sample_label = [test_labels[i] for i in sample_test]
test_predictions = sess.run(tf.argmax(y_test, 1))          ## evaluate the test graph once
sample_predicted = [test_predictions[i] for i in sample_test]
for i in range(len(sample_image)):
    act = sample_label[i]
    pred = sample_predicted[i]
    plt.subplot(5,2,i+1)
    plt.axis('off')
    plt.imshow(sample_image[i])
    color = 'green' if act == pred else 'red'
    plt.text(60,10,"Actual      : {0} \nPredicted : {1}".format(act, pred), color = color)

From the plot above, we can see that the new 6-layer deep neural network does a far better job, predicting with roughly 85% accuracy.

Conclusion

  • We obtain a model accuracy of 97.97% on the training set and 85.20% on the test set.

  • Data augmentation was tried on this data set, but it made model training considerably more time-consuming without improving accuracy appreciably, so it was not taken forward.

  • Converting the images to grayscale helped the model train faster compared with RGB images.

  • Building a deep neural network yielded substantially higher accuracy.