1. Decision Tree

We fit a decision tree with the rpart package, which uses the Gini index as its default impurity measure, and check the number of nodes and the training error rate.

Number of nodes in unpruned tree is 41
Training Error rate is = 34.4 %
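
For reference, the Gini index that the tree uses to score a split can be sketched as follows. This is a minimal Python illustration, not the rpart implementation; the function names and the toy labels are assumptions made for the example.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n

# Hypothetical labels, not taken from the report's data.
parent = ["a", "a", "b", "b"]
print(gini_impurity(parent))                   # 0.5
print(split_impurity(["a", "a"], ["b", "b"]))  # 0.0
```

A split is attractive when the weighted impurity of its children is well below the impurity of the parent node, as in the toy case above.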

2. Pruned tree

We prune the above tree using complexity parameters in the range 0.001 to 0.1 and see that the accuracy falls monotonically as the complexity parameter increases.

We choose a complexity parameter of 0.03 for our analysis. The number of nodes falls from 41 to 29 and the training error rate increases.

Number of nodes in pruned tree is 29
Training Error rate is = 48.4 %
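
The effect of the complexity parameter can be made explicit with the cost-complexity criterion. The form below is the scaled version that, to our understanding, the rpart documentation describes; the symbols are assumptions of this note: R(T) is the risk (here the misclassification error) of tree T, |T| is its number of splits, and T_1 is the tree with no splits.

```latex
R_{cp}(T) = R(T) + cp \cdot |T| \cdot R(T_1)
```

Under this criterion each extra split must reduce the risk by at least cp times the root risk to be kept, so a larger cp gives a smaller tree and a training error that cannot decrease, which is consistent with the move from 41 nodes to 29 nodes above.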

3. Random Forest

We next classify using a random forest that considers three randomly chosen predictors at each node.

Training Error rate is = 0 %
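
The "three predictors per node" setting can be pictured as follows. This is a sketch of the idea only, not the random forest library's code; the function name and the predictor names are hypothetical, since the report does not list the real predictors.

```python
import random

def candidate_predictors(predictors, mtry=3, rng=random):
    """Draw mtry distinct predictors to consider for one split."""
    return rng.sample(predictors, mtry)

# Hypothetical predictor names, for illustration only.
predictors = ["x1", "x2", "x3", "x4", "x5", "x6"]
rng = random.Random(0)  # fixed seed so the draws are repeatable
for node in range(3):
    print(node, candidate_predictors(predictors, mtry=3, rng=rng))
```

Each node draws a fresh subset, so different trees (and different nodes within a tree) split on different predictors, which decorrelates the trees in the forest.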

4. Bagging

We do classification by bagging using 100 bootstrap replicates.

Training Error rate is = 9.4 %
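
The two ingredients of bagging, bootstrap resampling and majority voting, can be sketched as below. This is a minimal illustration under assumed names (bootstrap_indices, majority_vote) and a toy sample size of 10; it does not fit any trees and is not the bagging routine used for the result above.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Sample n row indices with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]

def majority_vote(predictions):
    """Return the most common predicted class among the replicates."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(1)
# 100 bootstrap replicates, as in the text; 10 rows is a toy size.
replicates = [bootstrap_indices(10, rng) for _ in range(100)]
print(len(replicates), len(replicates[0]))  # 100 10
print(majority_vote(["a", "b", "a"]))       # a
```

In a full implementation one tree would be fitted to the rows selected by each replicate, and the class predicted for a case would be the majority vote over the 100 trees.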

5. Boosting

We run boosting with n trees, using a stump as the weak classifier. Rather than fitting every n = 1, 2, 3, …, 500, to reduce computation we first fit n = 50, 100, 150, …, 500 and see that there is not much improvement in the training error rates.

Next we fit n = 1, 2, 3, …, 100 and observe similar results.

So we just use 50 trees for our analysis.

Training Error rate is = 91.2 %

6. Multi layer perceptron

We first use a perceptron with a single hidden layer of n = 5 nodes.

Training Error rate is = 21.2 %

We then do the same with n = 3 nodes.

Training Error rate is = 58 %

Q1(a) Training Error Rates for all Classifiers

  1. The unpruned tree has a training error rate of 34.4%, and pruning only increases the training error (to 48.4%), as the smaller tree fits the training data less closely.
  2. The random forest has zero training error. This indicates overfitting.
  3. Bagging produces the second-best result on the training data, with a low error rate of 9.4%; only the random forest is lower.
  4. Boosting produces the worst result on the training data as we are using only stumps as the weak classifier.
  5. The multilayer perceptron does reasonably well on the training data with 5 hidden nodes (21.2% error) but poorly with 3 hidden nodes (58.0%). More hidden nodes are expected to reduce the error rate, as there are 30 classes in this problem.

                  Classifier Training error (%)
1              Unpruned Tree           34.37500
2 Pruned Tree (alpha = 0.03)           48.43750
3              Random Forest            0.00000
4                    Bagging            9.37500
5        Boosting (50 trees)           91.17647
6       MLP (5 hidden nodes)           21.17647
7       MLP (3 hidden nodes)           58.03922
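
Item 4 above attributes the poor boosting result to the use of stumps. A rough piece of arithmetic supports this: a stump has two leaves, so a single stump can predict at most two classes. The sketch below assumes the 30 classes are equally frequent, which the report does not state, and the function name is hypothetical.

```python
def best_stump_error(n_classes, n_leaves=2):
    """Lowest possible error (%) of one tree with n_leaves leaves when
    n_classes classes are equally frequent: at most n_leaves classes
    can ever be predicted."""
    predicted = min(n_leaves, n_classes)
    return 100.0 * (1 - predicted / n_classes)

print(round(best_stump_error(30), 1))  # 93.3
```

Under that assumption a single stump cannot do better than about 93.3% error. The boosted ensemble can predict more than two classes by combining the votes of its stumps, but with such weak components an error near this bound, such as the 91.2% in the table, is not surprising.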

Q2(b) Test Error Rates for the first three Classifiers

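We can see that both trees have poor test error rates; however, the unpruned tree does slightly better on the test set (50% versus 57.1%). Pruning is meant to give a more general, less overfitted tree, but here it does not improve the test error, which suggests that the pruned tree underfits.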

The random forest does better than both trees, as expected, despite its zero training error.

Test Error rate for Unpruned Tree is 50 %
Test Error rate for Pruned Tree (alpha = 0.03) is 57.14286 %
Test Error rate for Random Forest is 22.61905 %
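
These test error rates can be read off a confusion matrix as the share of cases that fall off the diagonal. The sketch below is a minimal Python illustration with a hypothetical helper name and a made-up 3-class matrix; the last line checks the random forest figure using the counts in its confusion matrix, where 65 of the 84 test cases lie on the diagonal.

```python
def error_rate(confusion):
    """Percentage of cases off the diagonal of a square confusion matrix
    whose rows and columns list the classes in the same order."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * (total - correct) / total

# Hypothetical 3-class matrix, for illustration only.
toy = [[3, 0, 1],
       [0, 2, 0],
       [1, 0, 3]]
print(error_rate(toy))  # 20.0

# Check against the report: random forest, 65 correct out of 84 test cases.
print(round(100.0 * (84 - 65) / 84, 5))  # 22.61905
```

The same count gives 42 of 84 correct for the unpruned tree (50%) and 36 of 84 for the pruned tree (57.14286%).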

Q3(c) Confusion Matrices on Test Data for the first three Classifiers

Testing Confusion Matrix for unpruned Tree
    
.    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 22 23 24 25 26 27 28 29 30 31 32 33 34
  1  3 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  1  0  0  1  0  0  0  0  1  1  0
  2  0 2 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0
  3  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  4  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  5  0 0 0 0 3 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  6  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  7  0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
  8  0 0 0 0 0 0 0 3 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  9  0 0 1 0 0 0 0 0 2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  10 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
  11 0 0 0 0 0 0 0 0 0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  12 0 0 0 0 0 0 0 0 0  0  0  3  0  1  0  1  0  0  0  0  0  0  0  0  0  0  0  0
  13 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  14 0 0 0 0 0 0 0 0 0  0  0  0  0  2  0  0  0  0  0  0  0  2  0  0  0  0  0  0
  15 0 0 0 0 0 1 0 0 0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0
  22 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  23 0 0 1 0 0 0 0 0 1  0  0  0  0  0  0  0  3  0  0  0  0  0  3  0  0  0  0  0
  24 0 0 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  2  0  1  1  0  0  0  0  0  0  0
  25 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  26 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0
  27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  28 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  0  0  0  0
  31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  0  0  3
  32 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
  33 0 0 0 1 0 0 0 0 0  3  0  0  3  0  0  0  0  0  1  0  1  0  0  0  0  0  1  0
  34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  35 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  1  0
  36 0 0 0 0 0 1 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    
.    35 36
  1   0  0
  2   0  0
  3   0  0
  4   0  0
  5   0  0
  6   0  0
  7   0  0
  8   0  0
  9   0  0
  10  0  0
  11  0  1
  12  1  0
  13  0  0
  14  0  0
  15  0  0
  22  0  0
  23  0  0
  24  0  0
  25  0  0
  26  0  0
  27  0  0
  28  0  0
  29  0  0
  30  0  0
  31  0  0
  32  0  0
  33  0  0
  34  0  0
  35  2  0
  36  0  1

Testing Confusion Matrix for pruned Tree (alpha = 0.03)
    
.    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 22 23 24 25 26 27 28 29 30 31 32 33 34
  1  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  2  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  3  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  4  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  5  0 0 0 0 3 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  6  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  7  0 0 0 0 0 0 1 0 0  0  0  3  0  1  0  2  0  0  0  0  0  0  0  0  0  0  0  0
  8  0 0 0 0 0 0 0 3 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  9  0 0 1 0 0 0 0 0 2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  10 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
  11 0 0 0 0 0 0 0 0 0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  13 0 0 0 1 0 0 0 0 0  3  0  0  3  0  0  0  0  0  1  2  1  0  0  0  0  0  1  0
  14 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  15 0 0 0 0 0 1 0 0 0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0
  22 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  23 0 0 1 0 0 0 0 0 1  0  0  0  0  0  0  0  3  0  0  0  0  0  3  0  0  0  0  0
  24 3 0 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  3  0  1  2  0  0  0  0  1  1  0
  25 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  26 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  28 0 0 0 0 0 0 0 0 0  0  0  0  0  2  0  0  0  0  0  0  0  3  0  0  0  0  1  0
  29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  0  0  0  0
  31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  0  0  3
  32 0 2 0 0 0 0 1 0 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  2  0  0
  33 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  35 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  36 0 0 0 0 0 1 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    
.    35 36
  1   0  0
  2   0  0
  3   0  0
  4   0  0
  5   0  0
  6   0  0
  7   1  0
  8   0  0
  9   0  0
  10  0  0
  11  0  1
  12  0  0
  13  0  0
  14  0  0
  15  0  0
  22  0  0
  23  0  0
  24  0  0
  25  0  0
  26  0  0
  27  0  0
  28  2  0
  29  0  0
  30  0  0
  31  0  0
  32  0  0
  33  0  0
  34  0  0
  35  0  0
  36  0  1

Testing Confusion Matrix for Random Forest
    
.    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 22 23 24 25 26 27 28 29 30 31 32 33 34
  1  3 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  1  0
  2  0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  3  0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
  4  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  5  0 0 0 0 3 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  6  0 0 0 0 0 2 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  7  0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  8  0 0 0 0 0 0 0 3 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  9  0 0 0 0 0 0 0 0 2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  10 0 0 0 0 0 0 0 0 1  3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  11 0 0 0 0 0 0 0 0 0  0  4  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  12 0 0 0 0 0 0 0 0 0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  13 0 0 0 1 0 0 0 0 0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
  14 0 0 0 0 0 0 0 0 0  0  0  0  0  2  0  0  0  0  0  0  0  1  0  0  0  0  1  0
  15 0 0 0 0 0 0 0 0 0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0
  22 0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  3  0  0  0  0  0  0  0  0  0  0  0  0
  23 0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0
  24 0 0 1 1 0 0 0 0 0  0  0  0  0  0  0  0  0  3  0  1  0  0  0  0  0  0  0  0
  25 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0
  26 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0
  27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0
  28 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0
  29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  0  0  0  0  0
  30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0
  31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3  0  0  0
  32 0 1 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
  33 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
  34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3
  35 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  36 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    
.    35 36
  1   0  0
  2   0  0
  3   0  0
  4   0  0
  5   0  0
  6   0  0
  7   0  0
  8   0  0
  9   0  0
  10  0  0
  11  0  1
  12  0  0
  13  0  0
  14  0  0
  15  0  0
  22  1  0
  23  0  0
  24  0  0
  25  0  0
  26  0  0
  27  0  0
  28  0  0
  29  0  0
  30  0  0
  31  0  0
  32  0  0
  33  0  0
  34  0  0
  35  2  0
  36  0  1
