Homework #3, Question 5 (Balog SUID# 05849374, Jaiswal SUID# 05961816, Wu SUID# 05173124)

a) Fit on the training set one-hidden-layer neural networks with 1, 2, …, 10 hidden units and different sets of starting values for the weights.

library(nnet)
library(caret)

rm(list = ls())

# Read in the full, training, and test spam data sets (comma-separated, no header)
spam.data <- read.table("~/Dropbox/Data Science Courses/STATS315B/Course Material/Data/Spam Data/Spam_Data.txt", 
    header = FALSE, sep = ",")
spam.train <- read.table("~/Dropbox/Data Science Courses/STATS315B/Course Material/Data/Spam Data/Spam_Train.txt", 
    header = FALSE, sep = ",")

spam.test <- read.table("~/Dropbox/Data Science Courses/STATS315B/Course Material/Data/Spam Data/Spam.Test.txt", 
    header = FALSE, sep = ",")

# Names for predictors and response
spam_col_names <- c("make", "address", "all", "3d", "our", "over", "remove", 
    "internet", "order", "mail", "receive", "will", "people", "report", "addresses", 
    "free", "business", "email", "you", "credit", "your", "font", "000", "money", 
    "hp", "hpl", "george", "650", "lab", "labs", "telnet", "857", "data", "415", 
    "85", "technology", "1999", "parts", "pm", "direct", "cs", "meeting", "original", 
    "project", "re", "edu", "table", "conference", ";", "(", "[", "!", "$", 
    "#", "CAPAVE", "CAPMAX", "CAPTOT", "SPAM")
colnames(spam.data) <- spam_col_names
colnames(spam.train) <- spam_col_names
colnames(spam.test) <- spam_col_names

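# Standardize the 57 predictors (column 58, SPAM, is the 0/1 response).
# Note: each set is scaled with its own column means and standard deviations.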
spam.train[, 1:57] <- scale(spam.train[, 1:57])
spam.test[, 1:57] <- scale(spam.test[, 1:57])
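The two `scale()` calls above standardize the training and test sets separately, each with its own means and standard deviations. A common alternative, sketched here as an assumption rather than what was run for the results below, is to reuse the training set's centering and scaling values on the test set so both sets share one scale. The helper name `scale_with_train` is hypothetical.

```r
# Hypothetical helper: standardize the predictor columns of train and test
# using ONLY the training set's column means and standard deviations.
scale_with_train <- function(train, test, cols) {
  scaled_train <- scale(train[, cols])
  # scale() records the centering and scaling values it used as attributes
  ctr <- attr(scaled_train, "scaled:center")
  sds <- attr(scaled_train, "scaled:scale")
  train[, cols] <- scaled_train
  test[, cols] <- scale(test[, cols], center = ctr, scale = sds)
  list(train = train, test = test)
}

# Usage, in place of the two scale() calls above (assumes columns 1:57 are
# the predictors and that spam.train/spam.test have not been scaled yet):
# sets <- scale_with_train(spam.train, spam.test, 1:57)
# spam.train <- sets$train
# spam.test <- sets$test
```

Because the test rows are transformed with training statistics, no information from the test set leaks into the preprocessing step.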

# Train 10 neural nets with 1 to 10 hidden units.
# Initial random weights are drawn uniformly from [-rang, rang] = [-0.5, 0.5].

set.seed(187)
for (n in 1:10) {
    NN.train <- nnet(factor(SPAM) ~ ., data = spam.train, size = n, maxit = 5000, 
        rang = 0.5, trace = FALSE)

    # Test misclassification rate (proportion of test emails misclassified)
    cat("Misclassification rate for model trained with", n, "hidden units:", 
        mean(spam.test$SPAM != predict(NN.train, newdata = spam.test[, 1:57], 
            type = "class")), "\n")
}

## Misclassification rate for model trained with 1 hidden units: 0.08344 
## Misclassification rate for model trained with 2 hidden units: 0.0691 
## Misclassification rate for model trained with 3 hidden units: 0.07432 
## Misclassification rate for model trained with 4 hidden units: 0.06649 
## Misclassification rate for model trained with 5 hidden units: 0.05867 
## Misclassification rate for model trained with 6 hidden units: 0.05346 
## Misclassification rate for model trained with 7 hidden units: 0.06063 
## Misclassification rate for model trained with 8 hidden units: 0.08801 
## Misclassification rate for model trained with 9 hidden units: 0.0515 
## Misclassification rate for model trained with 10 hidden units: 0.06975

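The lowest test misclassification rate was obtained with 9 hidden units (~5.15%), followed closely by 6 hidden units (~5.35%); the analysis below continues with the smaller 6-unit network (one hidden layer with 6 units).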
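The loop above draws only one set of random starting weights per network size, while the question asks for several. One way to use several starts, sketched here under the assumption that for each size we keep the fit with the lowest final value of the fitting criterion (which `nnet` reports in the `value` component of the fitted object), is shown below. The function name `best_of_starts` and the default of 5 starts are hypothetical choices.

```r
library(nnet)

# Hypothetical helper: fit the same network n_starts times from different
# random starting weights and keep the fit with the smallest final value of
# the fitting criterion (training loss plus any weight-decay penalty).
best_of_starts <- function(formula, data, size, n_starts = 5, ...) {
  best <- NULL
  for (s in seq_len(n_starts)) {
    fit <- nnet(formula, data = data, size = size, trace = FALSE, ...)
    if (is.null(best) || fit$value < best$value) best <- fit
  }
  best
}

# Usage with the objects defined above:
# set.seed(187)
# test.err <- sapply(1:10, function(n) {
#   fit <- best_of_starts(factor(SPAM) ~ ., spam.train, size = n,
#                         maxit = 5000, rang = 0.5)
#   mean(spam.test$SPAM != predict(fit, newdata = spam.test[, 1:57], type = "class"))
# })
# which.min(test.err)
```

Selecting among starts by the training criterion keeps the test set out of that choice; averaging the test error over starts would be another reasonable way to smooth out the start-to-start noise visible above (e.g. the jump at 8 hidden units).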

b) Choose the optimal regularization (weight decay parameter in 0, 0.1, …, 1) for the network structure chosen above (i.e. with 6 hidden units).

set.seed(187)
for (wd in seq(0, 1, 0.1)) {
    NN.train <- nnet(factor(SPAM) ~ ., data = spam.train, size = 6, maxit = 5000, 
        rang = 0.5, decay = wd, trace = FALSE)

    # NN.train$n holds the layer sizes (inputs, hidden units, outputs)
    cat("Misclassification rate for model trained with", NN.train$n[2], 
        "hidden units and a decay factor of", wd, ":", 
        mean(spam.test$SPAM != predict(NN.train, newdata = spam.test[, 1:57], 
            type = "class")), "\n")
}

## Misclassification rate for model trained with 6 hidden units and a decay factor of 0 : 0.05346 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.1 : 0.04889 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.2 : 0.04954 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.3 : 0.04694 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.4 : 0.04628 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.5 : 0.05606 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.6 : 0.0515 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.7 : 0.0502 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.8 : 0.04954 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 0.9 : 0.05215 
## Misclassification rate for model trained with 6 hidden units and a decay factor of 1 : 0.05541

The best model uses 6 hidden units and a weight decay of 0.4, with a test misclassification rate of ~4.63%. Since that figure is the best of 11 fits and therefore likely optimistic, we estimate the misclassification error of the model as ~5.09%, the average of the 11 misclassification rates obtained with the different decay factors and random starting weights.
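Both figures can be checked directly from the printed output. This sketch only re-enters the 11 reported test misclassification rates (decay 0 to 1 in steps of 0.1); it does not refit any network.

```r
# Test misclassification rates reported above for 6 hidden units
decay <- seq(0, 1, by = 0.1)
err <- c(0.05346, 0.04889, 0.04954, 0.04694, 0.04628, 0.05606,
         0.0515, 0.0502, 0.04954, 0.05215, 0.05541)

best <- which.min(err)
decay[best]          # decay with the lowest test error: 0.4
err[best]            # its test misclassification rate: 0.04628
round(mean(err), 4)  # average over all 11 decay values: 0.0509
```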

c) Repeat the previous point, this time requiring the proportion of misclassified good emails to be less than 1%, by thresholding the probability of being a good email at an appropriate value, as stated in the problem.

set.seed(187)
threshold <- 0.8  # classify as spam only if P(spam) > threshold

for (wd in seq(0, 1, 0.1)) {
    NN.train <- nnet(factor(SPAM) ~ ., data = spam.train, size = 6, maxit = 5000, 
        rang = 0.5, decay = wd, trace = FALSE)

    # type = "raw" returns the predicted probability of spam (SPAM = 1)
    p.spam <- predict(NN.train, newdata = spam.test[, 1:57], type = "raw")

    # Good emails (SPAM = 0) predicted as spam, as a share of all test emails
    cat("Good emails flagged as spam (share of all test emails) for model trained with", 
        NN.train$n[2], "hidden units and a decay factor of", wd, "and a threshold of", 
        threshold, ":", sum(spam.test$SPAM == 0 & p.spam > threshold)/nrow(spam.test), 
        "\n")
}

## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0 and a threshold of 0.8 : 0.02542 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.1 and a threshold of 0.8 : 0.01173 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.2 and a threshold of 0.8 : 0.01434 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.3 and a threshold of 0.8 : 0.007823 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.4 and a threshold of 0.8 : 0.01108 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.5 and a threshold of 0.8 : 0.007823 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.6 and a threshold of 0.8 : 0.008475 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.7 and a threshold of 0.8 : 0.007171 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.8 and a threshold of 0.8 : 0.009778 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 0.9 and a threshold of 0.8 : 0.01043 
## Good emails flagged as spam (share of all test emails) for model trained with 6 hidden units and a decay factor of 1 and a threshold of 0.8 : 0.006519

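By using a threshold of 80% on the predicted spam probability (i.e. only classifying an email as spam if its probability of being spam is above 80%), we held the misclassified good emails (false positives) below 1% of all test emails for weight decay values of 0.3, 0.5, 0.6, 0.7, 0.8 and 1, with the lowest rate (~0.65%) at a decay of 1. For decay values of 0.4 (~1.11%) and 0.9 (~1.04%), as well as 0 to 0.2, the rate stayed above 1%.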
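The loop above fixes the threshold at 0.8 and divides the number of good emails flagged as spam by the total number of test emails. A sketch of a slightly different criterion, assuming the 1% requirement is read as a fraction of the good emails only, scans a grid of thresholds on the predicted spam probability and keeps the smallest one that meets the constraint, also reporting the overall error at that threshold. The function name `pick_threshold` and the threshold grid are hypothetical.

```r
# Hypothetical helper: given predicted spam probabilities p and true labels
# y (1 = spam, 0 = good), return the smallest threshold on the grid for which
# the share of GOOD emails classified as spam is below max_fp.
pick_threshold <- function(p, y, max_fp = 0.01,
                           grid = seq(0.5, 0.99, by = 0.01)) {
  for (t in grid) {
    pred <- as.integer(p > t)            # 1 = classified as spam
    fp_good <- mean(pred[y == 0] == 1)   # misclassified good / all good emails
    if (fp_good < max_fp) {
      return(list(threshold = t, fp_good = fp_good, error = mean(pred != y)))
    }
  }
  NULL  # no threshold on the grid satisfies the constraint
}

# Usage with a fitted model from the loop above (type = "raw" gives P(spam)):
# p <- as.vector(predict(NN.train, newdata = spam.test[, 1:57], type = "raw"))
# pick_threshold(p, spam.test$SPAM)
```

Taking the smallest qualifying threshold flags as much spam as possible while still respecting the constraint on good emails, so the returned `error` lets the decay values be compared on overall accuracy under the same constraint.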