Use R to complete the following question. Provide the instructor with the output from your code as either screenshots pasted in Word, or as output generated in an HTML document. Submit both your code and output in Brightspace. Make sure all textual explanations match the output that you provide the instructor.
1. Load the credit_default dataset into R
credit_default_scaled <- credit_default <- read.csv("/Users/kamriefoster/Downloads/credit_default.csv")
a. Develop a neural network model with one hidden layer having 2 neurons. In developing this model, use 50% of the rows as your training set and the remaining 50% as your testing set. Use the binary variable default.payment.next.month as your target variable, and use all other columns of data as your covariates. Be sure to develop the neural network in such a way that your target is treated as binary.
First, start by normalizing the data.
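# Exclude column 24 (assumed here to be the target, default.payment.next.month) from scaling
maxs <- apply(credit_default[, -24], 2, max)
mins <- apply(credit_default[, -24], 2, min)
# Min-max scaling: subtract each column's minimum and divide by its range so every covariate lies in [0, 1]
credit_default_scaled[, -24] <- as.data.frame(scale(credit_default[, -24], center = mins, scale = maxs - mins))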
#summary(credit_default_scaled)
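As a quick sanity check on the min-max scaling above, the following minimal sketch applies the same scale() call to a small made-up data frame. The toy data and its column names are hypothetical and are not part of credit_default; the sketch only confirms that each scaled column runs from 0 to 1.

```r
# Hypothetical toy data, used only to illustrate min-max scaling
toy <- data.frame(x = c(10, 20, 30), y = c(1, 5, 3))

# Column-wise minimums and maximums
toy_mins <- apply(toy, 2, min)
toy_maxs <- apply(toy, 2, max)

# Subtract the minimum and divide by the range, mirroring the credit_default code
toy_scaled <- as.data.frame(scale(toy, center = toy_mins, scale = toy_maxs - toy_mins))

# Each column should now have minimum 0 and maximum 1
print(sapply(toy_scaled, range))
```

Running summary() on the scaled credit_default data should show the same pattern for every covariate.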
Now generate the training and testing sets.
set.seed(1234)
index <- sample(nrow(credit_default_scaled), nrow(credit_default_scaled)*0.50)
credit_default_train <- credit_default_scaled[index,]
credit_default_test <- credit_default_scaled[-index,]
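The split logic above can be checked on a small example. This is a minimal sketch using a hypothetical 10-row data frame (toy_rows is made up for illustration); it shows that positive indexing selects the training rows, negative indexing selects the rest, and the two sets do not overlap.

```r
# Hypothetical 10-row data frame standing in for credit_default_scaled
toy_rows <- data.frame(id = 1:10)

set.seed(1234)
# Draw half of the row numbers at random, without replacement
toy_index <- sample(nrow(toy_rows), nrow(toy_rows) * 0.50)

# Positive indices give the training rows; negative indices give the remainder
toy_train <- toy_rows[toy_index, , drop = FALSE]
toy_test <- toy_rows[-toy_index, , drop = FALSE]

# The two sets should each hold 5 rows and share no ids
print(c(train = nrow(toy_train), test = nrow(toy_test)))
print(length(intersect(toy_train$id, toy_test$id)))
```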
Now generate the neural network model.
library(neuralnet)
# Create a neural network consisting of one hidden layer with 2 neurons.
# linear.output must be FALSE for classification problems where the response is categorical.
credit_default_nn <- neuralnet(default.payment.next.month ~ ., data = credit_default_train, hidden = c(2), algorithm = 'rprop+', linear.output = FALSE)
plot(credit_default_nn)
A screenshot of the plot has been included below:
c. Use your neural network model to develop predictions for the data in your testing set. When making your predictions, use the cut-off probability of 0.5, and display a confusion matrix for your predictions. How many rows of testing data were misclassified by the model? (Note: Be sure to upload a screenshot of the confusion matrix that you produce.)
First, define the cut-off probability.
pcut_nn <- 1/2
Now develop predictions for the testing set.
prob_nn_out <- predict(credit_default_nn, credit_default_test, type = "response")
pred_nn_out <- (prob_nn_out >= pcut_nn)*1
#print the confusion matrix
table(credit_default_test$default.payment.next.month, pred_nn_out, dnn = c("Observed", "Predicted"))
##         Predicted
## Observed    0    1
##        0 4430  229
##        1  873  468
A screenshot of the confusion matrix can be seen below:
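The neural network misclassified 1,102 of the 6,000 rows in the testing set: 873 observed defaults predicted as non-defaults plus 229 observed non-defaults predicted as defaults (873 + 229 = 1,102).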
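The misclassification count can be double-checked directly from the confusion matrix counts reported above. The following is a minimal sketch that re-enters those four counts by hand (conf_mat is a hypothetical name, not an object from the original code) and sums the off-diagonal cells.

```r
# Confusion matrix counts copied from the output above (filled column by column)
conf_mat <- matrix(c(4430, 873, 229, 468), nrow = 2,
                   dimnames = list(Observed = c("0", "1"), Predicted = c("0", "1")))

# Off-diagonal cells are the misclassified rows
misclassified <- sum(conf_mat) - sum(diag(conf_mat))

# Share of testing rows that were misclassified
misclassification_rate <- misclassified / sum(conf_mat)

print(misclassified)
print(round(misclassification_rate, 4))
```

With these counts the sketch gives 1,102 misclassified rows out of 6,000, a misclassification rate of roughly 18.4%.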