Laky_Exam 3

Part 1: Problem to be Completed in R

Use R to complete the following question. Provide the instructor with the output from your code as either screenshots pasted in Word, or as output generated in an HTML document. Submit both your code and output in Brightspace. Make sure all textual explanations match the output that you provide the instructor.

Load the credit_default dataset into R.

credit <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 622/Exams/Exam 3/credit_default.csv")
str(credit)

## 'data.frame':    12000 obs. of  24 variables:
##  $ LIMIT_BAL                 : int  290000 20000 280000 280000 20000 50000 20000 30000 200000 110000 ...
##  $ SEX                       : int  2 1 1 2 1 1 1 2 2 2 ...
##  $ EDUCATION                 : int  2 2 1 1 2 1 2 1 1 1 ...
##  $ MARRIAGE                  : int  2 2 2 2 2 2 2 2 1 2 ...
##  $ AGE                       : int  26 51 29 47 24 26 23 25 38 29 ...
##  $ PAY_0                     : int  0 -1 -2 0 0 -2 1 1 -2 0 ...
##  $ PAY_2                     : int  0 -1 -2 0 0 -2 2 2 -2 0 ...
##  $ PAY_3                     : int  0 -2 -2 0 0 -2 2 2 -2 2 ...
##  $ PAY_4                     : int  0 -1 -2 0 0 -2 2 2 -2 0 ...
##  $ PAY_5                     : int  -1 -1 -2 0 0 -2 2 0 -2 0 ...
##  $ PAY_6                     : int  -1 -2 -2 0 0 -2 0 0 -2 0 ...
##  $ BILL_AMT1                 : int  18125 780 10660 269124 17924 2411 16332 29628 5625 107195 ...
##  $ BILL_AMT2                 : int  20807 0 5123 266163 18475 3059 18111 30453 12125 109443 ...
##  $ BILL_AMT3                 : int  99860 -1500 8467 215177 19539 2333 18325 30082 13300 106637 ...
##  $ BILL_AMT4                 : int  100000 780 2510 184270 19396 1800 18391 28933 3450 106665 ...
##  $ BILL_AMT5                 : int  3015 0 591 130954 11643 1620 14235 25255 13880 92417 ...
##  $ BILL_AMT6                 : int  23473 0 14994 92215 11578 0 18955 25344 5147 90730 ...
##  $ PAY_AMT1                  : int  3000 0 5123 11268 1400 3068 2500 1600 12125 7845 ...
##  $ PAY_AMT2                  : int  80000 0 8467 8196 1380 2440 1000 1000 13416 4000 ...
##  $ PAY_AMT3                  : int  3000 2280 2510 6281 1181 1807 600 7 3450 4000 ...
##  $ PAY_AMT4                  : int  3015 0 591 4403 1000 2204 0 880 13880 3500 ...
##  $ PAY_AMT5                  : int  23473 0 14994 3532 500 0 5000 3028 5147 9500 ...
##  $ PAY_AMT6                  : int  1148 0 5000 3510 500 0 1300 0 1050 9600 ...
##  $ default.payment.next.month: int  0 0 0 0 0 0 1 0 0 0 ...

colSums(is.na(credit))

##                  LIMIT_BAL                        SEX 
##                          0                          0 
##                  EDUCATION                   MARRIAGE 
##                          0                          0 
##                        AGE                      PAY_0 
##                          0                          0 
##                      PAY_2                      PAY_3 
##                          0                          0 
##                      PAY_4                      PAY_5 
##                          0                          0 
##                      PAY_6                  BILL_AMT1 
##                          0                          0 
##                  BILL_AMT2                  BILL_AMT3 
##                          0                          0 
##                  BILL_AMT4                  BILL_AMT5 
##                          0                          0 
##                  BILL_AMT6                   PAY_AMT1 
##                          0                          0 
##                   PAY_AMT2                   PAY_AMT3 
##                          0                          0 
##                   PAY_AMT4                   PAY_AMT5 
##                          0                          0 
##                   PAY_AMT6 default.payment.next.month 
##                          0                          0

The output above shows the structure of the variables in credit_default.csv, followed by a count of missing values for each column. No column contains missing values, so no imputation is required.
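As a complement to the per-column count, the same check can be condensed into a single yes/no answer and a count of incomplete rows. The sketch below uses a small made-up data frame (`toy`, with hypothetical values) in place of `credit`, so the numbers are illustrative only.

```r
# Toy data frame standing in for `credit` (values are hypothetical).
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

# Per-column counts of missing values, as in the check above.
na_by_col <- colSums(is.na(toy))

# One TRUE/FALSE answer for the whole data frame.
has_na <- anyNA(toy)

# Number of rows with at least one missing value.
n_incomplete <- sum(!complete.cases(toy))

print(na_by_col)
print(has_na)
print(n_incomplete)
```

Applied to `credit`, `anyNA(credit)` would be expected to return FALSE, given that every column count above is 0.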

Develop a neural network model with one hidden layer having 2 neurons. In developing this model, use 50% of the rows as your training set and the remaining 50% as your testing set. Use the binary variable default.payment.next.month as your target variable, and use all other columns of data as your covariates. Be sure to develop the neural network in such a way that your target is treated as binary.

set.seed(12345)
index <- sample(nrow(credit), nrow(credit)*0.50)
train_credit <- credit[index,]
test_credit <- credit[-index,]
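The split above can be sanity-checked by confirming that the two halves are disjoint and together cover every row. This is a minimal sketch on a made-up data frame (`toy`, `id`, and `y` are hypothetical names), not the exam data.

```r
set.seed(12345)

# Toy data frame standing in for `credit` (10 hypothetical rows).
toy <- data.frame(id = seq_len(10), y = rep(c(0, 1), length.out = 10))

# Same splitting pattern as above: sample half of the row indices.
index <- sample(nrow(toy), nrow(toy) * 0.50)
train_toy <- toy[index, ]
test_toy  <- toy[-index, ]

# The two sets should not share rows and should add up to the full data.
n_shared <- length(intersect(train_toy$id, test_toy$id))
n_total  <- nrow(train_toy) + nrow(test_toy)

print(n_shared)
print(n_total)
```

The same two checks on `train_credit` and `test_credit` would use the row names instead of an `id` column.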

#install.packages("neuralnet") #package for neural network
library(neuralnet) #library for neural network
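f <- as.formula("default.payment.next.month ~ .") #target on the left, all other columns as covariates
nn <- neuralnet(f, data = train_credit, hidden = c(2), algorithm = 'rprop+', linear.output = FALSE) #linear.output = FALSE applies the logistic activation to the output neuron, so the target is treated as binary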
nn$act.fct

## function (x) 
## {
##     1/(1 + exp(-x))
## }
## <bytecode: 0x1084e2930>
## <environment: 0x1084e5ca8>
## attr(,"type")
## [1] "logistic"
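To illustrate what the logistic activation shown above does, the sketch below evaluates the same formula at a few inputs. The helper name `logistic` and the input values are chosen for illustration only.

```r
# Same formula as nn$act.fct: maps any real number into (0, 1).
logistic <- function(x) 1 / (1 + exp(-x))

# An input of 0 maps to exactly 0.5, so a 0.5 probability cut-off
# corresponds to the sign of the neuron's weighted input.
mid <- logistic(0)

# Large negative or positive inputs saturate near 0 or 1.
z <- c(-20, -2, 0, 2, 20)
p <- logistic(z)

print(mid)
print(round(p, 4))
```

Because the function saturates, covariates on very large scales (such as raw currency amounts) can push a neuron's input far from 0, where the output barely changes.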

Create a plot of your neural network displaying all weights.

#install.packages('devtools')
library(devtools)
source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r')
plot.nnet(nn)

plot.nnet(nn, wts.only = TRUE)

## $`hidden 1 1`
##  [1] -0.8464339  0.1667109  1.9543894 -0.9631898 -0.8949507  0.4621312
##  [7] -0.2487056  1.5719656 -0.2007033 -1.3815621 -2.1139621  0.1472597
## [13]  0.9725410  0.2153724  1.5167749  0.1237414 -0.4343895  2.1319689
## [19]  0.7489812  1.4677664 -0.4967066 -0.5329329  0.6916307  1.8711582
## 
## $`hidden 1 2`
##  [1]  0.003770662 -2.412620907 -0.329336046  0.273127921 -1.597994834
##  [6]  2.430207507 -1.100087496 -1.462861321  2.172408094 -0.182671753
## [11]  2.431433088 -0.520272212 -3.197648683 -1.317962156  0.234524757
## [16]  0.291192184  0.412659307  1.233155253 -0.341054007  0.132754064
## [21] -0.040643178 -0.363311795 -0.433116455  0.682606257
## 
## $`out 1`
## [1] -1.1525573 -0.1343418  0.5937898

#plot(nn)

Above is the plot of the fitted neural network. Calling plot(nn) did not produce a graph, so plot.nnet() was used instead; the weights it returns with the wts.only option are listed above.
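A rough check, using the `out 1` weights printed above, of the range of probabilities the output neuron can produce. It assumes (the output itself does not confirm this ordering) that the first value is the bias and the next two are the weights from hidden neurons 1 and 2. Each hidden neuron uses the logistic activation, so its output lies between 0 and 1, and the output neuron's weighted input is therefore largest at one of the four corner combinations.

```r
logistic <- function(x) 1 / (1 + exp(-x))

# `out 1` weights copied from the output above.
# Assumed order: bias, weight from hidden 1, weight from hidden 2.
out_w <- c(bias = -1.1525573, h1 = -0.1343418, h2 = 0.5937898)

# Hidden activations are bounded by 0 and 1, so check the four corners.
corners <- expand.grid(h1 = c(0, 1), h2 = c(0, 1))
z <- out_w[["bias"]] + out_w[["h1"]] * corners$h1 + out_w[["h2"]] * corners$h2
p <- logistic(z)

# Largest and smallest probabilities the output neuron could reach.
max_p <- max(p)
min_p <- min(p)

print(round(cbind(corners, p), 4))
```

Under that assumption the largest achievable probability stays below 0.5, which would mean this fitted network cannot classify any row as a default at a 0.5 cut-off.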

Use your neural network model to develop predictions for the data in your testing set. When making your predictions, use the cut-off probability of 0.5, and display a confusion matrix for your predictions. How many rows of testing data were misclassified by the model? (Note: Be sure to upload a screenshot of the confusion matrix that you produce.)

#In-Sample Fit Performance
pcut_nn <- 0.5
prob_nn_in <- predict(nn, train_credit, type = "response")
pred_nn_in <- (prob_nn_in >= pcut_nn)*1
table(train_credit$default.payment.next.month, pred_nn_in, dnn = c("Observed", "Predicted"))

##         Predicted
## Observed    0
##        0 4701
##        1 1299

#Out-of-Sample Fit Performance
prob_nn_out <- predict(nn, test_credit, type = "response")
pred_nn_out <- (prob_nn_out >= pcut_nn)*1
table(test_credit$default.payment.next.month, pred_nn_out, dnn = c("Observed", "Predicted"))

##         Predicted
## Observed    0
##        0 4667
##        1 1333

At the 0.5 cut-off the model predicted class 0 (no default) for every row, which is why each confusion matrix has only one Predicted column. In-sample, 4701 observations were classified correctly and 1299 were misclassified. Out-of-sample, 4667 observations were classified correctly and 1333 rows of testing data were misclassified (1333 of 6000, about 22.2%).
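When a model predicts only one class, table() drops the missing column. The sketch below shows one way to force a full 2x2 confusion matrix by fixing the factor levels, first on made-up vectors (`obs` and `pred` are hypothetical), and then recomputes the out-of-sample error rate from the counts reported above.

```r
# Toy observed classes and predictions where every prediction is 0.
obs  <- c(0, 0, 1, 0, 1)
pred <- c(0, 0, 0, 0, 0)

# Fixing the levels keeps the empty "1" column in the table.
cm <- table(factor(obs, levels = c(0, 1)),
            factor(pred, levels = c(0, 1)),
            dnn = c("Observed", "Predicted"))

# Misclassified rows are everything off the diagonal.
misclassified <- sum(cm) - sum(diag(cm))

# Out-of-sample counts from the confusion matrix above, with the
# empty Predicted = 1 column written out explicitly.
test_cm <- matrix(c(4667, 1333, 0, 0), nrow = 2,
                  dimnames = list(Observed = c("0", "1"),
                                  Predicted = c("0", "1")))
test_errors <- sum(test_cm) - sum(diag(test_cm))
test_rate   <- test_errors / sum(test_cm)

print(cm)
print(test_errors)
print(round(test_rate, 3))
```

With the actual data, the same pattern would wrap `test_credit$default.payment.next.month` and `pred_nn_out` in factor() with levels 0 and 1.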

Screenshot:

Laky, Ryann

2023-04-29
