Use R to complete the following question. Provide the instructor with the output from your code as either screenshots pasted in Word, or as output generated in an HTML document. Submit both your code and output in Brightspace. Make sure all textual explanations match the output that you provide the instructor.
credit <- read.csv("/Users/ryannlaky/Documents/University of Indianapolis/MSDA 622/Exams/Exam 3/credit_default.csv")
str(credit)
## 'data.frame': 12000 obs. of 24 variables:
## $ LIMIT_BAL : int 290000 20000 280000 280000 20000 50000 20000 30000 200000 110000 ...
## $ SEX : int 2 1 1 2 1 1 1 2 2 2 ...
## $ EDUCATION : int 2 2 1 1 2 1 2 1 1 1 ...
## $ MARRIAGE : int 2 2 2 2 2 2 2 2 1 2 ...
## $ AGE : int 26 51 29 47 24 26 23 25 38 29 ...
## $ PAY_0 : int 0 -1 -2 0 0 -2 1 1 -2 0 ...
## $ PAY_2 : int 0 -1 -2 0 0 -2 2 2 -2 0 ...
## $ PAY_3 : int 0 -2 -2 0 0 -2 2 2 -2 2 ...
## $ PAY_4 : int 0 -1 -2 0 0 -2 2 2 -2 0 ...
## $ PAY_5 : int -1 -1 -2 0 0 -2 2 0 -2 0 ...
## $ PAY_6 : int -1 -2 -2 0 0 -2 0 0 -2 0 ...
## $ BILL_AMT1 : int 18125 780 10660 269124 17924 2411 16332 29628 5625 107195 ...
## $ BILL_AMT2 : int 20807 0 5123 266163 18475 3059 18111 30453 12125 109443 ...
## $ BILL_AMT3 : int 99860 -1500 8467 215177 19539 2333 18325 30082 13300 106637 ...
## $ BILL_AMT4 : int 100000 780 2510 184270 19396 1800 18391 28933 3450 106665 ...
## $ BILL_AMT5 : int 3015 0 591 130954 11643 1620 14235 25255 13880 92417 ...
## $ BILL_AMT6 : int 23473 0 14994 92215 11578 0 18955 25344 5147 90730 ...
## $ PAY_AMT1 : int 3000 0 5123 11268 1400 3068 2500 1600 12125 7845 ...
## $ PAY_AMT2 : int 80000 0 8467 8196 1380 2440 1000 1000 13416 4000 ...
## $ PAY_AMT3 : int 3000 2280 2510 6281 1181 1807 600 7 3450 4000 ...
## $ PAY_AMT4 : int 3015 0 591 4403 1000 2204 0 880 13880 3500 ...
## $ PAY_AMT5 : int 23473 0 14994 3532 500 0 5000 3028 5147 9500 ...
## $ PAY_AMT6 : int 1148 0 5000 3510 500 0 1300 0 1050 9600 ...
## $ default.payment.next.month: int 0 0 0 0 0 0 1 0 0 0 ...
colSums(is.na(credit))
## LIMIT_BAL SEX
## 0 0
## EDUCATION MARRIAGE
## 0 0
## AGE PAY_0
## 0 0
## PAY_2 PAY_3
## 0 0
## PAY_4 PAY_5
## 0 0
## PAY_6 BILL_AMT1
## 0 0
## BILL_AMT2 BILL_AMT3
## 0 0
## BILL_AMT4 BILL_AMT5
## 0 0
## BILL_AMT6 PAY_AMT1
## 0 0
## PAY_AMT2 PAY_AMT3
## 0 0
## PAY_AMT4 PAY_AMT5
## 0 0
## PAY_AMT6 default.payment.next.month
## 0 0
Above is the structure of the variables in the data set credit_default.csv, followed by a count of missing values in each column. No column contains missing values, so no imputation is required.
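The same check can be reduced to a single logical value with base R's `anyNA()`, which is convenient as a guard before modelling. A minimal sketch on a small hypothetical data frame (`toy` is illustrative, not the credit data):

```r
# Hypothetical toy data frame standing in for `credit`
toy <- data.frame(
  LIMIT_BAL = c(20000L, 50000L, NA),
  AGE       = c(24L, 26L, 51L)
)

# Per-column counts of missing values, as in colSums(is.na(credit))
na_counts <- colSums(is.na(toy))
print(na_counts)

# One TRUE/FALSE answer for the whole data frame
has_missing <- anyNA(toy)
print(has_missing)
```

Applied to the real data, `anyNA(credit)` would return `FALSE`, consistent with the all-zero counts above.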
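set.seed(12345)  # make the random split reproducible
index <- sample(nrow(credit), nrow(credit)*0.50)  # draw 50% of the row numbers without replacement
train_credit <- credit[index,]   # training set: the sampled rows
test_credit <- credit[-index,]   # test set: all remaining rows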
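Because `sample()` draws row numbers without replacement, `credit[index,]` and `credit[-index,]` are disjoint and together cover all 12000 rows, giving 6000 rows each; this is why each confusion table further down sums to 6000. A small base-R check of that property, using only the row count reported by `str(credit)` rather than the data itself:

```r
set.seed(12345)
n <- 12000                                # number of rows reported by str(credit)
index <- sample(n, n * 0.50)              # 6000 distinct row numbers drawn from 1:n
test_rows <- setdiff(seq_len(n), index)   # the rows that credit[-index, ] keeps

cat("train:", length(index), " test:", length(test_rows), "\n")
```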
#install.packages("neuralnet") #package for neural network
library(neuralnet) #library for neural network
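f <- as.formula("default.payment.next.month ~ .")  # response on all 23 predictors
# 2 hidden units; 'rprop+' = resilient backpropagation with weight backtracking;
# linear.output = F applies the logistic function at the output unit as well
nn <- neuralnet(f, data=train_credit, hidden=c(2), algorithm = 'rprop+', linear.output=F)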
nn$act.fct
## function (x)
## {
## 1/(1 + exp(-x))
## }
## <bytecode: 0x1084e2930>
## <environment: 0x1084e5ca8>
## attr(,"type")
## [1] "logistic"
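Each hidden unit applies this logistic function to a weighted sum of the raw predictors. Because the predictors were not rescaled, amounts such as LIMIT_BAL and the BILL_AMT/PAY_AMT columns (tens or hundreds of thousands) sit next to small integer codes such as SEX and EDUCATION; even a weight of 0.1 on LIMIT_BAL = 290000 adds 29,000 to the sum, which puts the logistic function in its flat region where it is essentially 0 or 1. A common remedy, not part of the analysis above, is min-max scaling of the predictors to [0, 1] using statistics from the training set only. The helpers `minmax_fit()` and `minmax_apply()` are hypothetical names for this base-R sketch:

```r
# Hypothetical helper: record each predictor's training-set minimum and maximum
minmax_fit <- function(df, response) {
  cols <- setdiff(names(df), response)
  list(cols = cols,
       min  = sapply(df[cols], min),
       max  = sapply(df[cols], max))
}

# Hypothetical helper: rescale predictors with the stored min/max.
# Assumes no predictor is constant in the training data (max - min > 0).
minmax_apply <- function(df, fit) {
  df[fit$cols] <- Map(function(x, lo, hi) (x - lo) / (hi - lo),
                      df[fit$cols], fit$min, fit$max)
  df
}

# Toy data frame using three of the credit column names (values are illustrative)
toy <- data.frame(
  LIMIT_BAL = c(20000, 50000, 290000),
  SEX       = c(1, 2, 2),
  default.payment.next.month = c(1, 0, 0)
)

fit <- minmax_fit(toy, "default.payment.next.month")
toy_scaled <- minmax_apply(toy, fit)
print(toy_scaled)

# With the real data (not run here), the same steps would be:
# fit     <- minmax_fit(train_credit, "default.payment.next.month")
# train_s <- minmax_apply(train_credit, fit)
# test_s  <- minmax_apply(test_credit, fit)
# nn_s    <- neuralnet(f, data = train_s, hidden = c(2),
#                      algorithm = 'rprop+', linear.output = F)
```

Fitting the scaling on the training set only keeps test information out of the model; test values outside the training range simply map slightly outside [0, 1].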
#install.packages('devtools')
library(devtools)
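# load fawda123's plot.nnet() function from a GitHub gist
source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r')
plot.nnet(nn)              # plot the network structure
plot.nnet(nn, wts.only=T)  # return the fitted weights instead of plotting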
## $`hidden 1 1`
## [1] -0.8464339 0.1667109 1.9543894 -0.9631898 -0.8949507 0.4621312
## [7] -0.2487056 1.5719656 -0.2007033 -1.3815621 -2.1139621 0.1472597
## [13] 0.9725410 0.2153724 1.5167749 0.1237414 -0.4343895 2.1319689
## [19] 0.7489812 1.4677664 -0.4967066 -0.5329329 0.6916307 1.8711582
##
## $`hidden 1 2`
## [1] 0.003770662 -2.412620907 -0.329336046 0.273127921 -1.597994834
## [6] 2.430207507 -1.100087496 -1.462861321 2.172408094 -0.182671753
## [11] 2.431433088 -0.520272212 -3.197648683 -1.317962156 0.234524757
## [16] 0.291192184 0.412659307 1.233155253 -0.341054007 0.132754064
## [21] -0.040643178 -0.363311795 -0.433116455 0.682606257
##
## $`out 1`
## [1] -1.1525573 -0.1343418 0.5937898
#plot(nn)
Above is the plot of the neural network model developed. Because plot(nn) did not produce a graph in the rendered output, plot.nnet() was used instead; with wts.only = T it also lists the fitted weights shown above.
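The output-layer weights listed above already bound what the network can predict. Assuming plot.nnet() lists the bias first and then hidden units 1 and 2, the output unit computes the logistic function of -1.1525573 - 0.1343418 * h1 + 0.5937898 * h2, where each hidden activation h1, h2 lies between 0 and 1. The sketch below evaluates that expression at the four corners of the unit square; since the argument is linear in h1 and h2 and the logistic function is increasing, the corners give the extreme possible outputs.

```r
# Output-layer weights copied from plot.nnet(nn, wts.only = T) above.
# Assumption: the first value is the bias, then hidden units 1 and 2.
w_out <- c(bias = -1.1525573, h1 = -0.1343418, h2 = 0.5937898)

# Logistic hidden activations lie in (0, 1); the corners of the unit square
# give the extreme outputs because the output is monotone in each activation.
corners <- expand.grid(h1 = c(0, 1), h2 = c(0, 1))
p_out <- plogis(w_out[["bias"]] + w_out[["h1"]] * corners$h1 + w_out[["h2"]] * corners$h2)

range(p_out)
```

Under that assumption the predicted probability can only range from about 0.216 to 0.364, so with the 0.5 cutoff used below the network cannot classify any observation as a default.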
#In-Sample Fit Performance
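pcut_nn <- 0.5  # probability cutoff for predicting a default
# predict() for neuralnet objects returns the output-unit values, which are
# probabilities here because linear.output = F (it has no 'type' argument)
prob_nn_in <- predict(nn, train_credit)
pred_nn_in <- (prob_nn_in >= pcut_nn)*1  # 1 = predicted default, 0 = no default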
table(train_credit$default.payment.next.month, pred_nn_in, dnn = c("Observed", "Predicted"))
## Predicted
## Observed 0
## 0 4701
## 1 1299
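Because the network never predicted a 1, `table()` drops the Predicted = 1 column entirely. Converting both vectors to factors with levels 0 and 1 keeps a full 2 x 2 table, which makes the empty column explicit and avoids indexing errors when computing rates. A base-R sketch with short hypothetical vectors in place of the real observed and predicted values:

```r
# Hypothetical observed labels and all-zero predictions
observed  <- c(0, 0, 1, 0, 1)
predicted <- c(0, 0, 0, 0, 0)

# Fixing the factor levels forces both classes to appear in the table
cm <- table(Observed  = factor(observed,  levels = c(0, 1)),
            Predicted = factor(predicted, levels = c(0, 1)))
print(cm)

# Applied to the model above (not run here):
# table(factor(train_credit$default.payment.next.month, levels = c(0, 1)),
#       factor(pred_nn_in, levels = c(0, 1)), dnn = c("Observed", "Predicted"))
```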
#Out-of-Sample Fit Performance
prob_nn_out <- predict(nn, test_credit)  # output-unit probabilities for the test set
pred_nn_out <- (prob_nn_out >= pcut_nn)*1  # apply the same 0.5 cutoff
table(test_credit$default.payment.next.month, pred_nn_out, dnn = c("Observed", "Predicted"))
## Predicted
## Observed 0
## 0 4667
## 1 1333
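Both confusion tables contain only a Predicted = 0 column because, at a cutoff of 0.5, the network classified every observation as a non-default. In-sample, the 4701 non-defaulters were therefore correctly predicted and all 1299 defaulters were misclassified (accuracy 4701/6000 = 78.35%, misclassification rate 21.65%). Out-of-sample, the 4667 non-defaulters were correctly predicted and all 1333 defaulters were misclassified (accuracy 4667/6000 = 77.78%, misclassification rate 22.22%). The model's predictions are therefore identical to always predicting "no default", and it identifies none of the customers who actually defaulted.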
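Summary rates can be computed directly from the counts in the two tables. A small sketch (`rates_from_counts()` is a hypothetical helper) that also reports sensitivity, the share of actual defaulters that the model flagged:

```r
# Hypothetical helper: summary rates from the four cells of a 2 x 2 confusion table
# tn = observed 0 / predicted 0, fn = observed 1 / predicted 0,
# fp = observed 0 / predicted 1, tp = observed 1 / predicted 1
rates_from_counts <- function(tn, fn, fp = 0, tp = 0) {
  n <- tn + fn + fp + tp
  c(accuracy          = (tn + tp) / n,
    misclassification = (fn + fp) / n,
    sensitivity       = tp / (tp + fn))
}

# Counts taken from the in-sample and out-of-sample tables above
in_sample  <- rates_from_counts(tn = 4701, fn = 1299)
out_sample <- rates_from_counts(tn = 4667, fn = 1333)
round(rbind(in_sample, out_sample), 4)
```

With no predicted defaults (fp = tp = 0), sensitivity is 0 in both samples.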
Screenshot: