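This report is an analysis of four major methods for creating predictive models: generalized linear models (GLM), decision trees (TREE), generalized additive models (GAM), and neural networks (NN).
Each of these four methods was applied to two datasets, one from each of the two problem categories: regression, using the Boston housing data (response medv), and classification, using the German credit data (binary response).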
To compare the fit of the models and rank them from best to worst, the in-sample mean squared error (MSE) and the out-of-sample mean squared prediction error (MSPE) were calculated for each model. For the classification data, the in-sample and out-of-sample misclassification rates play these roles. The metrics were compared only within their respective categories. Each section has a tab labelled “Final Analysis” that contains a table of all the values. A summary of the results follows.
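The sketch below illustrates the two regression metrics. It assumes both are plain averages of squared residuals, with no degrees-of-freedom correction. MSE is computed on the training rows the model was fitted to, and MSPE is the same formula applied to the held-out test rows. The function name and the toy numbers are illustrative assumptions, not values from this report.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy values for illustration only (not taken from the report).
train_y, train_pred = [24, 22, 35], [25, 21, 31]  # data the model was fitted on
test_y, test_pred = [33, 36], [30, 32]            # held-out data

mse = mean_squared_error(train_y, train_pred)     # in-sample error
mspe = mean_squared_error(test_y, test_pred)      # out-of-sample error
print(f"MSE = {mse:.3f}, MSPE = {mspe:.3f}")
```

A model whose MSPE is much larger than its MSE is fitting noise in the training data. This is why the out-of-sample figure is the more useful one for ranking.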
model_step_glm
## [1] 1851.647
## [1] 2066.989
## [1] 2828.861
The non-linear relationships between several predictors and medv can be seen in the following plots:
## [1] 7.991131
## [1] 14.35228
| Metric | GLM | TREE | GAM | NN |
|---|---|---|---|---|
| MSE (in-sample) | 22.55819 | 14.34562 | 7.991131 | 6.694951 |
| MSPE (out-of-sample) | 26.31746 | 19.70207 | 14.352277 | 15.592189 |
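The ranking described in the introduction can be reproduced directly from the regression table above. The dictionary below copies those values, and the helper name rank_models is hypothetical. Lower error is treated as better.

```python
# Values copied from the regression table above (Boston housing, response medv).
results = {
    "GLM": {"mse": 22.55819, "mspe": 26.31746},
    "TREE": {"mse": 14.34562, "mspe": 19.70207},
    "GAM": {"mse": 7.991131, "mspe": 14.352277},
    "NN": {"mse": 6.694951, "mspe": 15.592189},
}


def rank_models(results, metric):
    """Return model names ordered from lowest (best) to highest error."""
    return sorted(results, key=lambda name: results[name][metric])


print("by MSE: ", rank_models(results, "mse"))
print("by MSPE:", rank_models(results, "mspe"))
```

The two orderings disagree on the top two models. NN has the lowest in-sample error, but GAM has the lowest out-of-sample error, so the choice of metric matters for the final best-to-worst list.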
##
## Call:
## glm(formula = response ~ chk_acct + duration + credit_his + purpose +
## saving_acct + present_emp + installment_rate + sex + other_debtor +
## other_install + housing, family = binomial, data = german_credit.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0411 -0.7110 -0.4039 0.7209 2.6735
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.742083 0.870547 2.001 0.045378 *
## chk_acctA12 -0.292478 0.253344 -1.154 0.248309
## chk_acctA13 -1.320577 0.443625 -2.977 0.002913 **
## chk_acctA14 -1.642158 0.272776 -6.020 1.74e-09 ***
## duration 0.048188 0.009069 5.314 1.08e-07 ***
## credit_hisA31 -0.361028 0.650907 -0.555 0.579131
## credit_hisA32 -1.204443 0.515576 -2.336 0.019485 *
## credit_hisA33 -1.353270 0.587232 -2.304 0.021195 *
## credit_hisA34 -1.893126 0.544537 -3.477 0.000508 ***
## purposeA41 -1.503055 0.417294 -3.602 0.000316 ***
## purposeA410 -1.405119 0.816628 -1.721 0.085317 .
## purposeA42 -0.682366 0.296909 -2.298 0.021548 *
## purposeA43 -0.854577 0.291499 -2.932 0.003372 **
## purposeA44 -0.199797 0.829919 -0.241 0.809754
## purposeA45 -1.221517 0.767520 -1.592 0.111494
## purposeA46 0.292384 0.463419 0.631 0.528087
## purposeA48 -1.422238 1.192221 -1.193 0.232896
## purposeA49 -1.079342 0.395266 -2.731 0.006321 **
## saving_acctA62 -0.205578 0.327355 -0.628 0.530005
## saving_acctA63 -0.477306 0.494161 -0.966 0.334099
## saving_acctA64 -1.712172 0.707818 -2.419 0.015566 *
## saving_acctA65 -1.008920 0.303236 -3.327 0.000877 ***
## present_empA72 0.016174 0.442898 0.037 0.970869
## present_empA73 0.122999 0.409909 0.300 0.764129
## present_empA74 -0.585216 0.454267 -1.288 0.197654
## present_empA75 0.043752 0.429115 0.102 0.918790
## installment_rate 0.247650 0.094631 2.617 0.008871 **
## sexA92 -0.339404 0.463878 -0.732 0.464373
## sexA93 -0.654156 0.450692 -1.451 0.146655
## sexA94 -0.188243 0.547651 -0.344 0.731051
## other_debtorA102 0.667249 0.445567 1.498 0.134256
## other_debtorA103 -1.113534 0.502317 -2.217 0.026637 *
## other_installA142 0.387415 0.518702 0.747 0.455128
## other_installA143 -0.708351 0.274655 -2.579 0.009907 **
## housingA152 -0.551443 0.262375 -2.102 0.035576 *
## housingA153 -0.380916 0.393916 -0.967 0.333546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 863.51 on 699 degrees of freedom
## Residual deviance: 642.84 on 664 degrees of freedom
## AIC: 714.84
##
## Number of Fisher Scoring iterations: 5
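This model uses family = binomial with the default logit link. Each estimate is therefore a change in the log-odds that the response equals 1, and exponentiating it gives an odds ratio. The sketch below uses three estimates copied from the summary above. The dictionary and loop are illustrative only.

```python
import math

# A few estimates copied from the GLM summary above (logit scale).
coefficients = {
    "chk_acctA14": -1.642158,
    "duration": 0.048188,
    "installment_rate": 0.247650,
}

# exp(beta) is the multiplicative change in the odds of response == 1 for a
# one-unit increase in a numeric predictor, or for a factor level versus its
# baseline level, holding the other predictors fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name:18s} {ratio:.3f}")
```

For example, exp(0.048188) is about 1.049. Under this model, each additional unit of duration multiplies the odds of the class coded 1 by roughly 1.05.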
## [1] 642.8371
## [1] 714.8371
## [1] 878.676
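The three unlabeled values printed after the summary are consistent with the residual deviance, the AIC, and the BIC. For a binomial GLM with a 0/1 response, the saturated log-likelihood is zero, so -2 log L equals the residual deviance. The AIC and BIC then follow by adding their complexity penalties. The check below assumes k = 36 (the intercept plus the 35 coefficient rows in the summary) and n = 700, which follows from the 699 null-deviance degrees of freedom.

```python
import math

deviance = 642.8371   # residual deviance printed above
n_obs = 700           # null deviance has 699 df, so 700 training rows
k_params = 36         # intercept + 35 coefficient rows in the summary

aic = deviance + 2 * k_params                # AIC = -2 log L + 2k
bic = deviance + k_params * math.log(n_obs)  # BIC = -2 log L + k log(n)
print(f"AIC = {aic:.4f}, BIC = {bic:.3f}")
```

The same arithmetic works for the GAM, using effective degrees of freedom in place of k. That is 48 parametric coefficients plus 1.613 for s(duration), about 49.613 in total. With these values, its deviance of 618.4274 reproduces the printed 717.6541 and 943.4484 to within rounding.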
## MR FPR FNR
## GLM: In-Sample 0.35 0.4597938 0.1023256
## MR FPR FNR
## GLM: Out-of-Sample 0.3733333 0.455814 0.1647059
## Predicted
## Truth 0 1
## 0 242 243
## 1 8 207
## Predicted
## Truth 0 1
## 0 100 115
## 1 8 77
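The MR, FPR and FNR columns can be computed from the four cells of a confusion matrix. In the sketch below, FPR is the share of true 0s predicted as 1 and FNR is the share of true 1s predicted as 0. This convention reproduces the GLM values above (for example 223/485 and 22/215 in-sample). The function name is hypothetical. The counts are taken from the in-sample Truth/Predicted table above.

```python
def classification_rates(tn, fp, fn, tp):
    """Misclassification rate, false positive rate and false negative rate
    from the four cells of a 2x2 confusion matrix (class 1 = positive)."""
    total = tn + fp + fn + tp
    mr = (fp + fn) / total
    fpr = fp / (tn + fp)   # share of true 0s predicted as 1
    fnr = fn / (fn + tp)   # share of true 1s predicted as 0
    return mr, fpr, fnr


# Counts from the in-sample Truth/Predicted table above.
mr, fpr, fnr = classification_rates(tn=242, fp=243, fn=8, tp=207)
print(f"MR = {mr:.4f}, FPR = {fpr:.4f}, FNR = {fnr:.4f}")
```

The resulting MR, 251/700 (about 0.3586), equals the Tree: In-Sample misclassification rate printed next. This suggests that these two matrices belong to the tree model.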
## Misclassification Rate Cost
## Tree: In-Sample 0.3585714 0.4042857
## Tree: Out-of-Sample 0.3600000 0.5166667
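The Cost column is not defined in the text. A 5:1 weighting, in which a false negative costs five times as much as a false positive, reproduces both tree figures exactly from the Truth/Predicted tables above. The sketch below treats that weighting as an inferred assumption, not a stated fact. The function name is hypothetical.

```python
def asymmetric_cost(tn, fp, fn, tp, fn_weight=5.0, fp_weight=1.0):
    """Average misclassification cost when a false negative (a true 1 predicted
    as 0) costs fn_weight and a false positive costs fp_weight.
    The 5:1 default is an assumption inferred from the printed Cost column."""
    total = tn + fp + fn + tp
    return (fn_weight * fn + fp_weight * fp) / total


in_sample = asymmetric_cost(tn=242, fp=243, fn=8, tp=207)
out_sample = asymmetric_cost(tn=100, fp=115, fn=8, tp=77)
print(f"in-sample cost = {in_sample:.7f}, out-of-sample cost = {out_sample:.7f}")
```

The GAM out-of-sample figure 0.5066667 further down fits the same pattern, (122 + 5 × 6)/300. Note, however, that the out-of-sample matrix gives a misclassification rate of 123/300 = 0.41 rather than the printed 0.36.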
## response ~ s(duration) + amount + age + chk_acct + credit_his +
## purpose + saving_acct + present_emp + installment_rate +
## sex + other_debtor + present_resid + property + other_install +
## housing + n_credits + job + n_people + telephone + foreign
##
## Family: binomial
## Link function: logit
##
## Formula:
## response ~ s(duration) + amount + age + chk_acct + credit_his +
## purpose + saving_acct + present_emp + installment_rate +
## sex + other_debtor + present_resid + property + other_install +
## housing + n_credits + job + n_people + telephone + foreign
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.915e+00 1.342e+00 2.172 0.029841 *
## amount 1.679e-04 5.557e-05 3.022 0.002510 **
## age -2.129e-02 1.164e-02 -1.829 0.067436 .
## chk_acctA12 -4.092e-01 2.662e-01 -1.537 0.124180
## chk_acctA13 -1.375e+00 4.621e-01 -2.974 0.002936 **
## chk_acctA14 -1.736e+00 2.857e-01 -6.077 1.23e-09 ***
## credit_hisA31 -3.868e-01 6.915e-01 -0.559 0.575936
## credit_hisA32 -1.326e+00 5.533e-01 -2.397 0.016533 *
## credit_hisA33 -1.406e+00 6.010e-01 -2.340 0.019291 *
## credit_hisA34 -1.890e+00 5.624e-01 -3.361 0.000778 ***
## purposeA41 -1.806e+00 4.420e-01 -4.086 4.39e-05 ***
## purposeA410 -1.431e+00 9.006e-01 -1.589 0.112064
## purposeA42 -7.965e-01 3.168e-01 -2.514 0.011923 *
## purposeA43 -9.639e-01 3.066e-01 -3.144 0.001668 **
## purposeA44 -3.107e-01 8.787e-01 -0.354 0.723691
## purposeA45 -1.234e+00 7.784e-01 -1.585 0.112938
## purposeA46 3.057e-01 4.786e-01 0.639 0.523023
## purposeA48 -1.551e+00 1.206e+00 -1.286 0.198387
## purposeA49 -1.121e+00 4.114e-01 -2.724 0.006445 **
## saving_acctA62 -2.798e-01 3.384e-01 -0.827 0.408290
## saving_acctA63 -4.296e-01 5.241e-01 -0.820 0.412407
## saving_acctA64 -1.913e+00 7.325e-01 -2.612 0.009014 **
## saving_acctA65 -1.048e+00 3.160e-01 -3.318 0.000908 ***
## present_empA72 -3.695e-01 5.196e-01 -0.711 0.476970
## present_empA73 -2.063e-01 5.021e-01 -0.411 0.681163
## present_empA74 -9.159e-01 5.336e-01 -1.716 0.086092 .
## present_empA75 -2.663e-02 5.114e-01 -0.052 0.958476
## installment_rate 3.999e-01 1.102e-01 3.628 0.000286 ***
## sexA92 -4.558e-01 4.882e-01 -0.934 0.350551
## sexA93 -8.497e-01 4.754e-01 -1.787 0.073914 .
## sexA94 -1.884e-01 5.668e-01 -0.332 0.739612
## other_debtorA102 5.423e-01 4.535e-01 1.196 0.231813
## other_debtorA103 -9.405e-01 5.203e-01 -1.808 0.070678 .
## present_resid -1.209e-01 1.058e-01 -1.142 0.253323
## propertyA122 -1.920e-03 3.087e-01 -0.006 0.995039
## propertyA123 1.521e-01 2.912e-01 0.522 0.601373
## propertyA124 6.616e-01 5.009e-01 1.321 0.186571
## other_installA142 5.051e-01 5.205e-01 0.970 0.331858
## other_installA143 -7.703e-01 2.859e-01 -2.694 0.007056 **
## housingA152 -5.690e-01 2.846e-01 -2.000 0.045552 *
## housingA153 -7.270e-01 5.720e-01 -1.271 0.203734
## n_credits -4.692e-02 2.350e-01 -0.200 0.841725
## jobA172 5.602e-01 8.616e-01 0.650 0.515561
## jobA173 5.633e-01 8.310e-01 0.678 0.497834
## jobA174 1.500e-01 8.427e-01 0.178 0.858699
## n_people 1.861e-01 3.003e-01 0.620 0.535410
## telephoneA192 -3.259e-01 2.498e-01 -1.305 0.191940
## foreignA202 -1.395e+00 7.151e-01 -1.951 0.051030 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(duration) 1.613 2.023 5.752 0.059 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.276 Deviance explained = 28.4%
## UBRE = 0.02522 Scale est. = 1 n = 700
## [1] 717.6541
## [1] 943.4484
## [1] 618.4274
## [1] 700
## Predicted
## Observed 0 1
## 0 167 318
## 1 6 209
## [1] 0.4628571
##
## 0.5271429
## searchgrid
## 0.1
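The searchgrid output suggests that the classification cutoff was chosen by evaluating a grid of candidate cutoffs, with 0.1 reported. The sketch below shows one such search. It assumes the same 5:1 cost weighting and uses toy probabilities. The function names, the grid, and the data are all hypothetical, and this is not the report's own code.

```python
def cost_at_cutoff(probs, labels, cutoff, fn_weight=5.0, fp_weight=1.0):
    """Average asymmetric cost when predicting 1 whenever prob > cutoff."""
    total = 0.0
    for p, y in zip(probs, labels):
        pred = 1 if p > cutoff else 0
        if y == 1 and pred == 0:
            total += fn_weight   # false negative
        elif y == 0 and pred == 1:
            total += fp_weight   # false positive
    return total / len(labels)


def search_cutoff(probs, labels, grid):
    """Return the (cutoff, cost) pair with the lowest cost over the grid;
    ties go to the first (smallest) cutoff in the grid."""
    return min(((c, cost_at_cutoff(probs, labels, c)) for c in grid),
               key=lambda pair: pair[1])


grid = [i / 100 for i in range(1, 100)]        # 0.01, 0.02, ..., 0.99
probs = [0.05, 0.12, 0.30, 0.45, 0.60, 0.85]   # toy predicted probabilities
labels = [0, 0, 1, 0, 1, 1]                    # toy observed classes
best_cutoff, best_cost = search_cutoff(probs, labels, grid)
print(best_cutoff, best_cost)
```

For well-calibrated probabilities under 5:1 costs, predicting 1 is cheaper whenever 5p > 1 - p, that is p > 1/6 (about 0.167). An empirical grid search on finite data can settle elsewhere, as it does on this toy data.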
## Predicted
## Observed 0 1
## 0 93 122
## 1 6 79
## [1] 0.4266667
## [1] 0.5066667
## # weights: 51
## initial value 153.803679
## iter 10 value 146.330574
## iter 20 value 133.024568
## iter 30 value 109.353186
## iter 40 value 98.711610
## iter 50 value 95.109670
## iter 60 value 90.984716
## iter 70 value 89.391408
## iter 80 value 89.355977
## final value 89.355819
## converged
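The "# weights: 51" line counts every connection weight and bias in the network. The sketch below assumes a fully connected network with one hidden layer, no skip-layer connections, and a single output. One consistent reading is 48 inputs with one hidden unit. The German credit predictors expand to 48 model-matrix columns: the 47 non-intercept parametric rows in the GAM summary plus duration. Other combinations also give 51, for example 23 inputs with 2 hidden units, so this is a plausible reading rather than a confirmation. The function name is hypothetical.

```python
def nnet_weight_count(n_inputs, n_hidden, n_outputs=1):
    """Number of weights, including biases, in a fully connected network with
    one hidden layer and no skip-layer connections."""
    hidden_layer = (n_inputs + 1) * n_hidden   # +1 for each hidden unit's bias
    output_layer = (n_hidden + 1) * n_outputs  # +1 for each output unit's bias
    return hidden_layer + output_layer


# 48 model-matrix columns (assumed) feeding a single hidden unit.
print(nnet_weight_count(n_inputs=48, n_hidden=1))
```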
## Predicted
## Observed 0 1
## 0 166 49
## 1 33 52
## Warning in german_credit.train$response != pred.nnet: longer object length is
## not a multiple of shorter object length
## Warning in (observed == 1) & (predicted == 0): longer object length is not a
## multiple of shorter object length
## Warning in (observed == 0) & (predicted == 1): longer object length is not a
## multiple of shorter object length
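The warnings above come from comparing german_credit.train$response (700 training labels) element-wise with pred.nnet. The confusion matrix just above sums to 300, the size of the test set, so pred.nnet apparently holds test-set predictions. R then recycles the shorter vector and only warns. This suggests that the NN in-sample figure was computed on misaligned vectors. The sketch below is a misclassification-rate helper (name hypothetical) that refuses mismatched lengths. Python's zip would silently truncate the longer input, which is why the length check is explicit.

```python
def misclassification_rate(observed, predicted):
    """Share of cases where the predicted class differs from the observed class.
    Raises instead of silently recycling or truncating mismatched inputs."""
    if len(observed) != len(predicted):
        raise ValueError(
            f"length mismatch: {len(observed)} observed vs {len(predicted)} predicted"
        )
    wrong = sum(1 for o, p in zip(observed, predicted) if o != p)
    return wrong / len(observed)


print(misclassification_rate([0, 1, 1, 0], [0, 1, 0, 0]))
```

Called with 700 observed labels and 300 predictions, the helper raises a ValueError rather than returning a number.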
| Metric | GLM | TREE | GAM | NN |
|---|---|---|---|---|
| Misclassification rate (in-sample) | 0.3500000 | 0.3585714 | 0.4628571 | 0.4271429 |
| Misclassification rate (out-of-sample) | 0.3733333 | 0.4100000 | 0.4266667 | 0.2733333 |