Executive Summary

This report is an analysis of four major methods for creating predictive models:

  1. Generalized Linear Model (GLM)
  2. Tree Classification
  3. Generalized Additive Model (GAM)
  4. Neural Networks

Each of these four methods was applied to two datasets, one for each data category:

  1. Continuous Data - Boston Housing Data
  2. Categorical Data - German Credit Data

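To rank the models from best to worst fit, the in-sample mean squared error (MSE) and the out-of-sample mean squared prediction error (MSPE) were calculated for each model. These metrics were compared only within their respective data categories. For the German Credit Data, the values reported under these labels are the in-sample and out-of-sample misclassification rates. Each section has a tab labelled “Final Analysis” which contains a table of all the values, and the results are summarized in the sections below.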
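As a minimal sketch of how the two metrics can be computed, the following uses the built-in `mtcars` data as a stand-in (it is not one of the report's datasets). The 70/30 split, the seed, and the formula are assumptions chosen only for illustration.

```r
# Minimal sketch: in-sample MSE and out-of-sample MSPE for a Gaussian GLM.
# mtcars, the 70/30 split and the formula are illustrative assumptions.
set.seed(1)                                   # arbitrary seed for a reproducible split
idx   <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

fit <- glm(mpg ~ wt + hp, data = train)       # default family is gaussian

# MSE: average squared residual on the data used for fitting
mse  <- mean((train$mpg - predict(fit, newdata = train))^2)

# MSPE: average squared prediction error on held-out data
mspe <- mean((test$mpg - predict(fit, newdata = test))^2)

print(c(mse = mse, mspe = mspe))
```

Some texts divide the in-sample sum of squares by the residual degrees of freedom instead of the sample size; the plain average is used here so that both metrics are on the same footing.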

Boston Housing Data

Exploratory Analysis

GLM

Model

model_step_glm

Model Assessment

Tree Models

Plotting the tree

Plot pruned tree

GAM

Model

## [1] 1851.647
## [1] 2066.989
## [1] 2828.861

The non-linear relationship of variables with medv can be seen in the following plots:
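A smooth term is what lets a GAM capture this kind of non-linear relationship. The sketch below assumes the `mgcv` package and uses simulated data; the variables `x`, `z`, and `y` are hypothetical stand-ins, not columns of the Boston data.

```r
# Minimal sketch, assuming the mgcv package; data are simulated.
library(mgcv)

set.seed(1)                                   # arbitrary seed
n <- 200
x <- runif(n, 0, 10)
z <- rnorm(n)
y <- sin(x) + 0.5 * z + rnorm(n, sd = 0.3)    # non-linear in x, linear in z
d <- data.frame(y = y, x = x, z = z)

# s(x) requests a penalized smooth of x; z enters as an ordinary linear term
fit_gam <- gam(y ~ s(x) + z, data = d)

# Effective degrees of freedom of the smooth: values well above 1
# suggest that a straight line is not adequate for x
edf_x <- summary(fit_gam)$s.table["s(x)", "edf"]
print(edf_x)

# Draw the estimated smooth when running interactively
if (interactive()) plot(fit_gam, pages = 1)
```

An effective degrees of freedom close to 1 would suggest that the term could be entered linearly instead.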

In-sample prediction

## [1] 7.991131

Out-of-sample prediction - MSPE

## [1] 14.35228

Neural Network

Plotted NN

Final Analysis

Metric  GLM       TREE      GAM        NN
MSE     22.55819  14.34562  7.991131   6.694951
MSPE    26.31746  19.70207  14.352277  15.592189

German Credit Data

EDA

GLM

## 
## Call:
## glm(formula = response ~ chk_acct + duration + credit_his + purpose + 
##     saving_acct + present_emp + installment_rate + sex + other_debtor + 
##     other_install + housing, family = binomial, data = german_credit.train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0411  -0.7110  -0.4039   0.7209   2.6735  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        1.742083   0.870547   2.001 0.045378 *  
## chk_acctA12       -0.292478   0.253344  -1.154 0.248309    
## chk_acctA13       -1.320577   0.443625  -2.977 0.002913 ** 
## chk_acctA14       -1.642158   0.272776  -6.020 1.74e-09 ***
## duration           0.048188   0.009069   5.314 1.08e-07 ***
## credit_hisA31     -0.361028   0.650907  -0.555 0.579131    
## credit_hisA32     -1.204443   0.515576  -2.336 0.019485 *  
## credit_hisA33     -1.353270   0.587232  -2.304 0.021195 *  
## credit_hisA34     -1.893126   0.544537  -3.477 0.000508 ***
## purposeA41        -1.503055   0.417294  -3.602 0.000316 ***
## purposeA410       -1.405119   0.816628  -1.721 0.085317 .  
## purposeA42        -0.682366   0.296909  -2.298 0.021548 *  
## purposeA43        -0.854577   0.291499  -2.932 0.003372 ** 
## purposeA44        -0.199797   0.829919  -0.241 0.809754    
## purposeA45        -1.221517   0.767520  -1.592 0.111494    
## purposeA46         0.292384   0.463419   0.631 0.528087    
## purposeA48        -1.422238   1.192221  -1.193 0.232896    
## purposeA49        -1.079342   0.395266  -2.731 0.006321 ** 
## saving_acctA62    -0.205578   0.327355  -0.628 0.530005    
## saving_acctA63    -0.477306   0.494161  -0.966 0.334099    
## saving_acctA64    -1.712172   0.707818  -2.419 0.015566 *  
## saving_acctA65    -1.008920   0.303236  -3.327 0.000877 ***
## present_empA72     0.016174   0.442898   0.037 0.970869    
## present_empA73     0.122999   0.409909   0.300 0.764129    
## present_empA74    -0.585216   0.454267  -1.288 0.197654    
## present_empA75     0.043752   0.429115   0.102 0.918790    
## installment_rate   0.247650   0.094631   2.617 0.008871 ** 
## sexA92            -0.339404   0.463878  -0.732 0.464373    
## sexA93            -0.654156   0.450692  -1.451 0.146655    
## sexA94            -0.188243   0.547651  -0.344 0.731051    
## other_debtorA102   0.667249   0.445567   1.498 0.134256    
## other_debtorA103  -1.113534   0.502317  -2.217 0.026637 *  
## other_installA142  0.387415   0.518702   0.747 0.455128    
## other_installA143 -0.708351   0.274655  -2.579 0.009907 ** 
## housingA152       -0.551443   0.262375  -2.102 0.035576 *  
## housingA153       -0.380916   0.393916  -0.967 0.333546    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 863.51  on 699  degrees of freedom
## Residual deviance: 642.84  on 664  degrees of freedom
## AIC: 714.84
## 
## Number of Fisher Scoring iterations: 5
## [1] 642.8371
## [1] 714.8371
## [1] 878.676

##                  MR       FPR       FNR
## GLM: In-Sample 0.35 0.4597938 0.1023256
##                           MR      FPR       FNR
## GLM: Out-of-Sample 0.3733333 0.455814 0.1647059
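The three rates can be computed from the observed and predicted classes as sketched below. The helper name `rates` is hypothetical. The cell counts (223 false positives among 485 observed 0s, 22 false negatives among 215 observed 1s) are back-calculated from the in-sample rates printed above, not taken from the original code.

```r
# Minimal sketch: misclassification rate (MR), false positive rate (FPR)
# and false negative rate (FNR). `rates` is a hypothetical helper name.
rates <- function(observed, predicted) {
  c(MR  = mean(observed != predicted),
    FPR = sum(observed == 0 & predicted == 1) / sum(observed == 0),
    FNR = sum(observed == 1 & predicted == 0) / sum(observed == 1))
}

# Class vectors rebuilt from counts back-calculated from the reported
# in-sample rates (an assumption made for illustration)
observed  <- c(rep(0, 485), rep(1, 215))
predicted <- c(rep(1, 223), rep(0, 262),      # observed 0: 223 FP, 262 TN
               rep(0, 22),  rep(1, 193))      # observed 1: 22 FN, 193 TP

r <- rates(observed, predicted)
print(round(r, 7))
```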

Classification Tree

##      Predicted
## Truth   0   1
##     0 242 243
##     1   8 207
##      Predicted
## Truth   0   1
##     0 100 115
##     1   8  77
##                     Misclassification Rate      Cost
## Tree: In-Sample                  0.3585714 0.4042857
## Tree: Out-of-Sample              0.3600000 0.5166667
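The in-sample figures can be reproduced from the first confusion matrix. The report does not state the cost weights, so the sketch below assumes a false negative is weighted 5 and a false positive 1; these weights are consistent with the reported in-sample cost.

```r
# Minimal sketch: misclassification rate and asymmetric cost from the
# in-sample confusion matrix. The 5:1 weights are an assumption.
cm <- matrix(c(242, 8, 243, 207), nrow = 2,
             dimnames = list(Truth = c("0", "1"), Predicted = c("0", "1")))

n  <- sum(cm)                                 # 700 training observations
fp <- cm["0", "1"]                            # truth 0, predicted 1
fn <- cm["1", "0"]                            # truth 1, predicted 0

mr   <- (fp + fn) / n                         # misclassification rate
cost <- (1 * fp + 5 * fn) / n                 # weighted cost per observation

print(round(c(MR = mr, Cost = cost), 7))
```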

GAM

## response ~ s(duration) + amount + age + chk_acct + credit_his + 
##     purpose + saving_acct + present_emp + installment_rate + 
##     sex + other_debtor + present_resid + property + other_install + 
##     housing + n_credits + job + n_people + telephone + foreign
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## response ~ s(duration) + amount + age + chk_acct + credit_his + 
##     purpose + saving_acct + present_emp + installment_rate + 
##     sex + other_debtor + present_resid + property + other_install + 
##     housing + n_credits + job + n_people + telephone + foreign
## 
## Parametric coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        2.915e+00  1.342e+00   2.172 0.029841 *  
## amount             1.679e-04  5.557e-05   3.022 0.002510 ** 
## age               -2.129e-02  1.164e-02  -1.829 0.067436 .  
## chk_acctA12       -4.092e-01  2.662e-01  -1.537 0.124180    
## chk_acctA13       -1.375e+00  4.621e-01  -2.974 0.002936 ** 
## chk_acctA14       -1.736e+00  2.857e-01  -6.077 1.23e-09 ***
## credit_hisA31     -3.868e-01  6.915e-01  -0.559 0.575936    
## credit_hisA32     -1.326e+00  5.533e-01  -2.397 0.016533 *  
## credit_hisA33     -1.406e+00  6.010e-01  -2.340 0.019291 *  
## credit_hisA34     -1.890e+00  5.624e-01  -3.361 0.000778 ***
## purposeA41        -1.806e+00  4.420e-01  -4.086 4.39e-05 ***
## purposeA410       -1.431e+00  9.006e-01  -1.589 0.112064    
## purposeA42        -7.965e-01  3.168e-01  -2.514 0.011923 *  
## purposeA43        -9.639e-01  3.066e-01  -3.144 0.001668 ** 
## purposeA44        -3.107e-01  8.787e-01  -0.354 0.723691    
## purposeA45        -1.234e+00  7.784e-01  -1.585 0.112938    
## purposeA46         3.057e-01  4.786e-01   0.639 0.523023    
## purposeA48        -1.551e+00  1.206e+00  -1.286 0.198387    
## purposeA49        -1.121e+00  4.114e-01  -2.724 0.006445 ** 
## saving_acctA62    -2.798e-01  3.384e-01  -0.827 0.408290    
## saving_acctA63    -4.296e-01  5.241e-01  -0.820 0.412407    
## saving_acctA64    -1.913e+00  7.325e-01  -2.612 0.009014 ** 
## saving_acctA65    -1.048e+00  3.160e-01  -3.318 0.000908 ***
## present_empA72    -3.695e-01  5.196e-01  -0.711 0.476970    
## present_empA73    -2.063e-01  5.021e-01  -0.411 0.681163    
## present_empA74    -9.159e-01  5.336e-01  -1.716 0.086092 .  
## present_empA75    -2.663e-02  5.114e-01  -0.052 0.958476    
## installment_rate   3.999e-01  1.102e-01   3.628 0.000286 ***
## sexA92            -4.558e-01  4.882e-01  -0.934 0.350551    
## sexA93            -8.497e-01  4.754e-01  -1.787 0.073914 .  
## sexA94            -1.884e-01  5.668e-01  -0.332 0.739612    
## other_debtorA102   5.423e-01  4.535e-01   1.196 0.231813    
## other_debtorA103  -9.405e-01  5.203e-01  -1.808 0.070678 .  
## present_resid     -1.209e-01  1.058e-01  -1.142 0.253323    
## propertyA122      -1.920e-03  3.087e-01  -0.006 0.995039    
## propertyA123       1.521e-01  2.912e-01   0.522 0.601373    
## propertyA124       6.616e-01  5.009e-01   1.321 0.186571    
## other_installA142  5.051e-01  5.205e-01   0.970 0.331858    
## other_installA143 -7.703e-01  2.859e-01  -2.694 0.007056 ** 
## housingA152       -5.690e-01  2.846e-01  -2.000 0.045552 *  
## housingA153       -7.270e-01  5.720e-01  -1.271 0.203734    
## n_credits         -4.692e-02  2.350e-01  -0.200 0.841725    
## jobA172            5.602e-01  8.616e-01   0.650 0.515561    
## jobA173            5.633e-01  8.310e-01   0.678 0.497834    
## jobA174            1.500e-01  8.427e-01   0.178 0.858699    
## n_people           1.861e-01  3.003e-01   0.620 0.535410    
## telephoneA192     -3.259e-01  2.498e-01  -1.305 0.191940    
## foreignA202       -1.395e+00  7.151e-01  -1.951 0.051030 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df Chi.sq p-value  
## s(duration) 1.613  2.023  5.752   0.059 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.276   Deviance explained = 28.4%
## UBRE = 0.02522  Scale est. = 1         n = 700

## [1] 717.6541
## [1] 943.4484
## [1] 618.4274
## [1] 700
##         Predicted
## Observed   0   1
##        0 167 318
##        1   6 209
## [1] 0.4628571
## [1] 717.6541
## [1] 943.4484

##           
## 0.5271429
## searchgrid 
##        0.1
##         Predicted
## Observed   0   1
##        0  93 122
##        1   6  79
## [1] 0.4266667
## [1] 0.5066667
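The `searchgrid` output above suggests that the classification cut-off was chosen by a grid search. A minimal sketch of such a search follows, using simulated data and a plain logistic regression as a stand-in for the GAM. The helper `cost_at`, the 5:1 weights, and the grid spacing are assumptions, not the report's code.

```r
# Minimal sketch: pick the probability cut-off that minimizes an asymmetric
# cost. Simulated data; cost_at, the weights and the grid are assumptions.
set.seed(1)                                   # arbitrary seed
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x))

fit   <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)                          # in-sample predicted probabilities

cost_at <- function(cutoff, observed, prob, w_fn = 5, w_fp = 1) {
  predicted <- as.integer(prob > cutoff)
  mean(w_fn * (observed == 1 & predicted == 0) +
       w_fp * (observed == 0 & predicted == 1))
}

searchgrid  <- seq(0.01, 0.99, by = 0.01)
costs       <- vapply(searchgrid, cost_at, numeric(1),
                      observed = y, prob = p_hat)
best_cutoff <- searchgrid[which.min(costs)]

print(c(cutoff = best_cutoff, cost = min(costs)))
```

When false negatives are penalized more heavily than false positives, the selected cut-off typically falls below 0.5.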

Neural Network

## # weights:  51
## initial  value 153.803679 
## iter  10 value 146.330574
## iter  20 value 133.024568
## iter  30 value 109.353186
## iter  40 value 98.711610
## iter  50 value 95.109670
## iter  60 value 90.984716
## iter  70 value 89.391408
## iter  80 value 89.355977
## final  value 89.355819 
## converged
##         Predicted
## Observed   0   1
##        0 166  49
##        1  33  52
## Warning in german_credit.train$response != pred.nnet: longer object length is
## not a multiple of shorter object length
## Warning in (observed == 1) & (predicted == 0): longer object length is not a
## multiple of shorter object length
## Warning in (observed == 0) & (predicted == 1): longer object length is not a
## multiple of shorter object length

Final Analysis

Metric  GLM        TREE       GAM        NN
MSE     0.3500000  0.3585714  0.4628571  0.4271429
MSPE    0.3733333  0.4100000  0.4266667  0.2733333