Sem 2 - Assignment 1

Question 1 - Visual Exploration

(A)

(B)

Commentary

  • Married customers are less likely to subscribe to a term deposit plan. This could be because a combined household income reduces the need for one.

  • Customers with a loan are less likely to subscribe to a term deposit plan. This could be the result of the financial pressure of their current loans.

  • Customers with a longer duration are more likely to subscribe. This is likely the product of trust and customer nurturing built up over time.

  • Customers the bank has contacted previously are more likely to subscribe.

Question 2 - Classification Trees

(3)

(4)

(A)

If a customer has been with the bank for at least 765 days, has an existing loan and is 48 years old or older, they are more likely to subscribe.

The node is relatively pure, as the majority of outcomes are “yes”.

(B)

If the customer has been with the bank for fewer than 377 days, they are unlikely to subscribe.

This node is very pure, as almost all outcomes are “No”.

(C)

Rule (path to leaf)                                                                   Share of customers
Duration < 377, Previous Outcome = failure/other/unknown                              78%
Duration < 377, Previous Outcome not = failure/other/unknown, Duration < 251          1%
Duration < 377, Previous Outcome not = failure/other/unknown, Duration >= 251         1%
Duration >= 377, Duration < 765, Previous Outcome = failure/other/unknown, Age < 59   13%
Duration >= 377, Duration < 765, Previous Outcome = failure/other/unknown, Age >= 59  1%
Duration >= 377, Duration < 765, Previous Outcome not = failure/other/unknown         1%
Duration >= 377, Duration >= 765, Age >= 48                                           1%
Duration >= 377, Duration >= 765, Age < 48, Loan = yes                                1%
Duration >= 377, Duration >= 765, Age < 48, Loan not = yes                            3%
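
The leaf rules in the table can be read as nested conditions. The following is a minimal Python sketch (the assignment itself used R), not the fitted tree. It assumes every split has the form "below threshold" versus "at or above threshold", and that "not failure/other/unknown" means a previous success. The names LEAVES and leaf_index are hypothetical.

```python
# Leaf descriptions and shares (%) transcribed from the table in (C).
LEAVES = [
    ("duration < 377, previous outcome failure/other/unknown", 78),
    ("duration < 251, previous outcome success", 1),
    ("251 <= duration < 377, previous outcome success", 1),
    ("377 <= duration < 765, previous outcome failure/other/unknown, age < 59", 13),
    ("377 <= duration < 765, previous outcome failure/other/unknown, age >= 59", 1),
    ("377 <= duration < 765, previous outcome success", 1),
    ("duration >= 765, age >= 48", 1),
    ("duration >= 765, age < 48, loan = yes", 1),
    ("duration >= 765, age < 48, loan = no", 3),
]


def leaf_index(duration, prev_outcome, age, loan):
    """Return the index in LEAVES of the leaf a customer falls into."""
    # Assumption: anything other than failure/other/unknown is a prior success.
    prior_success = prev_outcome not in ("failure", "other", "unknown")
    if duration < 377:
        if not prior_success:
            return 0
        return 1 if duration < 251 else 2
    if duration < 765:
        if prior_success:
            return 5
        return 3 if age < 59 else 4
    # duration >= 765 from here on
    if age >= 48:
        return 6
    return 7 if loan == "yes" else 8


# First and third training rows from the output in (5).
row1 = leaf_index(duration=72, prev_outcome="other", age=32, loan="no")
row3 = leaf_index(duration=433, prev_outcome="unknown", age=39, loan="no")
print(LEAVES[row1])
print(LEAVES[row3])
```

Under these assumptions the first and third training rows in (5) fall into different leaves, which agrees with their different predicted probabilities in the R output (0.959 versus 0.816 for “No”).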

(5)

    id age marital balance loan num_prev_contacts duration prev_outcome
1 3961  32  single     473   no                 1       72        other
2 2722  44 married    1058   no                 0      188      unknown
3 3060  39 married     186   no                 0      433      unknown
4 3705  35 married       0   no                 0      146      unknown
5 3328  46 married    1291   no                17      142      failure
6 2403  35 married     280  yes                 0       65      unknown
  subscribed        No        Yes train_preds
1         No 0.9586936 0.04130644          No
2        Yes 0.9586936 0.04130644          No
3         No 0.8156425 0.18435754          No
4         No 0.9586936 0.04130644          No
5         No 0.9586936 0.04130644          No
6         No 0.9586936 0.04130644          No
      Predicted
Actual   No  Yes
   No  1176   15
   Yes   90   53
    id age  marital balance loan duration num_prev_contacts prev_outcome
1 3786  51   single     -55   no      119                 0      unknown
2  503  43 divorced     738   no      585                 4      failure
3 3430  53  married     719   no      230                 0      unknown
4 3696  42  married      83  yes      184                 0      unknown
5 4090  45  married     185   no      249                 0      unknown
6 3052  36  married    1554   no      325                 0      unknown
  subscribed        No        Yes test_preds
1         No 0.9586936 0.04130644         No
2         No 0.8156425 0.18435754         No
3         No 0.9586936 0.04130644         No
4         No 0.9586936 0.04130644         No
5         No 0.9586936 0.04130644         No
6         No 0.9586936 0.04130644         No
      Predicted
Actual   No  Yes
   No  2568   73
   Yes  267   92

(A)

The overall accuracy of this model on the test data set is (92+2568)/3000 = 89%

The classification tree shows at most mild overfitting. The overall accuracy on the training data set is (53+1176)/1334 = 92%, only about three percentage points higher than on the test data set (89%).
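
The accuracy figures can be checked directly from the two confusion matrices in (5). The sketch below is in Python rather than the R used for the assignment, and the helper name accuracy is hypothetical.

```python
def accuracy(matrix):
    """Share of correct predictions in a square confusion matrix.

    Rows are actual classes and columns are predicted classes, in the
    same order, so correct predictions sit on the diagonal.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Counts copied from the classification tree output in (5), order: No, Yes.
tree_train = [[1176, 15], [90, 53]]
tree_test = [[2568, 73], [267, 92]]

train_acc = accuracy(tree_train)
test_acc = accuracy(tree_test)
gap_points = (train_acc - test_acc) * 100

print(f"train: {train_acc:.1%}, test: {test_acc:.1%}, gap: {gap_points:.1f} points")
# prints: train: 92.1%, test: 88.7%, gap: 3.5 points
```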

Question 3 - Binary Logistic Regression

(6)

[1] "Yes" "No" 
[1] "Yes" "No" 

(7)


Call:
glm(formula = subscribed ~ age + marital + balance + loan + num_prev_contacts + 
    duration + prev_outcome, family = binomial(link = "logit"), 
    data = bank_train)

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)          3.960e+00  7.123e-01   5.559 2.72e-08 ***
age                 -2.261e-02  1.047e-02  -2.159  0.03085 *  
maritalmarried       2.145e-01  3.293e-01   0.651  0.51475    
maritalsingle       -2.616e-01  3.780e-01  -0.692  0.48900    
balance             -9.118e-07  3.112e-05  -0.029  0.97663    
loanyes              1.051e+00  3.974e-01   2.646  0.00815 ** 
num_prev_contacts   -1.664e-02  7.774e-02  -0.214  0.83052    
duration            -4.033e-03  3.559e-04 -11.332  < 2e-16 ***
prev_outcomeother   -2.931e-01  4.998e-01  -0.587  0.55753    
prev_outcomesuccess -2.084e+00  4.438e-01  -4.696 2.65e-06 ***
prev_outcomeunknown  6.768e-01  3.871e-01   1.748  0.08039 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 908.76  on 1333  degrees of freedom
Residual deviance: 664.86  on 1323  degrees of freedom
AIC: 686.86

Number of Fisher Scoring iterations: 6

(A)

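The reference levels marital = divorced, loan = no and previous outcome = failure are omitted from the output. Their effect is absorbed into the intercept.

(B)

log(p / (1 - p)) = 3.960 + (-0.0226 x age) + (0.2145 x marital married) + (-0.2616 x marital single) + (-0.00000091 x balance) + (1.051 x loan yes) + (-0.0166 x number of previous contacts) + (-0.0040 x duration) + (-0.2931 x previous outcome other) + (-2.084 x previous outcome success) + (0.6768 x previous outcome unknown)

Here p is the fitted probability of the second level of subscribed (“No”, given the level order shown in (6)), and each categorical term equals 1 if the customer is in that category and 0 otherwise.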
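
As a worked example, the coefficients printed in the glm summary in (7) can be turned into a fitted probability for one customer. This is a Python sketch rather than the R used for the assignment, and the function names are hypothetical. One assumption matters for reading the result: with a factor response, R's glm treats the first level as "failure", so with the level order “Yes”, “No” shown in (6) the probability below is taken to be that of “No”.

```python
import math

# Coefficients copied from the glm summary in (7).
COEF = {
    "intercept": 3.960,
    "age": -0.02261,
    "marital_married": 0.2145,
    "marital_single": -0.2616,
    "balance": -9.118e-07,
    "loan_yes": 1.051,
    "num_prev_contacts": -0.01664,
    "duration": -0.004033,
    "prev_outcome_other": -0.2931,
    "prev_outcome_success": -2.084,
    "prev_outcome_unknown": 0.6768,
}


def linear_predictor(age, marital, balance, loan, num_prev_contacts,
                     duration, prev_outcome):
    """Log-odds for one customer; reference levels contribute 0."""
    eta = COEF["intercept"]
    eta += COEF["age"] * age
    eta += COEF["balance"] * balance
    eta += COEF["num_prev_contacts"] * num_prev_contacts
    eta += COEF["duration"] * duration
    # Dummy variables: look up the level, defaulting to 0 for the
    # omitted reference levels (divorced, no loan, failure).
    eta += COEF.get(f"marital_{marital}", 0.0)
    eta += COEF.get(f"loan_{loan}", 0.0)
    eta += COEF.get(f"prev_outcome_{prev_outcome}", 0.0)
    return eta


def fitted_probability(eta):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# First training row shown in (5).
eta = linear_predictor(age=32, marital="single", balance=473, loan="no",
                       num_prev_contacts=1, duration=72, prev_outcome="other")
p = fitted_probability(eta)
print(f"log-odds: {eta:.3f}, fitted probability: {p:.3f}")
```

For this customer the log-odds come out at roughly 2.37, a fitted probability of a little over 0.9, which under the assumption above is the probability of not subscribing.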

(C)

Age, loan yes, duration and previous outcome success are significant at the 5% level.

(D)

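Married, loan yes and previous outcome unknown all have positive coefficients. Because the levels of subscribed are ordered “Yes”, “No” (see (6)), glm models the probability of “No”, so a positive coefficient raises the odds of not subscribing. Relative to the reference levels, customers in these categories are therefore less likely to subscribe. Of the three, only loan yes is significant at the 5% level.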

(8)

      Predicted
Actual  Yes   No
   Yes   42  101
   No    23 1168

The overall accuracy of this model on the training data set is (42+1168)/1334 = 91%

      Predicted
Actual  Yes   No
   Yes   94  265
   No    50 2591

The overall accuracy of this model on the test data set is (94+2591)/3000 = 90%

Question 4 - Model Comparison & Marketing Actions

(9)

On the test data set, the accuracy of the classification tree model is 89%, while the accuracy of the binary logistic regression model is 90%.

(10)

The binary logistic regression model should be used by the company, as it is more accurate on the test data set by about one percentage point (89.5% vs 88.7%).
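
Because only about 12% of test customers subscribed (359 of 3000), accuracy on its own says little about how well each model finds subscribers. The Python sketch below (the assignment itself used R) adds recall and precision for the “Yes” class, using the test confusion matrices from (5) and (8). Treating “Yes” as the positive class and the helper name summarize are choices made here for illustration.

```python
def summarize(tp, fn, fp, tn):
    """Accuracy, recall and precision with "Yes" as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall_yes": tp / (tp + fn),      # share of actual subscribers found
        "precision_yes": tp / (tp + fp),   # share of predicted "Yes" that are correct
    }


# Test-set counts: tp = actual Yes predicted Yes, fn = actual Yes predicted No,
# fp = actual No predicted Yes, tn = actual No predicted No.
tree = summarize(tp=92, fn=267, fp=73, tn=2568)
logit = summarize(tp=94, fn=265, fp=50, tn=2591)

for name, metrics in (("classification tree", tree), ("logistic regression", logit)):
    rounded = {key: round(value, 3) for key, value in metrics.items()}
    print(name, rounded)
```

On these counts both models identify only about a quarter of the customers who actually subscribed, and the logistic regression model is ahead on all three measures.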

(11)

(A)

Marital status, age and whether the customer has an existing loan are the key factors the bank must look at when deciding which customers to target when advertising a new loan.

(B)