Metis NY Data Science Bootcamp
Marco Lunardi
Bank_Additional_Full (supervised):
20 features, 41,188 samples
It is the most complete of the provided datasets, and it also includes macroeconomic variables
It contains the outcomes of a phone-call campaign run by a Portuguese bank to sell a term deposit product
Even a first exploratory look at the data shows that some features have very low variance and little, if any, influence on the desired outcome
For instance, features like “having a loan” or “having a mortgage” surprisingly (to me) have almost no influence on the customer's choice to buy or not to buy the proposed term deposit
So some feature pruning is needed.
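One simple way to spot pruning candidates is to compare the share of "yes" outcomes across the values of each categorical feature: if the shares are almost identical, the feature tells us little about the outcome. The sketch below is only an illustration. The rows are made up, and the column names (loan, contact, y) are assumptions about how the data is labelled, not something taken from the slides.

```python
from collections import defaultdict


def subscription_rate_by_value(rows, feature, target="y"):
    """Return {feature value: share of rows whose target is 'yes'}."""
    counts = defaultdict(lambda: [0, 0])  # value -> [yes count, total count]
    for row in rows:
        tally = counts[row[feature]]
        tally[0] += row[target] == "yes"
        tally[1] += 1
    return {value: yes / total for value, (yes, total) in counts.items()}


def rate_spread(rates):
    """Gap between the highest and the lowest subscription rate."""
    return max(rates.values()) - min(rates.values())


# Made-up rows for illustration only, not taken from the real dataset.
rows = (
    [{"loan": "yes", "contact": "cellular", "y": "yes"}] * 1
    + [{"loan": "yes", "contact": "telephone", "y": "no"}] * 9
    + [{"loan": "no", "contact": "cellular", "y": "yes"}] * 3
    + [{"loan": "no", "contact": "cellular", "y": "no"}] * 15
    + [{"loan": "no", "contact": "telephone", "y": "no"}] * 12
)

for feature in ("loan", "contact"):
    rates = subscription_rate_by_value(rows, feature)
    print(feature, rates, "spread:", rate_spread(rates))
```

In this toy data the "loan" feature has the same subscription rate for both of its values, so it would be a candidate for pruning, while "contact" shows a clear gap. On the real data a threshold on the spread would have to be chosen by hand.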
I tried several training/testing split sizes and several classifiers:
K-Neighbors, Logistic Regression, Decision Tree, SVM, Random Forest, Gaussian NB
The results are quite similar across the classifiers, and not very satisfying
Accuracy on the testing sets (out of sample) is around 90%
ROC area between 0.76 and 0.78
Stratified k-fold cross-validation confirms the average ROC area values
ROC area is worse for SVM and Decision Tree (around 0.63 for both)
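A high accuracy next to a modest ROC area is what one would expect if buyers are a small minority of the samples. The slides do not state the class balance, so the 10% share of positives below is an assumption made only for illustration. The sketch computes both metrics in plain Python for a "model" that always answers "no"; roc_auc here is the rank-based definition (the probability that a random positive is scored above a random negative), not a library call.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels: 10 buyers out of 100 customers (assumed imbalance).
y_true = [1] * 10 + [0] * 90

# A useless model that predicts "no" for everyone, with a constant score.
always_no_pred = [0] * 100
always_no_score = [0.0] * 100

print("accuracy:", accuracy(y_true, always_no_pred))
print("ROC area:", roc_auc(y_true, always_no_score))
```

Under this assumption the constant predictor already reaches 90% accuracy while its ROC area stays at chance level, which is why the ROC area is the more informative of the two numbers here.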
So some samples are quite hard to classify
I selected and analyzed the samples that even the best-performing model (Logistic Regression) could not classify correctly
They show very different patterns from the other samples, so it is quite hard to classify them correctly using the available features
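A minimal sketch of this kind of error analysis: split the test samples into correctly and incorrectly classified ones, then compare the average feature values of the two groups. The feature names (age, n_contacts), the labels, and the predictions below are hypothetical and made up for the example; they are not the actual results.

```python
from statistics import mean


def split_by_correctness(samples, y_true, y_pred):
    """Return (correctly classified samples, misclassified samples)."""
    correct, wrong = [], []
    for sample, t, p in zip(samples, y_true, y_pred):
        (correct if t == p else wrong).append(sample)
    return correct, wrong


def mean_profile(samples, features):
    """Average value of each numeric feature over a group of samples."""
    return {f: mean(s[f] for s in samples) for f in features}


# Hypothetical test samples, true labels, and model predictions.
samples = [
    {"age": 30, "n_contacts": 1},
    {"age": 40, "n_contacts": 2},
    {"age": 50, "n_contacts": 3},
    {"age": 60, "n_contacts": 10},
]
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]

correct, wrong = split_by_correctness(samples, y_true, y_pred)
features = ["age", "n_contacts"]
print("correct:", mean_profile(correct, features))
print("misclassified:", mean_profile(wrong, features))
```

Large gaps between the two profiles point to regions of the feature space where the available inputs do not separate buyers from non-buyers.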
Given the available features, the predictive model based on 10 inputs and the Logistic Regression classifier is not too bad
It can be used to better focus the bank's campaigns and to increase the number of products sold relative to the money invested in those campaigns
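One way such a model could focus a campaign is to rank customers by predicted probability and call only the top fraction. The sketch below compares the conversion rate in that top group with the overall rate (the lift). The scores and outcomes are made up for illustration, and conversion_rate_in_top is a hypothetical helper, not part of the original analysis.

```python
def conversion_rate_in_top(y_true, scores, fraction):
    """Share of actual buyers among the top-scored `fraction` of customers."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_top]) / n_top


# Made-up model scores and actual outcomes for 10 customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

overall_rate = sum(y_true) / len(y_true)
top_rate = conversion_rate_in_top(y_true, scores, fraction=0.4)

print("overall conversion rate:", overall_rate)
print("conversion rate in top 40%:", top_rate)
print("lift:", top_rate / overall_rate)
```

A lift above 1 means that calling only the highest-scored customers yields more sales per call than calling customers at random.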
More features should be added to get better results
Features describing the client's known profile (total wealth, products bought in the past, risk-aversion profile) should tell us more about the customer's ability to tolerate a temporary lock on a portion of her/his wealth, which is usually the customer's main fear about a term deposit