Capstone Final Project

Pankaj Shah

February 11, 2019

COUNTDOWN BEGINS

⌛Set Time to 5 Minutes: Predictive Analytics Presentation.⌛

😟 Stopwatch made in JavaScript for this Presentation 😟

🍀Good Luck Finishing Presentation In Five Minutes.🍀

\(\color{red}{\text{ Capstone Project : Telecom Churn Analytics}}\)

\(\color{red}{\text{☎️Telecom Churn Industry ☎️}}\)

\(\color{red}{\text{2. Project Definition}}\)

Definition of Churn:

  • Cancellation of a subscription.
  • Closure of an active account.
  • Non-renewal of a contract/service agreement.
  • Decision to switch carriers/providers.
  • End of the relationship between subscriber & provider.
  • Getting divorced from your loved/hated ones.


\(\color{red}{\text{3. Hypothesis for Telecom Project}}\)

  • This is a supervised classification problem.
  • The driving hypothesis behind this project is that a set of variables plays a key role in customer defections, and that finding them helps minimize churn.
  • How can the telecom company reduce its churn ratio so that it stays in business?
  • Predict whether a customer will churn or not from past historical data, and estimate the associated probabilities.







\(\color{red}{\text{4. Project Workflow}}\)









\(\color{red}{\text{5. The Data Overview}}\)

  • Source: Kaggle
  • 3,333 entries & 21 variables
  • 1 categorical variable
  • 20 numeric variables













\(\color{red}{\text{6. DATA }}\)






\(\color{red}{\text{7. Clean-up & Manipulation}}\)

PRE-PROCESSING

  • Customer satisfaction is the hardest thing to capture.
  • The data didn’t have features like product ratings & customer support calls.
  • Look at the dimensions.
  • Review the structure of the input datasets.
  • Find out whether there are any missing values.
  • Check how features are interrelated or correlated.
  • Evaluate the presence of outliers, if any.
  • Trim white space from Voice_Mail_Plan, then recode to zero and one.
  • Trim white space from International_Plan, then recode to zero and one.
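
The last two steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the project: it assumes the raw plan columns hold "yes"/"no" strings with stray spaces, and the sample rows are made up.

```python
def recode_plan(value: str) -> int:
    """Trim surrounding white space, then map yes -> 1 and no -> 0."""
    cleaned = value.strip().lower()
    if cleaned == "yes":
        return 1
    if cleaned == "no":
        return 0
    raise ValueError(f"unexpected plan value: {value!r}")


# Hypothetical rows (assumption): the raw values carry stray spaces.
rows = [
    {"Voice_Mail_Plan": " yes", "International_Plan": " no"},
    {"Voice_Mail_Plan": " no", "International_Plan": " yes "},
]

for row in rows:
    for column in ("Voice_Mail_Plan", "International_Plan"):
        row[column] = recode_plan(row[column])
```

Raising on unexpected values, rather than silently mapping them to zero, makes any leftover dirty entries visible during clean-up.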














\(\color{red}{\text{8. Feature Engineering}}\)




  • \(\color{green}{\text{Active Minutes ✅}}\)

  • \(\color{green}{\text{Active Charges ✅}}\)


  • State

  • Account Length

  • Area Code

  • Phone Number

  • Avoiding multicollinearity
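
One common way to spot multicollinearity is to flag feature pairs whose Pearson correlation is close to ±1 and keep only one of each pair. The sketch below illustrates that idea only; the column names, the values, and the 0.9 threshold are assumptions and do not come from the project.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical columns (assumption): the charge is a fixed rate times minutes,
# so the two carry the same information.
columns = {
    "day_minutes": [100.0, 200.0, 150.0, 300.0],
    "day_charge": [17.0, 34.0, 25.5, 51.0],
    "service_calls": [1.0, 3.0, 0.0, 1.0],
}

flagged = collinear_pairs(columns)
```

Here only the minutes/charge pair is flagged, so one of the two could be dropped before modeling.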


\(\color{red}{\text{9. Model Building: Machine Learning}}\)

  • Handle imbalanced classes.
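
The slide does not say which technique was used for the imbalance. As one possible approach, random oversampling duplicates minority-class rows until the classes are the same size. The function name, the `churn` key, and the sample rows below are all hypothetical.

```python
import random


def oversample_minority(rows, label_key="churn", seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k is 0 for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows (assumption): 8 non-churners and 2 churners.
training_rows = [{"id": i, "churn": 0} for i in range(8)]
training_rows += [{"id": 100 + i, "churn": 1} for i in range(2)]

balanced_rows = oversample_minority(training_rows)
```

Resampling like this should be applied to the training split only, so that validation scores still reflect the real class ratio.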








👍10.Choose Best Model👍

Training set: 70% 👍 Validation set: 15% 👍 Comparison metric: AUC score 👍
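
A shuffled 70/15 split can be sketched as follows. The slide names only the training and validation shares, so treating the remaining 15% as a held-out test set is an assumption, as are the function name and the seed.

```python
import random


def split_indices(n_rows, seed=0):
    """Shuffle row indices and cut them into 70% / 15% / remainder."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = n_rows * 70 // 100
    n_valid = n_rows * 15 // 100
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # assumed held-out test set
    return train, valid, test


# 3,333 is the number of entries reported in the data overview.
train, valid, test = split_indices(3333)
```

With 3,333 rows this gives 2,333 training rows, 499 validation rows, and 501 remaining rows.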

11.🌳Random Forest 🌳

  • A large difference between the training and validation scores indicates high variance.
  • To reduce underfitting, increase the complexity of the model by adding new features rather than keeping a simple model.
  • Try reducing regularization and changing the model architecture to reduce underfitting.
  • I looked for new feature ideas and tried making my model more complex, which lowers bias rather than variance.

12.💡Positive Feature Importance Score 💡

🎛️Hyperparameter Tuning 🎛️

  • max_depth controls the maximum depth of each tree in the Random Forest.
  • Change the number of variables that are sampled at each split.
  • Use the AUC as the comparison metric.
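
To show how AUC can serve as the comparison metric, the sketch below computes it as the fraction of (churner, non-churner) pairs that a model ranks correctly, then picks the candidate setting with the highest value. The labels, the scores, and the two `max_depth` values are made up for illustration and are not results from the project.

```python
def auc_score(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation labels and predicted churn probabilities (assumption).
labels = [1, 0, 1, 0, 0]
candidates = {
    "max_depth=3": [0.6, 0.5, 0.4, 0.3, 0.2],
    "max_depth=8": [0.9, 0.2, 0.7, 0.4, 0.1],
}

results = {name: auc_score(labels, scores) for name, scores in candidates.items()}
best = max(results, key=results.get)
```

The pairwise loop is quadratic, which is fine for a sketch; a rank-based formula gives the same number faster on large validation sets.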

⛩️13. Model Evaluation ROC Curve ⛩️

14. Conclusion:

    1. A linear classifier would have been a lot easier to interpret than a non-linear model.
    2. The biggest disadvantage of a linear model is its high bias when fitting non-linear relationships.
    3. One advantage of Random Forest is that it can work with highly correlated features. It also deals with non-linearities much better than logistic regression.
    4. To make predictions it uses multiple decision trees, which is more effective than a single decision tree.
    5. For each observation, it chooses the category by taking the majority vote of the different trees.
    6. Data with non-linear relationships (the low-bias, high-variance case) can be handled properly & efficiently.
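
The majority vote described in the list above can be sketched in plain Python. The helper names and the three small "trees" are hypothetical; real libraries may combine trees differently, for example by averaging predicted probabilities.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees for one observation."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(per_tree_predictions):
    """per_tree_predictions[t][i] is tree t's class for observation i."""
    return [majority_vote(votes) for votes in zip(*per_tree_predictions)]


# Hypothetical predictions (assumption): 3 trees, 4 customers, 1 = churn.
per_tree = [
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

forest_labels = forest_predict(per_tree)
```

Using an odd number of trees avoids ties in a two-class problem such as churn versus no churn.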

15. Conclusion:

    1. Sensitivity (true positive rate (TPR), or recall: TP/(TP+FN)): 74%
    2. That means that, with the current features, this model identifies 74% of the customers who will churn beforehand.
    3. The AUC is 0.832, which is much better than the 0.5 of our baseline model.
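
For reference, sensitivity can be computed directly from true and predicted labels. The labels below are invented so that the result matches the reported 74%; they are not the project's confusion matrix.

```python
def sensitivity(true_labels, predicted_labels, positive=1):
    """Recall / true positive rate: TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive cases in true_labels")
    return tp / (tp + fn)


# Hypothetical labels (assumption): 100 churners, 74 of them caught, 50 non-churners.
true_labels = [1] * 100 + [0] * 50
predicted_labels = [1] * 74 + [0] * 26 + [0] * 50

recall = sensitivity(true_labels, predicted_labels)
```

Only the actual churners enter the denominator, which is why the figure reads as "74% of churners are caught" rather than "74% of predictions are right".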

16. Reference

\(\color{red}{\text{18.⌛ 🙏Thank You 🙏⌛ }}\)

\(\color{red}{\text{Name: Pankaj Shah }}\)
\(\color{red}{\text{ALY-6020: Predictive Analytics}}\)
\(\color{red}{\text{Final Capstone Project}}\)
\(\color{red}{\text{Instructor Andrew Long }}\)
