Telecom Churn Prediction

Natália Faraj Murad

The objective of this project is to predict customer behavior in order to retain customers. Analyzing all relevant customer data can help develop focused customer retention programs. The dataset was obtained from https://www.kaggle.com/mnassrib/telecom-churn-datasets. It contains information about each customer's state, area code, account length, type of plan, how much they spend on charges, the times of day they make more calls, and other attributes. These variables are explored to gain insights into customer behavior, and machine learning algorithms are used to predict whether a customer will keep or cancel their plan.

Importing libraries
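A minimal sketch of the libraries assumed throughout this notebook (the exact set used may differ):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
```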

Reading dataset
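Reading the data might look like the sketch below; the file name churn-bigml-80.csv comes from the Kaggle page, and which split is actually used here is an assumption:

```python
# Load the training split of the Kaggle telecom churn dataset
# (file name assumed; the page also provides churn-bigml-20.csv).
df = pd.read_csv('churn-bigml-80.csv')
print(df.shape)
df.head()
```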

Data exploration

Churn does not change with the area code. It is higher among customers with international plans. It is higher in (22%) and TX (25%), while the mean rate is 14%. It is higher among customers who do not use voice mail (17%); for those who do, the rate is 9%. Churn increases above 250 total day minutes, above 40 total day charge, above 250 total evening minutes, and above 20 total evening charge. Up to 50 total day calls, churn is high. It decreases with total night minutes, suggesting the plan is advantageous for customers who prefer to call at night.
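A sketch of how these rates can be checked with a simple group-by; the column names (International plan, Voice mail plan, Area code, Churn) are assumptions about the raw CSV headers:

```python
# Churn rate by plan type and area code (Churn assumed boolean or 0/1)
for col in ['International plan', 'Voice mail plan', 'Area code']:
    rate = df.groupby(col)['Churn'].mean().round(3)
    print(rate, '\n')
```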

Distribution of Numeric Variables & Churn

Customers who spend more minutes during the day and have a higher total day charge tend to churn more; maybe they need a daytime offer. Customers who do not use voice messages have a high churn rate.

Profile of customers who cancel

Churn Rate

The churn rate is 14.49%
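For reference, the rate can be computed directly from the target column (assuming Churn is boolean or 0/1):

```python
# Overall churn rate as a percentage of all customers
churn_rate = df['Churn'].mean() * 100
print(f'Churn rate: {churn_rate:.2f}%')
```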

Numeric Variables Collinearity

Collinear pairs:

total_night_charge and total_night_minutes

total_day_charge and total_day_minutes

total_intl_charge and total_intl_minutes

This is expected: the more minutes a customer uses, the higher their charge.
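A sketch of the correlation check behind these pairs, using a standard heatmap of the numeric columns:

```python
# Correlation matrix of the numeric variables; the charge/minute pairs
# listed above appear as near-perfect correlations.
corr = df.select_dtypes(include='number').corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr, cmap='coolwarm', annot=False)
plt.title('Numeric variable correlations')
plt.show()
```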

Churn vs number of service calls

Churn rate increases with the number of customer service calls.
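A sketch of this breakdown, assuming the column is named Customer service calls:

```python
# Churn rate per number of customer service calls
calls_col = 'Customer service calls'
(df.groupby(calls_col)['Churn'].mean() * 100).plot(kind='bar')
plt.ylabel('Churn rate (%)')
plt.show()
```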

Total intl charge vs Churn

No significant difference in total international charge between churners and non-churners.

Data preprocessing

Dropping unimportant variables

The dataset does not contain null values.

Grouping Categories

Changing yes and no to 0 and 1
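A sketch of the recoding, assuming the binary columns are International plan and Voice mail plan:

```python
# Map the yes/no categorical columns to 0/1 and cast the target to int
binary_cols = ['International plan', 'Voice mail plan']
for col in binary_cols:
    df[col] = df[col].map({'No': 0, 'Yes': 1})
df['Churn'] = df['Churn'].astype(int)
```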

Scaling the continuous variables - standardization
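A sketch of the standardization step with scikit-learn's StandardScaler; which columns count as continuous here is an assumption:

```python
# Standardize the continuous variables to zero mean and unit variance
cont_cols = [c for c in df.select_dtypes(include='number').columns
             if c not in ('Churn', 'Area code')]
scaler = StandardScaler()
df[cont_cols] = scaler.fit_transform(df[cont_cols])
```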

Categorizing continuous variables

This step improves performance only for the Decision Tree Classifier model.
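A sketch of the binning idea using quartiles of one variable; the column name and number of bins are illustrative, not the notebook's exact choices:

```python
# Bin a continuous variable into quartile-based categories; this binned
# copy is only fed to the Decision Tree model.
df_binned = df.copy()
df_binned['total_day_minutes_bin'] = pd.qcut(
    df_binned['Total day minutes'], q=4,
    labels=['low', 'mid', 'high', 'very_high'])
```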

Dummy Variables
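A sketch with pandas get_dummies, assuming State and Area code are the categorical columns that were kept:

```python
# One-hot encode the remaining categorical columns
df = pd.get_dummies(df, columns=['State', 'Area code'], drop_first=True)
```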

Split train & test datasets
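A sketch of the split; the 80/20 ratio and the seed are assumptions:

```python
# Hold out a stratified test set so the churn rate is preserved in both splits
X = df.drop('Churn', axis=1)
y = df['Churn']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=100, stratify=y)
```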

Training Several Baseline Models

Among the baseline models, Random Forest presented the best performance.
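A sketch of how the baselines might be fit and compared on test accuracy, using default settings:

```python
# Fit a few baseline classifiers and report their test accuracy
models = {
    'Decision Tree': DecisionTreeClassifier(random_state=100),
    'KNN': KNeighborsClassifier(),
    'Random Forest': RandomForestClassifier(random_state=100),
    'XGBoost': XGBClassifier(random_state=100),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f'{name}: {acc:.2%}')
```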

Decision Tree Classifier

Decision Tree Accuracy: 90%

KNN

KNN Model Accuracy: 88%

Random Forest

Optimal Number of Trees
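A sketch of the search over the number of trees, scoring each forest on the test set; the grid of values is illustrative:

```python
# Test accuracy as a function of the number of trees in the forest
n_trees = range(50, 501, 50)
scores = []
for n in n_trees:
    rf = RandomForestClassifier(n_estimators=n, random_state=100)
    rf.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, rf.predict(X_test)))
plt.plot(list(n_trees), scores, marker='o')
plt.xlabel('Number of trees')
plt.ylabel('Test accuracy')
plt.show()
```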

Random Forest Accuracy: 95%

XGBoost

XGBoost Accuracy: 95%

Voting Classifier
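A sketch of the ensemble; the member models and the soft-voting scheme are assumptions:

```python
# Combine the strongest models in a soft-voting ensemble
voting = VotingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=350, random_state=100)),
        ('xgb', XGBClassifier(random_state=100)),
        ('knn', KNeighborsClassifier()),
    ],
    voting='soft')
voting.fit(X_train, y_train)
print(f'Voting accuracy: {accuracy_score(y_test, voting.predict(X_test)):.2%}')
```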

Voting Classifier Accuracy: 96%

Searching Best Parameters
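A sketch of the parameter search with GridSearchCV; the grid below is illustrative, though it includes the values reported in the best-model dump that follows:

```python
# Cross-validated hyperparameter search for the XGBoost model
param_grid = {
    'n_estimators': [200, 350, 500],
    'max_depth': [3, 4, 6],
    'learning_rate': [0.01, 0.03, 0.1],
    'colsample_bytree': [0.8, 0.95, 1.0],
}
grid = GridSearchCV(XGBClassifier(random_state=100),
                    param_grid, scoring='accuracy', cv=5, n_jobs=-1)
grid.fit(X_train, y_train)
print(grid.best_params_)
```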

Best Model Parameters:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.95, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.03, max_delta_step=0, max_depth=4,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=350, n_jobs=4, nthread=4, num_parallel_tree=1,
              random_state=100, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
              seed=100, subsample=1, tree_method='exact',
              validate_parameters=1, verbosity=None)

Conclusion

The best models were Random Forest and XGBoost, each with 95% accuracy, and the Voting Classifier, with 96% accuracy.