1. Objective

As a business analyst and strategist, I’ll try to figure out why customers are switching to other telecom services in California, USA. Also, provide meaningful recommendations on how to reduce this churn phenomenon in a competitive market.

2. Overview

In the analysis process, I’ve tried to discover significant insights the data tells us through EDA (Elaborative Data Analysis). Also, there were many factors(variables) with these factors, I’ve tried to identify significant factors that are most likely to be the reason for customer churn.


3. Data Description

The data is publicly available on kaggle.com. The first dataset contains user data and churn reasons, while the second dataset contains population size for each zipcode. The data sets consist of character, integer, numerical, and factor type variables.

3.2 Inspecting the dataset

Summary

##     Zip.Code         Customer.ID      Gender          Age        Married   
##  Min.   :90001   0002-ORFBO:   1   Female:3488   Min.   :19.00   No :3641  
##  1st Qu.:92101   0003-MKNFE:   1   Male  :3555   1st Qu.:32.00   Yes:3402  
##  Median :93518   0004-TLHLJ:   1                 Median :46.00             
##  Mean   :93486   0011-IGKFF:   1                 Mean   :46.51             
##  3rd Qu.:95329   0013-EXCHZ:   1                 3rd Qu.:60.00             
##  Max.   :96150   0013-MHZWF:   1                 Max.   :80.00             
##                  (Other)   :7037                                           
##  Number.of.Dependents            City         Latitude       Longitude     
##  Min.   :0.0000       Los Angeles  : 293   Min.   :32.56   Min.   :-124.3  
##  1st Qu.:0.0000       San Diego    : 285   1st Qu.:33.99   1st Qu.:-121.8  
##  Median :0.0000       San Jose     : 112   Median :36.21   Median :-119.6  
##  Mean   :0.4687       Sacramento   : 108   Mean   :36.20   Mean   :-119.8  
##  3rd Qu.:0.0000       San Francisco: 104   3rd Qu.:38.16   3rd Qu.:-118.0  
##  Max.   :9.0000       Fresno       :  61   Max.   :41.96   Max.   :-114.2  
##                       (Other)      :6080                                   
##  Number.of.Referrals Tenure.in.Months     Offer      Phone.Service
##  Min.   : 0.000      Min.   : 1.00    None   :3877   No : 682     
##  1st Qu.: 0.000      1st Qu.: 9.00    Offer A: 520   Yes:6361     
##  Median : 0.000      Median :29.00    Offer B: 824                
##  Mean   : 1.952      Mean   :32.39    Offer C: 415                
##  3rd Qu.: 3.000      3rd Qu.:55.00    Offer D: 602                
##  Max.   :11.000      Max.   :72.00    Offer E: 805                
##                                                                   
##  Avg.Monthly.Long.Distance.Charges Multiple.Lines Internet.Service
##  Min.   : 1.01                        : 682       No :1526        
##  1st Qu.:13.05                     No :3390       Yes:5517        
##  Median :25.69                     Yes:2971                       
##  Mean   :25.42                                                    
##  3rd Qu.:37.68                                                    
##  Max.   :49.99                                                    
##  NA's   :682                                                      
##      Internet.Type  Avg.Monthly.GB.Download Online.Security Online.Backup
##             :1526   Min.   : 2.00              :1526           :1526     
##  Cable      : 830   1st Qu.:13.00           No :3498        No :3088     
##  DSL        :1652   Median :21.00           Yes:2019        Yes:2429     
##  Fiber Optic:3035   Mean   :26.19                                        
##                     3rd Qu.:30.00                                        
##                     Max.   :85.00                                        
##                     NA's   :1526                                         
##  Device.Protection.Plan Premium.Tech.Support Streaming.TV Streaming.Movies
##     :1526                  :1526                :1526        :1526        
##  No :3095               No :3473             No :2810     No :2785        
##  Yes:2422               Yes:2044             Yes:2707     Yes:2732        
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##  Streaming.Music Unlimited.Data           Contract    Paperless.Billing
##     :1526           :1526       Month-to-Month:3610   No :2872         
##  No :3029        No : 772       One Year      :1550   Yes:4171         
##  Yes:2488        Yes:4745       Two Year      :1883                    
##                                                                        
##                                                                        
##                                                                        
##                                                                        
##          Payment.Method Monthly.Charge   Total.Charges    Total.Refunds   
##  Bank Withdrawal:3909   Min.   :-10.00   Min.   :  18.8   Min.   : 0.000  
##  Credit Card    :2749   1st Qu.: 30.40   1st Qu.: 400.1   1st Qu.: 0.000  
##  Mailed Check   : 385   Median : 70.05   Median :1394.5   Median : 0.000  
##                         Mean   : 63.60   Mean   :2280.4   Mean   : 1.962  
##                         3rd Qu.: 89.75   3rd Qu.:3786.6   3rd Qu.: 0.000  
##                         Max.   :118.75   Max.   :8684.8   Max.   :49.790  
##                                                                           
##  Total.Extra.Data.Charges Total.Long.Distance.Charges Total.Revenue     
##  Min.   :  0.000          Min.   :   0.00             Min.   :   21.36  
##  1st Qu.:  0.000          1st Qu.:  70.55             1st Qu.:  605.61  
##  Median :  0.000          Median : 401.44             Median : 2108.64  
##  Mean   :  6.861          Mean   : 749.10             Mean   : 3034.38  
##  3rd Qu.:  0.000          3rd Qu.:1191.10             3rd Qu.: 4801.15  
##  Max.   :150.000          Max.   :3564.72             Max.   :11979.34  
##                                                                         
##  Customer.Status         Churn.Category                        Churn.Reason 
##  Churned:1869                   :5174                                :5174  
##  Joined : 454    Attitude       : 314   Competitor had better devices: 313  
##  Stayed :4720    Competitor     : 841   Competitor made better offer : 311  
##                  Dissatisfaction: 321   Attitude of support person   : 220  
##                  Other          : 182   Don't know                   : 130  
##                  Price          : 211   Competitor offered more data : 117  
##                                         (Other)                      : 778  
##    Population    
##  Min.   :    11  
##  1st Qu.:  2344  
##  Median : 17554  
##  Mean   : 22140  
##  3rd Qu.: 36125  
##  Max.   :105285  
## 

Strucure

## 'data.frame':    7043 obs. of  39 variables:
##  $ Zip.Code                         : int  90001 90001 90001 90001 90002 90002 90002 90002 90003 90003 ...
##  $ Customer.ID                      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 2308 86 2253 4701 3963 3129 3634 38 6795 ...
##  $ Gender                           : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 2 2 1 1 ...
##  $ Age                              : int  36 36 71 66 31 46 56 68 60 49 ...
##  $ Married                          : Factor w/ 2 levels "No","Yes": 2 2 2 1 2 1 1 1 2 2 ...
##  $ Number.of.Dependents             : int  0 0 0 0 0 0 0 0 0 3 ...
##  $ City                             : Factor w/ 1106 levels "Acampo","Acton",..: 555 555 555 555 555 555 555 555 555 555 ...
##  $ Latitude                         : num  34 34 34 34 33.9 ...
##  $ Longitude                        : num  -118 -118 -118 -118 -118 ...
##  $ Number.of.Referrals              : int  0 1 1 0 9 0 0 0 4 2 ...
##  $ Tenure.in.Months                 : int  1 17 69 8 58 34 13 38 59 3 ...
##  $ Offer                            : Factor w/ 6 levels "None","Offer A",..: 6 1 1 6 1 1 1 1 3 1 ...
##  $ Phone.Service                    : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 2 2 2 2 ...
##  $ Avg.Monthly.Long.Distance.Charges: num  NA NA 18.41 5.21 18.68 ...
##  $ Multiple.Lines                   : Factor w/ 3 levels "","No","Yes": 1 1 3 2 2 2 2 3 3 2 ...
##  $ Internet.Service                 : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 2 2 ...
##  $ Internet.Type                    : Factor w/ 4 levels "","Cable","DSL",..: 3 2 4 4 1 3 1 4 4 4 ...
##  $ Avg.Monthly.GB.Download          : int  10 10 17 8 NA 16 NA 13 14 22 ...
##  $ Online.Security                  : Factor w/ 3 levels "","No","Yes": 2 3 2 2 1 3 1 2 3 2 ...
##  $ Online.Backup                    : Factor w/ 3 levels "","No","Yes": 3 2 3 2 1 2 1 2 3 3 ...
##  $ Device.Protection.Plan           : Factor w/ 3 levels "","No","Yes": 2 3 3 3 1 3 1 2 2 2 ...
##  $ Premium.Tech.Support             : Factor w/ 3 levels "","No","Yes": 2 2 3 2 1 2 1 2 2 3 ...
##  $ Streaming.TV                     : Factor w/ 3 levels "","No","Yes": 2 2 3 3 1 2 1 3 3 2 ...
##  $ Streaming.Movies                 : Factor w/ 3 levels "","No","Yes": 2 2 3 3 1 2 1 3 2 2 ...
##  $ Streaming.Music                  : Factor w/ 3 levels "","No","Yes": 2 2 2 2 1 2 1 2 2 2 ...
##  $ Unlimited.Data                   : Factor w/ 3 levels "","No","Yes": 3 3 3 3 1 3 1 3 3 3 ...
##  $ Contract                         : Factor w/ 3 levels "Month-to-Month",..: 1 1 3 1 3 2 1 3 1 1 ...
##  $ Paperless.Billing                : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 2 2 ...
##  $ Payment.Method                   : Factor w/ 3 levels "Bank Withdrawal",..: 1 3 1 2 2 3 1 1 1 1 ...
##  $ Monthly.Charge                   : num  29.9 34.4 110 94.5 -4 ...
##  $ Total.Charges                    : num  29.9 592.8 7634.2 743 1186 ...
##  $ Total.Refunds                    : num  0 0 0 0 0 ...
##  $ Total.Extra.Data.Charges         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Total.Long.Distance.Charges      : num  0 0 1270.3 41.7 1083.4 ...
##  $ Total.Revenue                    : num  29.9 592.8 8904.5 784.6 2269.4 ...
##  $ Customer.Status                  : Factor w/ 3 levels "Churned","Joined",..: 2 3 3 1 3 3 3 3 3 2 ...
##  $ Churn.Category                   : Factor w/ 6 levels "","Attitude",..: 1 1 1 3 1 1 1 1 1 1 ...
##  $ Churn.Reason                     : Factor w/ 21 levels "","Attitude of service provider",..: 1 1 1 4 1 1 1 1 1 1 ...
##  $ Population                       : int  54492 54492 54492 54492 44586 44586 44586 44586 58198 58198 ...

4. Map View

There are 3 map view tabs. Each observation information can be retrieved by clicking on each point(observation) on the map.

Map view of all observation

Count of users in each ZipCode

The co-ordinates were in the int/numerical format, which were transformed to geo-coordinates. See the code above to see the transformation.

Percentage of User Churned in specific ZipCode

The map shows the percentage churn in each zipcode.

5. Data Cleaning and manipulation

There are some inconsistencies and missing values in the dataset which require processing. There are two services which telecom providers caters to their customers, Internet service and Phone service. In the dataset, the user not availing of the internet service, the internet type and additional services pertaining to it are missing. These data points need to be converted to 0 (referring to no service). Also, when the users have a non-internet service, the average monthly GB used is null. These data points were converted to 0 as their data usage is 0 when they don’t avail the internet service. There are a few factor variables which need to be recoded. Data points for binary variables, such as (Yes/No), are converted to 0 and 1, whereas variables with categories are transformed with numerical value, for example: Internet type

Variable Value Transformed Value
Internet Type None 0
Cable 1
DSL 2
Fiber Optic 3
mergedf3 <- mergedf
mergedf3[is.na(mergedf3)] = 0 #Replacing NA values with 0 from avg.monthly charges and avg.monthly GB used 
mergedf3$Multiple.Lines[mergedf3$Multiple.Lines == ""] <- "No" #Replacing blank values with "No"
transformedf <- mergedf3
transformedf$Offer <- recode_factor(transformedf$Offer, "None" = 0, "Offer A" = 1, "Offer B" = 2, "Offer C" = 3, "Offer D" = 4, "Offer E" = 5)
transformedf$Phone.Service <- recode_factor(transformedf$Phone.Service, No = 0, Yes = 1)
transformedf$Multiple.Lines <- recode_factor(transformedf$Multiple.Lines, No = 0, Yes = 1)
transformedf$Internet.Service <- recode_factor(transformedf$Internet.Service, No = 0, Yes =1)
transformedf$Internet.Type <- recode(transformedf$Internet.Type, "Cable" = 1, "DSL" = 2, "Fiber Optic" = 3)
transformedf$Internet.Type <- recode_factor(transformedf$Internet.Type, "Cable" = 1, "DSL" = 2, "Fiber Optic" = 3, .missing = 0)
transformedf$Online.Security <- recode(transformedf$Online.Security, No = 0, Yes = 1) #Unmatched values(Blanks in here) are coverted to NAs
transformedf$Online.Security <- recode_factor(transformedf$Online.Security, No = 0, Yes = 1, .missing = 0) #NAs are recoded as 0
transformedf$Online.Backup <- recode(transformedf$Online.Backup, No = 0, Yes = 1)
transformedf$Online.Backup <- recode_factor(transformedf$Online.Backup, No = 0, Yes = 1, .missing = 0)
transformedf$Device.Protection.Plan <- recode(transformedf$Device.Protection.Plan, No = 0, Yes = 1)
transformedf$Device.Protection.Plan <- recode_factor(transformedf$Device.Protection.Plan, No = 0, Yes = 1, .missing = 0)
transformedf$Premium.Tech.Support <- recode(transformedf$Premium.Tech.Support, No = 0, Yes = 1)
transformedf$Premium.Tech.Support <- recode_factor(transformedf$Premium.Tech.Support, No = 0, Yes = 1, .missing = 0)
transformedf$Streaming.TV <- recode(transformedf$Streaming.TV, No = 0, Yes = 1)
transformedf$Streaming.TV <- recode_factor(transformedf$Streaming.TV, No = 0, Yes = 1, .missing = 0)
transformedf$Streaming.Movies <- recode(transformedf$Streaming.Movies, No = 0, Yes = 1)
transformedf$Streaming.Movies <- recode_factor(transformedf$Streaming.Movies, No = 0, Yes = 1, .missing = 0)
transformedf$Streaming.Music <- recode(transformedf$Streaming.Music, No = 0, Yes = 1)
transformedf$Streaming.Music <- recode_factor(transformedf$Streaming.Music, No = 0, Yes = 1, .missing = 0)
transformedf$Unlimited.Data <- recode(transformedf$Unlimited.Data, No = 0, Yes = 1)
transformedf$Unlimited.Data <- recode_factor(transformedf$Unlimited.Data, No = 0, Yes = 1, .missing = 0)
transformedf$Contract <- recode_factor(transformedf$Contract, "Month-to-Month" = 1, "One Year" = 2, "Two Year" = 3)
transformedf$Paperless.Billing <- recode_factor(transformedf$Paperless.Billing, No = 0, Yes = 1)
transformedf$Payment.Method <- recode_factor(transformedf$Payment.Method, "Bank Withdrawal" = 1, "Credit Card" = 2, "Mailed Check" = 3)
transformedf$Churn.Category <- recode(transformedf$Churn.Category, "Attitude" = 1, "Competitor" = 2, "Dissatisfaction" = 3, "Price" = 4, "Other" = 5)
transformedf$Churn.Category <- recode_factor(transformedf$Churn.Category, "Attitude" = 1, "Competitor" = 2, "Dissatisfaction" = 3, "Price" = 4, "Other" = 5, .missing = 0)

Head

Zip.Code Customer.ID Gender Age Married Number.of.Dependents City Latitude Longitude Number.of.Referrals Tenure.in.Months Offer Phone.Service Avg.Monthly.Long.Distance.Charges Multiple.Lines Internet.Service Internet.Type Avg.Monthly.GB.Download Online.Security Online.Backup Device.Protection.Plan Premium.Tech.Support Streaming.TV Streaming.Movies Streaming.Music Unlimited.Data Contract Paperless.Billing Payment.Method Monthly.Charge Total.Charges Total.Refunds Total.Extra.Data.Charges Total.Long.Distance.Charges Total.Revenue Customer.Status Churn.Category Churn.Reason Population
90001 7590-VHVEG Female 36 Yes 0 Los Angeles 33.97362 -118.249 0 1 5 0 0.00 0 1 2 10 0 1 0 0 0 0 0 1 1 1 1 29.85 29.85 0 0 0.00 29.85 Joined 0 54492
90001 3307-TLCUD Male 36 Yes 0 Los Angeles 33.97362 -118.249 1 17 0 0 0.00 0 1 1 10 1 0 1 0 0 0 0 1 1 0 3 34.40 592.75 0 0 0.00 592.75 Stayed 0 54492
90001 0136-IFMYD Male 71 Yes 0 Los Angeles 33.97362 -118.249 1 69 0 1 18.41 1 1 3 17 0 1 1 1 1 1 0 1 3 1 1 109.95 7634.25 0 0 1270.29 8904.54 Stayed 0 54492
90001 3217-FZDMN Female 66 No 0 Los Angeles 33.97362 -118.249 0 8 5 1 5.21 0 1 3 8 0 0 1 0 1 1 0 1 1 1 2 94.45 742.95 0 0 41.68 784.63 Churned 2 Competitor had better devices 54492
90002 6625-FLENO Male 31 Yes 0 Los Angeles 33.94926 -118.247 9 58 0 1 18.68 0 0 0 0 0 0 0 0 0 0 0 0 3 0 2 -4.00 1185.95 0 0 1083.44 2269.39 Stayed 0 44586
90002 5575-GNVDE Male 46 No 0 Los Angeles 33.94926 -118.247 0 34 0 1 17.09 0 1 2 16 1 0 1 0 0 0 0 1 2 0 3 56.95 1889.50 0 0 581.06 2470.56 Stayed 0 44586

Summary

##     Zip.Code         Customer.ID      Gender          Age        Married   
##  Min.   :90001   0002-ORFBO:   1   Female:3488   Min.   :19.00   No :3641  
##  1st Qu.:92101   0003-MKNFE:   1   Male  :3555   1st Qu.:32.00   Yes:3402  
##  Median :93518   0004-TLHLJ:   1                 Median :46.00             
##  Mean   :93486   0011-IGKFF:   1                 Mean   :46.51             
##  3rd Qu.:95329   0013-EXCHZ:   1                 3rd Qu.:60.00             
##  Max.   :96150   0013-MHZWF:   1                 Max.   :80.00             
##                  (Other)   :7037                                           
##  Number.of.Dependents            City         Latitude       Longitude     
##  Min.   :0.0000       Los Angeles  : 293   Min.   :32.56   Min.   :-124.3  
##  1st Qu.:0.0000       San Diego    : 285   1st Qu.:33.99   1st Qu.:-121.8  
##  Median :0.0000       San Jose     : 112   Median :36.21   Median :-119.6  
##  Mean   :0.4687       Sacramento   : 108   Mean   :36.20   Mean   :-119.8  
##  3rd Qu.:0.0000       San Francisco: 104   3rd Qu.:38.16   3rd Qu.:-118.0  
##  Max.   :9.0000       Fresno       :  61   Max.   :41.96   Max.   :-114.2  
##                       (Other)      :6080                                   
##  Number.of.Referrals Tenure.in.Months Offer    Phone.Service
##  Min.   : 0.000      Min.   : 1.00    0:3877   0: 682       
##  1st Qu.: 0.000      1st Qu.: 9.00    1: 520   1:6361       
##  Median : 0.000      Median :29.00    2: 824                
##  Mean   : 1.952      Mean   :32.39    3: 415                
##  3rd Qu.: 3.000      3rd Qu.:55.00    4: 602                
##  Max.   :11.000      Max.   :72.00    5: 805                
##                                                             
##  Avg.Monthly.Long.Distance.Charges Multiple.Lines Internet.Service
##  Min.   : 0.00                     0:4072         0:1526          
##  1st Qu.: 9.21                     1:2971         1:5517          
##  Median :22.89                                                    
##  Mean   :22.96                                                    
##  3rd Qu.:36.40                                                    
##  Max.   :49.99                                                    
##                                                                   
##  Internet.Type Avg.Monthly.GB.Download Online.Security Online.Backup
##  1: 830        Min.   : 0.00           0:5024          0:4614       
##  2:1652        1st Qu.: 3.00           1:2019          1:2429       
##  3:3035        Median :17.00                                        
##  0:1526        Mean   :20.52                                        
##                3rd Qu.:27.00                                        
##                Max.   :85.00                                        
##                                                                     
##  Device.Protection.Plan Premium.Tech.Support Streaming.TV Streaming.Movies
##  0:4621                 0:4999               0:4336       0:4311          
##  1:2422                 1:2044               1:2707       1:2732          
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##  Streaming.Music Unlimited.Data Contract Paperless.Billing Payment.Method
##  0:4555          0:2298         1:3610   0:2872            1:3909        
##  1:2488          1:4745         2:1550   1:4171            2:2749        
##                                 3:1883                     3: 385        
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##  Monthly.Charge   Total.Charges    Total.Refunds    Total.Extra.Data.Charges
##  Min.   :-10.00   Min.   :  18.8   Min.   : 0.000   Min.   :  0.000         
##  1st Qu.: 30.40   1st Qu.: 400.1   1st Qu.: 0.000   1st Qu.:  0.000         
##  Median : 70.05   Median :1394.5   Median : 0.000   Median :  0.000         
##  Mean   : 63.60   Mean   :2280.4   Mean   : 1.962   Mean   :  6.861         
##  3rd Qu.: 89.75   3rd Qu.:3786.6   3rd Qu.: 0.000   3rd Qu.:  0.000         
##  Max.   :118.75   Max.   :8684.8   Max.   :49.790   Max.   :150.000         
##                                                                             
##  Total.Long.Distance.Charges Total.Revenue      Customer.Status Churn.Category
##  Min.   :   0.00             Min.   :   21.36   Churned:1869    1: 314        
##  1st Qu.:  70.55             1st Qu.:  605.61   Joined : 454    2: 841        
##  Median : 401.44             Median : 2108.64   Stayed :4720    3: 321        
##  Mean   : 749.10             Mean   : 3034.38                   4: 211        
##  3rd Qu.:1191.10             3rd Qu.: 4801.15                   5: 182        
##  Max.   :3564.72             Max.   :11979.34                   0:5174        
##                                                                               
##                         Churn.Reason    Population    
##                               :5174   Min.   :    11  
##  Competitor had better devices: 313   1st Qu.:  2344  
##  Competitor made better offer : 311   Median : 17554  
##  Attitude of support person   : 220   Mean   : 22140  
##  Don't know                   : 130   3rd Qu.: 36125  
##  Competitor offered more data : 117   Max.   :105285  
##  (Other)                      : 778

Structure

## 'data.frame':    7043 obs. of  39 variables:
##  $ Zip.Code                         : int  90001 90001 90001 90001 90002 90002 90002 90002 90003 90003 ...
##  $ Customer.ID                      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 2308 86 2253 4701 3963 3129 3634 38 6795 ...
##  $ Gender                           : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 2 2 1 1 ...
##  $ Age                              : int  36 36 71 66 31 46 56 68 60 49 ...
##  $ Married                          : Factor w/ 2 levels "No","Yes": 2 2 2 1 2 1 1 1 2 2 ...
##  $ Number.of.Dependents             : int  0 0 0 0 0 0 0 0 0 3 ...
##  $ City                             : Factor w/ 1106 levels "Acampo","Acton",..: 555 555 555 555 555 555 555 555 555 555 ...
##  $ Latitude                         : num  34 34 34 34 33.9 ...
##  $ Longitude                        : num  -118 -118 -118 -118 -118 ...
##  $ Number.of.Referrals              : int  0 1 1 0 9 0 0 0 4 2 ...
##  $ Tenure.in.Months                 : int  1 17 69 8 58 34 13 38 59 3 ...
##  $ Offer                            : Factor w/ 6 levels "0","1","2","3",..: 6 1 1 6 1 1 1 1 3 1 ...
##  $ Phone.Service                    : Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
##  $ Avg.Monthly.Long.Distance.Charges: num  0 0 18.41 5.21 18.68 ...
##  $ Multiple.Lines                   : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 2 2 1 ...
##  $ Internet.Service                 : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 1 2 2 2 ...
##  $ Internet.Type                    : Factor w/ 4 levels "1","2","3","0": 2 1 3 3 4 2 4 3 3 3 ...
##  $ Avg.Monthly.GB.Download          : num  10 10 17 8 0 16 0 13 14 22 ...
##  $ Online.Security                  : Factor w/ 2 levels "0","1": 1 2 1 1 1 2 1 1 2 1 ...
##  $ Online.Backup                    : Factor w/ 2 levels "0","1": 2 1 2 1 1 1 1 1 2 2 ...
##  $ Device.Protection.Plan           : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 1 1 1 1 ...
##  $ Premium.Tech.Support             : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 2 ...
##  $ Streaming.TV                     : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 1 2 2 1 ...
##  $ Streaming.Movies                 : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 1 2 1 1 ...
##  $ Streaming.Music                  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Unlimited.Data                   : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 1 2 2 2 ...
##  $ Contract                         : Factor w/ 3 levels "1","2","3": 1 1 3 1 3 2 1 3 1 1 ...
##  $ Paperless.Billing                : Factor w/ 2 levels "0","1": 2 1 2 2 1 1 2 1 2 2 ...
##  $ Payment.Method                   : Factor w/ 3 levels "1","2","3": 1 3 1 2 2 3 1 1 1 1 ...
##  $ Monthly.Charge                   : num  29.9 34.4 110 94.5 -4 ...
##  $ Total.Charges                    : num  29.9 592.8 7634.2 743 1186 ...
##  $ Total.Refunds                    : num  0 0 0 0 0 ...
##  $ Total.Extra.Data.Charges         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Total.Long.Distance.Charges      : num  0 0 1270.3 41.7 1083.4 ...
##  $ Total.Revenue                    : num  29.9 592.8 8904.5 784.6 2269.4 ...
##  $ Customer.Status                  : Factor w/ 3 levels "Churned","Joined",..: 2 3 3 1 3 3 3 3 3 2 ...
##  $ Churn.Category                   : Factor w/ 6 levels "1","2","3","4",..: 6 6 6 2 6 6 6 6 6 6 ...
##  $ Churn.Reason                     : Factor w/ 21 levels "","Attitude of service provider",..: 1 1 1 4 1 1 1 1 1 1 ...
##  $ Population                       : int  54492 54492 54492 54492 44586 44586 44586 44586 58198 58198 ...

After being processed, the data seems to be pretty good for proceeding towards analysis.

5. Exploratory Data Analysis

As we’ve seen in the data structure previously, there are two major variable types. One type is categorical and the other is continuous. We can fragment the data into two types in accordance with their structure. In the exploratory data analysis, I’ll try to cover most of the factors possibly affecting customer churn.

5.1 Data fragmentation

Dataframe for categorical variable

Gender Married Offer Offer.1 Phone.Service Multiple.Lines Internet.Service Internet.Type Online.Security Online.Backup Device.Protection.Plan Premium.Tech.Support Streaming.TV Streaming.Movies Streaming.Music Unlimited.Data Contract Paperless.Billing Payment.Method Customer.Status Churn.Category
Female Yes 5 5 0 0 1 2 0 1 0 0 0 0 0 1 1 1 1 Joined 0
Male Yes 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 0 3 Stayed 0
Male Yes 0 0 1 1 1 3 0 1 1 1 1 1 0 1 3 1 1 Stayed 0
Female No 5 5 1 0 1 3 0 0 1 0 1 1 0 1 1 1 2 Churned 2
Male Yes 0 0 1 0 0 0 0 0 0 0 0 0 0 0 3 0 2 Stayed 0
Male No 0 0 1 0 1 2 1 0 1 0 0 0 0 1 2 0 3 Stayed 0

Dataframe for continous data

Age Number.of.Dependents Number.of.Referrals Tenure.in.Months Avg.Monthly.Long.Distance.Charges Avg.Monthly.GB.Download Monthly.Charge Total.Charges Total.Refunds Total.Extra.Data.Charges Total.Long.Distance.Charges Total.Revenue Customer.Status
36 0 0 1 0.00 10 29.85 29.85 0 0 0.00 29.85 Joined
36 0 1 17 0.00 10 34.40 592.75 0 0 0.00 592.75 Stayed
71 0 1 69 18.41 17 109.95 7634.25 0 0 1270.29 8904.54 Stayed
66 0 0 8 5.21 8 94.45 742.95 0 0 41.68 784.63 Churned
31 0 9 58 18.68 0 -4.00 1185.95 0 0 1083.44 2269.39 Stayed
46 0 0 34 17.09 16 56.95 1889.50 0 0 581.06 2470.56 Stayed

5.3 Analysing the Continous variables

Previously, I divided the dataset into two parts based on the variable structure. In this section, we will only acknowledge continuous variables and interpret the analysis to discover insights from the continuous dataset.

Correlation Matrix

##                                            Age Number.of.Dependents
## Age                                1.000000000         -0.118999862
## Number.of.Dependents              -0.118999862          1.000000000
## Number.of.Referrals               -0.025141149          0.278003114
## Tenure.in.Months                   0.009927002          0.108237188
## Avg.Monthly.Long.Distance.Charges -0.011748735         -0.007323974
## Avg.Monthly.GB.Download           -0.376595279          0.129965737
## Monthly.Charge                     0.134510536         -0.125649233
## Total.Charges                      0.059684033          0.022534584
## Total.Refunds                      0.024168363          0.014023411
## Total.Extra.Data.Charges           0.025036466         -0.014436102
## Total.Long.Distance.Charges        0.003065053          0.068965598
## Total.Revenue                      0.048265239          0.038037603
##                                   Number.of.Referrals Tenure.in.Months
## Age                                     -0.0251411490      0.009927002
## Number.of.Dependents                     0.2780031145      0.108237188
## Number.of.Referrals                      1.0000000000      0.326974593
## Tenure.in.Months                         0.3269745934      1.000000000
## Avg.Monthly.Long.Distance.Charges        0.0063363420      0.014596019
## Avg.Monthly.GB.Download                  0.0385748885      0.049118614
## Monthly.Charge                           0.0263011697      0.239065172
## Total.Charges                            0.2503775051      0.826073509
## Total.Refunds                            0.0247559864      0.059021391
## Total.Extra.Data.Charges                 0.0003501483      0.082265526
## Total.Long.Distance.Charges              0.2161898477      0.674149016
## Total.Revenue                            0.2618532381      0.853146275
##                                   Avg.Monthly.Long.Distance.Charges
## Age                                                    -0.011748735
## Number.of.Dependents                                   -0.007323974
## Number.of.Referrals                                     0.006336342
## Tenure.in.Months                                        0.014596019
## Avg.Monthly.Long.Distance.Charges                       1.000000000
## Avg.Monthly.GB.Download                                -0.030454544
## Monthly.Charge                                          0.130086699
## Total.Charges                                           0.069500175
## Total.Refunds                                          -0.018643863
## Total.Extra.Data.Charges                                0.002413957
## Total.Long.Distance.Charges                             0.599828291
## Total.Revenue                                           0.232291028
##                                   Avg.Monthly.GB.Download Monthly.Charge
## Age                                          -0.376595279     0.13451054
## Number.of.Dependents                          0.129965737    -0.12564923
## Number.of.Referrals                           0.038574888     0.02630117
## Tenure.in.Months                              0.049118614     0.23906517
## Avg.Monthly.Long.Distance.Charges            -0.030454544     0.13008670
## Avg.Monthly.GB.Download                       1.000000000     0.37277602
## Monthly.Charge                                0.372776022     1.00000000
## Total.Charges                                 0.223267860     0.62280977
## Total.Refunds                                 0.002396624     0.02396250
## Total.Extra.Data.Charges                      0.088938786     0.12123777
## Total.Long.Distance.Charges                   0.011372784     0.23568464
## Total.Revenue                                 0.180725995     0.56324890
##                                   Total.Charges Total.Refunds
## Age                                  0.05968403   0.024168363
## Number.of.Dependents                 0.02253458   0.014023411
## Number.of.Referrals                  0.25037751   0.024755986
## Tenure.in.Months                     0.82607351   0.059021391
## Avg.Monthly.Long.Distance.Charges    0.06950017  -0.018643863
## Avg.Monthly.GB.Download              0.22326786   0.002396624
## Monthly.Charge                       0.62280977   0.023962500
## Total.Charges                        1.00000000   0.039558484
## Total.Refunds                        0.03955848   1.000000000
## Total.Extra.Data.Charges             0.12185925   0.016755347
## Total.Long.Distance.Charges          0.61018519   0.028112592
## Total.Revenue                        0.97221211   0.036984474
##                                   Total.Extra.Data.Charges
## Age                                           0.0250364661
## Number.of.Dependents                         -0.0144361022
## Number.of.Referrals                           0.0003501483
## Tenure.in.Months                              0.0822655263
## Avg.Monthly.Long.Distance.Charges             0.0024139574
## Avg.Monthly.GB.Download                       0.0889387865
## Monthly.Charge                                0.1212377699
## Total.Charges                                 0.1218592519
## Total.Refunds                                 0.0167553474
## Total.Extra.Data.Charges                      1.0000000000
## Total.Long.Distance.Charges                   0.0588713222
## Total.Revenue                                 0.1224961391
##                                   Total.Long.Distance.Charges Total.Revenue
## Age                                               0.003065053    0.04826524
## Number.of.Dependents                              0.068965598    0.03803760
## Number.of.Referrals                               0.216189848    0.26185324
## Tenure.in.Months                                  0.674149016    0.85314628
## Avg.Monthly.Long.Distance.Charges                 0.599828291    0.23229103
## Avg.Monthly.GB.Download                           0.011372784    0.18072600
## Monthly.Charge                                    0.235684636    0.56324890
## Total.Charges                                     0.610185191    0.97221211
## Total.Refunds                                     0.028112592    0.03698447
## Total.Extra.Data.Charges                          0.058871322    0.12249614
## Total.Long.Distance.Charges                       1.000000000    0.77855873
## Total.Revenue                                     0.778558734    1.00000000

In this correlation matrix, we observe some of the variables are closely related to other variables in the dataset.

  • Total Charges ~ Tenure in months
  • Total Revenue ~ Tenure in months
  • Total Charges ~ Total Revenue
  • Total Revenue ~ Total long distance charges

Also, we see a slight negative correlation with average monthly GB download, indicating that the younger the user is, the more they use the internet, whereas older users use the internet less than the younger population.

Number of dependents vs Customer status

This graph shows that the number of dependent churned customers is significantly lower than the established customers. So, from this, we can infer that the established customers’ telecom service usage by family/relatives is higher.

Number of referals vs Customer status

This graph demonstrates that stayed customers recommend more people than the other two groups. This suggests that stayed(loyal) customers are happy with the service and frequently recommend it to other potential consumers.

Tenure in Months vs Customer status

This plot shows the duration of a user in months. The duration is the total number of months the user has been a customer of the service. Here, we can draw the inference that the churned user lasted for 18 months while the established users were customers for more than 40 months.

Monthly charges vs Customer status

This is an intriguing plot where we can see that the churned customer’s monthly charges are higher than the stayed customer’s. The question now is: why are these customers’ monthly costs high, and is this a contributing factor in their decision to change services?

Total revenue vs Customer status

The total revenue vs. customer status plot illustrates the variation in total income produced by the business through the various user categories. Given that they have been utilising the service for over 40 months compared to the churned client’s 18 months, there is a good possibility that the total money generated from the stayed customer will be the largest. Although the overall revenue from churned customers is less than that from stayed customers, we can still argue that these are customers with great revenue potential.

5.4 Analyzing the categorical variables

I’ve created a dataset for the continuous variables and developed elaborative and descriptive analysis. Likewise, in this part, I’ll be covering all the possible insights from the categorical variable.

Count of users vs Contract type

##    
##     Churned Joined Stayed
##   1    1655    408   1547
##   2     166     24   1360
##   3      48     22   1813
Variable Value Transformed value
Contract Type Month-to-Month 1
One year 2
Two year 3

Referring to the above transformed value and the original value, we can see that the churned customers are most likely to have a month-to-month contract, whereas the remaining customers are almost spread across different contract types.

Count of users availing Internet service

##    
##     Churned Joined Stayed
##   0     113    182   1231
##   1    1756    272   3489
Variable Value Transformed value
Internet Service No 0
Yes 1

The graph shows that the majority of users who switched telecom service providers did so primarily for better internet access. There’s a possibility that a competitor service provider is offering better internet service.

Internet users vs Internet type

##    
##     Churned Joined Stayed
##   1     213     56    561
##   2     307    115   1230
##   3    1236    101   1698
##   0     113    182   1231
Variable Value Transformed value
Internet Type None 0
Cable 1
DSL 2
Fiber Optic 3

The plot now illustrates that the count of churned customers with internet service is high on optic fiber. which clarifies that the customers are switching to other service providers who provide higher internet bandwidth.

Internet user with online Security

##    
##     Churned Joined Stayed
##   0    1574    408   3042
##   1     295     46   1678
Variable Value Transformed value
Online Security No 0
Yes 1

The plot shows the count of Internet users and the add-on service of online security across different categories. It gives us an idea that the number of Internet users with no online security add-on is greater than those who opt for the add-on service. We can also see that the customer column where the count of users is decent for those who opt for online security as an add-on.

Internet users with Premium tech support

##    
##     Churned Joined Stayed
##   0    1559    407   3033
##   1     310     47   1687
Variable Value Transformed value
Premium Tech Support No 0
Yes 1

This plot is very similar to the plot of Internet users with online security as an add-on. Similarly, we can infer that the Internet user without the add-on service of premium tech support is more than those who opt for the service. Also, we can refer to the stayed customer column where the count of users is decent for those who opt for the premium tech as an add-on service.

Churn reason

Variable Value Transformed value
Churn Category Blank 0
Attitude 1
Competitor 2
Dissatisfaction 3
Price 4
Others 5

This plot shows the count of churned customers and the reason for churn. From the plot, we can infer that most of the users stated competitors as the churn reason. So, we can say that most of the users are switching to competitors offering better services and higher bandwidth.

6. Preparing data for Predective Analysis

In developing predictive model the data should be split into two, one(Train dataset) for training the model and second(Test dataset) for seeing the performance of the predictive model. For consistent output, we use set.seed function. The main point of using the seed is to be able to reproduce a particular sequence of ‘random’ numbers.

7. Chi-square and Cramer’s V

Variable statistic p.value V1 Cramers_Value
Contract 1441.3859 0.0000000 3 0.3823029
Device.Protection.Plan 156.9618 0.0000000 2 0.1261579
Internet.Service 309.0694 0.0000000 2 0.1770294
Internet.Type 532.7079 0.0000000 4 0.2324139
Offer 795.4861 0.0000000 6 0.2840101
Online.Backup 138.4891 0.0000000 2 0.1185019
Online.Security 227.8757 0.0000000 2 0.1520080
Paperless.Billing 205.1861 0.0000000 2 0.1442419
Payment.Method 258.3974 0.0000000 3 0.1618682
Phone.Service 3.8192 0.1481396 2 0.0196790
Premium.Tech.Support 245.9676 0.0000000 2 0.1579270
Streaming.Movies 133.0671 0.0000000 2 0.1161590
Streaming.Music 102.2052 0.0000000 2 0.1018015
Streaming.TV 133.7242 0.0000000 2 0.1164455
Unlimited.Data 168.5999 0.0000000 2 0.1307513

In this part I’ve tried to find out the correlation and strenght(effect size of association) between categorical variables with the dependent variable(Customer.Status). The Chi-square test is a statistical tool for determining the difference between observed and expected data. It also helps in identifying the correlation between the categorical variable and the dependent variable of categorical type. While Cramer’s V helps us to measure the strength of association between the categorical variable and the dependent variable of categorical type, From the table above, with p-value < 0.05 we can say that at a significant level of 0.05, it can be concluded that the association between the variables is statistically significant, except for the Phone.Service variable, with a p-value of 0.1481. Likewise, Cramers’ V gives us the effect size measurement for the chi-square test of independence.

  • ES <= 0.2: The result is weak. Although the reult is statistically signifcant, the factor is only weakly associated.
  • 0.2 <= ES <= 0.6: The result is moderate. The factor is moderately associated.
  • ES > 0.6: The result is strong. The factor is strongly associated.

From the table we can infer that most of the variables are weakly associated while variable Contarct, Internet Type and Offer are moderately associated to the dependent variable(Customer.Status).

8. Multinomial Logistic Regression

Multinomial logistic regression is used to predict categorical placement in or the probability of category membership in a dependent variable based on multiple independent variables. The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale). Multinomial logistic regression is a simple extension of binary logistic regression that allows for more than two categories of the dependent or outcome variable. Like binary logistic regression, multinomial logistic regression uses maximum likelihood estimation to evaluate the probability of categorical membership. In the following model develepoment I’ll be using multinom function from nnet package to estimate multinomial logistic regression model. Function multinom doesn’t include p-value calculation for the regresion co-efficient, so to get p-value we’ll be using z-test(Wald test)

Model Summary

## Call:
## multinom(formula = Customer.Status ~ Age + Number.of.Dependents + 
##     Number.of.Referrals + Tenure.in.Months + Offer + Phone.Service + 
##     Avg.Monthly.Long.Distance.Charges + Multiple.Lines + Internet.Service + 
##     Internet.Type + Avg.Monthly.GB.Download + Online.Security + 
##     Online.Backup + Device.Protection.Plan + Premium.Tech.Support + 
##     Streaming.TV + Streaming.Movies + Streaming.Music + Unlimited.Data + 
##     Contract + Paperless.Billing + Payment.Method + Monthly.Charge + 
##     Total.Charges + Total.Refunds + Total.Long.Distance.Charges + 
##     Total.Revenue, data = traindf_pred2)
## 
## Coefficients:
##        (Intercept)         Age Number.of.Dependents Number.of.Referrals
## Joined   1.9603511 -0.03082223            0.3205773           0.3305210
## Stayed  -0.3169966 -0.01679600            0.5478115           0.2502277
##        Tenure.in.Months    Offer1     Offer2    Offer3     Offer4     Offer5
## Joined      -0.95900384 23.549183 10.1865275 0.5313735 -8.1349470 -0.3594837
## Stayed       0.09250368 -1.287368  0.1061509 0.2216949  0.7543357 -0.3631852
##        Phone.Service1 Avg.Monthly.Long.Distance.Charges Multiple.Lines1
## Joined      0.5212698                      -0.004649323      -0.5270837
## Stayed      0.6956961                       0.003546937      -0.1104598
##        Internet.Service1 Internet.Type2 Internet.Type3 Internet.Type0
## Joined         0.4653483      0.3908882     -0.6612113      1.4950027
## Stayed        -0.5381925      0.4673270      0.3413906      0.2211959
##        Avg.Monthly.GB.Download Online.Security1 Online.Backup1
## Joined            -0.002371780        0.5702509      0.6440871
## Stayed            -0.004288278        0.5487430      0.3263126
##        Device.Protection.Plan1 Premium.Tech.Support1 Streaming.TV1
## Joined              0.06356905             0.5755499    -0.2902495
## Stayed              0.14652193             0.5452480    -0.1623621
##        Streaming.Movies1 Streaming.Music1 Unlimited.Data1 Contract2 Contract3
## Joined         1.3570234       -1.7336383     -0.33594223  2.107196  3.630894
## Stayed         0.1798889       -0.1109168      0.02103502  1.351127  2.872753
##        Paperless.Billing1 Payment.Method2 Payment.Method3 Monthly.Charge
## Joined         -0.4922908       0.6859813      -0.5014792   -0.002071048
## Stayed         -0.2423102       0.4159249      -0.5312636   -0.011891290
##        Total.Charges Total.Refunds Total.Long.Distance.Charges Total.Revenue
## Joined  0.0031422737   0.009208220               -0.0010371218  0.0007649897
## Stayed -0.0003954708   0.006072633                0.0002451031 -0.0003175382
## 
## Std. Errors:
##        (Intercept)         Age Number.of.Dependents Number.of.Referrals
## Joined 0.001706426 0.003754653            0.0218358          0.04522931
## Stayed 0.002729437 0.002442088            0.0565553          0.02474783
##        Tenure.in.Months       Offer1       Offer2       Offer3       Offer4
## Joined      0.007100867 8.104668e-06 4.419837e-07 3.205531e-08 3.841388e-09
## Stayed      0.007684787 2.729151e-04 3.213364e-04 1.162494e-04 2.556609e-04
##              Offer5 Phone.Service1 Avg.Monthly.Long.Distance.Charges
## Joined 0.0006479004    0.001770351                       0.007799667
## Stayed 0.0010671972    0.003084193                       0.004258883
##        Multiple.Lines1 Internet.Service1 Internet.Type2 Internet.Type3
## Joined    0.0005638495       0.001570233   0.0005032142    0.001177688
## Stayed    0.0003208895       0.002722709   0.0005540282    0.003801231
##        Internet.Type0 Avg.Monthly.GB.Download Online.Security1 Online.Backup1
## Joined    0.003211291             0.004674395     0.0004801799   0.0005627926
## Stayed    0.005395436             0.002513387     0.0016159531   0.0010616302
##        Device.Protection.Plan1 Premium.Tech.Support1 Streaming.TV1
## Joined            0.0002420052          0.0004836781  0.0003142431
## Stayed            0.0009277510          0.0015734487  0.0007492829
##        Streaming.Movies1 Streaming.Music1 Unlimited.Data1    Contract2
## Joined      0.0004481977     0.0002423772     0.001871512 0.0002026849
## Stayed      0.0007094045     0.0007221277     0.003129928 0.0003659569
##           Contract3 Paperless.Billing1 Payment.Method2 Payment.Method3
## Joined 0.0002067809       0.0009340948     0.002852559    0.0002641255
## Stayed 0.0004743466       0.0031277831     0.005896004    0.0005684454
##        Monthly.Charge Total.Charges Total.Refunds Total.Long.Distance.Charges
## Joined    0.004396115   0.009263191   0.023705076                 0.009740034
## Stayed    0.002213205   0.001630616   0.006066285                 0.001645722
##        Total.Revenue
## Joined   0.009209219
## Stayed   0.001632669
## 
## Residual Deviance: 4044.068 
## AIC: 4184.068

Stayed relative to Churned :

  • Age : This is the multinomial logit estimate for one unit increase in Age for stayed customer relative to churned customer. If customer were to increase their age by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to decrease by 0.016 unit while holding all the other variable constant.
  • Number of Dependents : : This is the multinomial logit estimate for one unit increase in Number of dependents for stayed customer relative to churned customer. If customer were to increase their dependents by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to increase by 0.55 unit while holding all the other variable constant.
  • Number of referrals : : This is the multinomial logit estimate for one unit increase in Number of referrals for stayed customer relative to churned customer. If customer were to increase their referrals by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to increase by 0.25 unit while holding all the other variable constant.
  • Tenure in months : : This is the multinomial logit estimate for one unit increase in Tenure in months for stayed customer relative to churned customer. If customer were to increase their tenure by one month, the multinomial log-odds for being stayed customer to churned customer would be expected to increase by 0.10 unit while holding all the other variable constant.
  • Offer 1 : This is the multinomial logistic estimate comparing no-offer to offer1, given that other variables in the model is held constant. The multinomial logit for offer1 relative to no-offer is 1.29 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words offer1 is the less likely than no-offer to be choosen by stayed customer than churned customer.
  • Offer2 : This is the multinomial logistic estimate comparing no-offer to offer2, given that other variables in the model is held constant. The multinomial logit for offer2 relative to no-offer is 0.11 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words offer2 is the more likely than no-offer to be choosen by stayed customer than churned customer.
  • Offer3 : This is the multinomial logistic estimate comparing no-offer to offer3, given that other variables in the model is held constant. The multinomial logit for offer3 relative to no-offer is 0.22 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words offer3 is the more likely than no-offer to be choosen by stayed customer than churned customer.
  • Offer4 : This is the multinomial logistic estimate comparing no-offer to offer4, given that other variables in the model is held constant. The multinomial logit for offer4 relative to no-offer is 0.75 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words offer4 is the more likely than no-offer to be choosen by stayed customer than churned customer.
  • Offer5 : This is the multinomial logistic estimate comparing no-offer to offer5, given that other variables in the model is held constant. The multinomial logit for offer5 relative to no-offer is 0.36 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words offer5 is the less likely than no-offer to be choosen by stayed customer than churned customer.
  • Average Monthly long distance charge : This is the multinomial logit estimate for one unit increase in Average monthly long distance charge for stayed customer relative to churned customer. If customer were to increase their long distance charge by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to increase by 0.0035 unit while holding all the other variable constant.
  • Phone Service1 : This is the multinomial logistic estimate comparing Phone serice0 to Phone service1, given that other variables in the model is held constant. The multinomial logit for Phone service1 relative to Phone service0 is 0.69 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Phone Service1 is the more likely than no-Phone service to be choosen by stayed customer than churned customer.
  • Multiple Lines1 : This is the multinomial logistic estimate comparing No multiple line to Multiple line1, given that other variables in the model is held constant. The multinomial logit for Multiple line1 relative to No-multiple line is 0.11 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words Multiple Line1 is the less likely than no-Multiple line to be choosen by stayed customer than churned customer.
  • Internet Service1 : This is the multinomial logistic estimate comparing no-Internet service to Internet service1, given that other variables in the model is held constant. The multinomial logit for Internet service1 relative to no-Internet service is .54 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words Internet Service1 is the less likely than no-Internet Service to be choosen by stayed customer than churned customer.
  • Internet Type2 : This is the multinomial logistic estimate comparing Internet type1 to Internet type2 , given that other variables in the model is held constant. The multinomial logit for Internet type2 relative to Internet type1 is 0.47 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Internet Type1 is the more likely than no-offer to be choosen by stayed customer than churned customer.
  • Internet Type3 : This is the multinomial logistic estimate comparing Internet type1 to Internet type3, given that other variables in the model is held constant. The multinomial logit for Internet type3 relative to no-Internet type is 0.34 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Internet Type3 is the more likely than Internet type1 to be choosen by stayed customer than churned customer.
  • Internet Type0 : This is the multinomial logistic estimate comparing Internet type1 to No Internet, given that other variables in the model is held constant. The multinomial logit for No Internet relative to Internet type1 is 0.22 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words No Internet is the more likely than Internet type1 to be choosen by stayed customer than churned customer.
  • Average Monthly GB Download : : This is the multinomial logit estimate for one unit increase in Average monthly GB download for stayed customer relative to churned customer. If customer were to increase their download by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to decrease by 0.004 unit while holding all the other variable constant.
  • Online security1 : This is the multinomial logistic estimate comparing no-Online security to Online Security1, given that other variables in the model is held constant. The multinomial logit for Online Security1 relative to no-Online security is 0.55 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Online Security1 is the more likely than no-Online security to be choosen by stayed customer than churned customer.
  • Online Backup1 : This is the multinomial logistic estimate comparing no-Online backup to Online Backup1, given that other variables in the model is held constant. The multinomial logit for Online Backup1 relative to no-Online backup is 0.32 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Online Backup1 is the more likely than no-Online Backup to be choosen by stayed customer than churned customer.
  • Device Protection Plan1 : This is the multinomial logistic estimate comparing no-Device protection plan to Device Protection plan1, given that other variables in the model is held constant. The multinomial logit for Device Protection plan1 relative to no-Device Protection Plan is 0.15 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Fevice protection plan1 is the more likely than no-Device protection plan to be choosen by stayed customer than churned customer.
  • Premium tech support1 : This is the multinomial logistic estimate comparing no-Premium Tech Support to Premium Tech support1, given that other variables in the model is held constant. The multinomial logit for Premium Tech support1 relative to no-Premium tech support is 0.55 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Premium tech support1 is the more likely than no-Premium Tech support to be choosen by stayed customer than churned customer.
  • Streaming TV1 : This is the multinomial logistic estimate comparing no-Streaming TV to Streaming TV1, given that other variables in the model is held constant. The multinomial logit for Streaming TV1 relative to no-Streaming TV is 0.16 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words Streaming TV1 is the less likely than no-Streaming TV to be choosen by stayed customer than churned customer.
  • Streaming Movies1 : This is the multinomial logistic estimate comparing no-Streaming Movie to Streaming Movie1, given that other variables in the model is held constant. The multinomial logit for Streaming Movie1 relative to no-Streaming movie is 0.18 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Streaming Movies1 is the more likely than no-Streaming Movies to be choosen by stayed customer than churned customer.
  • Streaming Music1 : This is the multinomial logistic estimate comparing no-Streaming Music to Streaming music1, given that other variables in the model is held constant. The multinomial logit for Streaming Music1 relative to no-Streaming Music is 0.12 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words Streaming Music1 is the less likely than no-Streaming Music to be choosen by stayed customer than churned customer.
  • Contract2 : This is the multinomial logistic estimate comparing Contract1 to Contract2, given that other variables in the model is held constant. The multinomial logit for Contract2 relative to Contract1 is 1.35 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Contract2 is the more likely than Contract1 to be choosen by stayed customer than churned customer.
  • Contract3 : This is the multinomial logistic estimate comparing Contract1 to Contract3, given that other variables in the model is held constant. The multinomial logit for Contract3 relative to Contract1 is 2.87 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Contract3 is the more likely than Contract1 to be choosen by stayed customer than churned customer.
  • Paperless Billing1 :This is the multinomial logistic estimate comparing no-Paperless billing to Paperless billing1, given that other variables in the model is held constant. The multinomial logit for Paperless billing1 relative to no-Paperless billing is 0.24 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words Paperless Billing1 is the less likely than no-Parperless Billing to be choosen by stayed customer than churned customer.
  • Payment Method2 : This is the multinomial logistic estimate comparing Payment Method1 to Payment Method2, given that other variables in the model is held constant. The multinomial logit for Payment Method2 relative to Payment Method1 is 0.42 unit higher for staying to churning, given all other predictors in the model are held constant. In, simple words Payment Method2 is the more likely than Payment Method1 to be choosen by stayed customer than churned customer.
  • Payment Method3 : This is the multinomial logistic estimate comparing Payment Method1 to Payment Method3, given that other variables in the model is held constant. The multinomial logit for Payment Method3 relative to Payment Method1 is 0.53 unit lower for staying to churning, given all other predictors in the model are held constant. In, simple words Payment Method3 is the less likely than Payment Method1 to be choosen by stayed customer than churned customer.
  • Monthly Charge : This is the multinomial logit estimate for one unit increase in Monthly charge for stayed customer relative to churned customer. If customer were to increase their monthly charge by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to decrease by 0.01 unit while holding all the other variable constant.
  • Total Charges : This is the multinomial logit estimate for one unit increase in Total charges for stayed customer relative to churned customer. If customer were to increase their total charges by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to decrease by 0.0003 unit while holding all the other variable constant.
  • Total Refund : This is the multinomial logit estimate for one unit increase in Total refund for stayed customer relative to churned customer. If customer were to increase their refund amount by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to increase by 0.006 unit while holding all the other variable constant.
  • Total Long Distance Charges : This is the multinomial logit estimate for one unit increase in Long Distance Charges for stayed customer relative to churned customer. If customer were to increase their long distance charges by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to increase by 0.0002 unit while holding all the other variable constant.
  • Total Revenue : This is the multinomial logit estimate for one unit increase in Total revenue for stayed customer relative to churned customer. If customer were to increase their revenue contributed by one unit, the multinomial log-odds for being stayed customer to churned customer would be expected to decrease by 0.0003 unit while holding all the other variable constant.

2-tailed z-test(p-value)

##        (Intercept)          Age Number.of.Dependents Number.of.Referrals
## Joined           0 2.220446e-16                    0        2.717826e-13
## Stayed           0 6.081580e-12                    0        0.000000e+00
##        Tenure.in.Months Offer1 Offer2 Offer3 Offer4 Offer5 Phone.Service1
## Joined                0      0      0      0      0      0              0
## Stayed                0      0      0      0      0      0              0
##        Avg.Monthly.Long.Distance.Charges Multiple.Lines1 Internet.Service1
## Joined                         0.5511135               0                 0
## Stayed                         0.4049390               0                 0
##        Internet.Type2 Internet.Type3 Internet.Type0 Avg.Monthly.GB.Download
## Joined              0              0              0              0.61187542
## Stayed              0              0              0              0.08797553
##        Online.Security1 Online.Backup1 Device.Protection.Plan1
## Joined                0              0                       0
## Stayed                0              0                       0
##        Premium.Tech.Support1 Streaming.TV1 Streaming.Movies1 Streaming.Music1
## Joined                     0             0                 0                0
## Stayed                     0             0                 0                0
##        Unlimited.Data1 Contract2 Contract3 Paperless.Billing1 Payment.Method2
## Joined    0.000000e+00         0         0                  0               0
## Stayed    1.809686e-11         0         0                  0               0
##        Payment.Method3 Monthly.Charge Total.Charges Total.Refunds
## Joined               0   6.375632e-01     0.7344429     0.6976836
## Stayed               0   7.748749e-08     0.8083707     0.3168044
##        Total.Long.Distance.Charges Total.Revenue
## Joined                   0.9152013     0.9337976
## Stayed                   0.8816062     0.8457920

Model Fit in Test Dataset & Accuracy

Churned Joined Stayed
Churned 361 41 142
Joined 37 83 7
Stayed 160 2 1279
## [1] 81.58

Seems there’s no over-fitting, and prediction model accuracy seems pretty decent.

9. Business Recommendations

  • Contract Type

    As of previous analysis in EDA we saw that the most of the churned customers availed contract type1(i.e. month-to-month) and there were fewer customer churn in contract type2(one year) & contract type3 (two year contract). Almost 1655 customers churned who had monthly contracts while there were fewer 1547 customers who stayed in the same contract type. This means that more than half of the customers in the contract type churned. From the Cramer’s V value, we saw that the predictor contract has the highest association among all the predictors to the dependent variable. Furthermore, in Multinomial Logit Regression, I’ve explained how contracts 2 and 3 are more preferred by stayed customers than churned customers. So, from these findings, it is evident that customers are more likely to leave the company’s service when they opt for a month-to-month contract. The company should thoroughly evaluate the monthly plan and see what the other competitors are offering for the same contract type. Also, companies can devise a new contract type of bi-yearly contract which would extend for 6 months.

  • Competitor Analysis

    While addressing the reasons for churn in EDA, we saw that most of the churned customers stated competitors as the main reason for their churn. There’s a need for elaborate competitor analysis, mainly in the pricing and in the Intenet type (focusing mainly on bandwidth offered). Because the majority of Internet users preferred Optic Fiber as the Internet type, it is critical to investigate what competitors are offering in terms of Internet type and bandwidth.

  • Internet type

    In EDA, we saw Internet users and Internet types in all three customer status categories. We saw that the count of the churned customers with internet service is high on optic fiber, which implies that most of the churned customers with internet service are switching to other service providers seeking higher internet bandwidth. Also, in Cramer’s V table, we saw that internet type is moderately associated with customer status, which signifies that the customer’s status is moderately affected by the type of internet type the customer is using. The telecom provider can survey their existing customers and gather additional information on the bandwidth of their company compared to other telecom providers in their locality. If the internet bandwidth is less than that of other competitors, it can try to improve the bandwidth.

  • Monthly Charges and Add-on Services

    In EDA of monthly charges vs. customer status, we saw that there are high monthly charges for churned customers. The probable reason for this can be an overpriced monthly contract or a competitor’s competitive price for a monthly contract. Some of the add-on services, such as premium tech support and online security, also have some features that reduce the churning of customers (this is explained in co-efficient explanation). Previously, I recommended that we initiate formulating a new bi-yearly(6 month) contract. This contract type should be bundled with add-on services such as Premium Tech support and Online Security so that it looks more attractive.

    Thank you for your time :)