R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Note: this analysis was performed using the open source software R and Rstudio.

Introduction

Customer churn occurs when customers or subscribers stop doing business with a company or service, also known as customer attrition. It is also referred as loss of clients or customers. The industry in which churn rates are particularly useful may include the telecommunications industry, the healthcare industry, the insurance industry, etc. because most customers have multiple options from which to choose within a geographic location.

In this lab, we are going to analyze customer churn using telecom dataset available from Kaggle. We will introduce some basic exploratory analysis to help you understand the nature of customer churn.

What is customer churn?

Imagine that you have 10 customers who switch to your competitors (e.g., cancel your service’s subscription) and you originally signed 100 customers during that time period, your churn rate would be:

churn <- 10
original <- 100
Churn_rate <- churn/original
# How much is the Churn rate? See the result below: 
Churn_rate
## [1] 0.1

In the industries listed above, the average churn rate is about 20% to 40%. Reducing your churn rate is important because your customers are critical for your business growth (Salesmate, 2021).

Case study - We will be using a real world dataset available in Kaggle for this tutorial (see the reference)

churn <- read.csv('WA_Fn-UseC_-Telco-Customer-Churn.csv', na.strings = c('','?'))
#install.packages("DT")
#install.packages("cowplot")
#install.packages(c("ggplot2","plyr"),dep=T)
#error  - The following `from` values were not present in `x`
#solution - install plyr using the function above

library(plyr)
library(ggplot2)
library(DT)
library(cowplot)
datatable(churn, rownames = FALSE, filter="top", options = list(pageLength = 6, scrollX= T))
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
#we use the sapply function to remove all missing values
sapply(churn, function(x) sum(is.na(x)))
##       customerID           gender    SeniorCitizen          Partner 
##                0                0                0                0 
##       Dependents           tenure     PhoneService    MultipleLines 
##                0                0                0                0 
##  InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
##                0                0                0                0 
##      TechSupport      StreamingTV  StreamingMovies         Contract 
##                0                0                0                0 
## PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
##                0                0                0               11 
##            Churn 
##                0
#Exploratory analysis
head(churn)
##   customerID gender SeniorCitizen Partner Dependents tenure PhoneService
## 1 7590-VHVEG Female             0     Yes         No      1           No
## 2 5575-GNVDE   Male             0      No         No     34          Yes
## 3 3668-QPYBK   Male             0      No         No      2          Yes
## 4 7795-CFOCW   Male             0      No         No     45           No
## 5 9237-HQITU Female             0      No         No      2          Yes
## 6 9305-CDSKC Female             0      No         No      8          Yes
##      MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection
## 1 No phone service             DSL             No          Yes               No
## 2               No             DSL            Yes           No              Yes
## 3               No             DSL            Yes          Yes               No
## 4 No phone service             DSL            Yes           No              Yes
## 5               No     Fiber optic             No           No               No
## 6              Yes     Fiber optic             No           No              Yes
##   TechSupport StreamingTV StreamingMovies       Contract PaperlessBilling
## 1          No          No              No Month-to-month              Yes
## 2          No          No              No       One year               No
## 3          No          No              No Month-to-month              Yes
## 4         Yes          No              No       One year               No
## 5          No          No              No Month-to-month              Yes
## 6          No         Yes             Yes Month-to-month              Yes
##               PaymentMethod MonthlyCharges TotalCharges Churn
## 1          Electronic check          29.85        29.85    No
## 2              Mailed check          56.95      1889.50    No
## 3              Mailed check          53.85       108.15   Yes
## 4 Bank transfer (automatic)          42.30      1840.75    No
## 5          Electronic check          70.70       151.65   Yes
## 6          Electronic check          99.65       820.50   Yes
summary(churn)
##   customerID           gender          SeniorCitizen      Partner         
##  Length:7043        Length:7043        Min.   :0.0000   Length:7043       
##  Class :character   Class :character   1st Qu.:0.0000   Class :character  
##  Mode  :character   Mode  :character   Median :0.0000   Mode  :character  
##                                        Mean   :0.1621                     
##                                        3rd Qu.:0.0000                     
##                                        Max.   :1.0000                     
##                                                                           
##   Dependents            tenure      PhoneService       MultipleLines     
##  Length:7043        Min.   : 0.00   Length:7043        Length:7043       
##  Class :character   1st Qu.: 9.00   Class :character   Class :character  
##  Mode  :character   Median :29.00   Mode  :character   Mode  :character  
##                     Mean   :32.37                                        
##                     3rd Qu.:55.00                                        
##                     Max.   :72.00                                        
##                                                                          
##  InternetService    OnlineSecurity     OnlineBackup       DeviceProtection  
##  Length:7043        Length:7043        Length:7043        Length:7043       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  TechSupport        StreamingTV        StreamingMovies      Contract        
##  Length:7043        Length:7043        Length:7043        Length:7043       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  PaperlessBilling   PaymentMethod      MonthlyCharges    TotalCharges   
##  Length:7043        Length:7043        Min.   : 18.25   Min.   :  18.8  
##  Class :character   Class :character   1st Qu.: 35.50   1st Qu.: 401.4  
##  Mode  :character   Mode  :character   Median : 70.35   Median :1397.5  
##                                        Mean   : 64.76   Mean   :2283.3  
##                                        3rd Qu.: 89.85   3rd Qu.:3794.7  
##                                        Max.   :118.75   Max.   :8684.8  
##                                                         NA's   :11      
##     Churn          
##  Length:7043       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
bi1 <- ggplot(data = churn, aes(x = factor(Churn), y = tenure, fill = Churn)) +geom_boxplot()
#Explanation - High churn for those with lower tenures

bi2 <- ggplot(data = churn, aes(x = factor(Churn), y = MonthlyCharges, fill = Churn))+geom_boxplot()
#Explanation - Higher churn for those with higher than avg monthly charges

bi3 <- ggplot(data = churn, aes(x = factor(Churn), y = TotalCharges, fill = Churn))+geom_boxplot()
#Explanation - Lower churn for those with lower than avg total charges

plot_grid(bi1, bi2, bi3, labels = "AUTO")
## Warning: Removed 11 rows containing non-finite values (stat_boxplot).

#Partner plot

p1 <- ggplot(churn, aes(x = Partner, fill = Partner)) +
  geom_bar() +
  geom_text(aes(y = ..count.. -200, 
                label = paste0(round(prop.table(..count..),4) * 100, '%')), 
            stat = 'count', 
            position = position_dodge(.1), 
            size = 3)
## using default colors - specify the fill argument within the aes function to be equal to the grouping variable of Partner

#Dependents plot
p2 <- ggplot(churn, aes(x = Dependents, fill = Dependents)) +
  geom_bar() +
  geom_text(aes(y = ..count.. -200, 
                label = paste0(round(prop.table(..count..),4) * 100, '%')), 
            stat = 'count', 
            position = position_dodge(.1), 
            size = 3)

#install.packages("plyr")



#Senior citizen plot

churn$SeniorCitizen <- as.factor(mapvalues(churn$SeniorCitizen,
                                           from=c("0","1"),
                                           to=c("No", "Yes")))
p3 <- ggplot(churn, aes(x = SeniorCitizen, fill = SeniorCitizen)) +
  geom_bar() +
  geom_text(aes(y = ..count.. -200, 
                label = paste0(round(prop.table(..count..),4) * 100, '%')), 
            stat = 'count', 
            position = position_dodge(.1), 
            size = 3)
plot_grid(p1, p2, p3, labels = "AUTO")

Appendix - Variable explanations

Please find the variable explanations below:

https://www.kaggle.com/blastchar/telco-customer-churn

The raw data contains 7043 rows (customers) and 21 columns (features). customerID gender (female, male) SeniorCitizen (Whether the customer is a senior citizen or not (1, 0)) Partner (Whether the customer has a partner or not (Yes, No)) Dependents (Whether the customer has dependents or not (Yes, No)) tenure (Number of months the customer has stayed with the company) PhoneService (Whether the customer has a phone service or not (Yes, No)) MultipleLines (Whether the customer has multiple lines r not (Yes, No, No phone service) InternetService (Customer’s internet service provider (DSL, Fiber optic, No) OnlineSecurity (Whether the customer has online security or not (Yes, No, No internet service) OnlineBackup (Whether the customer has online backup or not (Yes, No, No internet service) DeviceProtection (Whether the customer has device protection or not (Yes, No, No internet service) TechSupport (Whether the customer has tech support or not (Yes, No, No internet service) streamingTV (Whether the customer has streaming TV or not (Yes, No, No internet service) streamingMovies (Whether the customer has streaming movies or not (Yes, No, No internet service) Contract (The contract term of the customer (Month-to-month, One year, Two year) PaperlessBilling (Whether the customer has paperless billing or not (Yes, No)) PaymentMethod (The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))) MonthlyCharges (The amount charged to the customer monthly — numeric) TotalCharges (The total amount charged to the customer — numeric) Churn ( Whether the customer churned or not (Yes or No)).

References

Data: telco-customer-churn, https://www.kaggle.com/blastchar/telco-customer-churn

Salesmate, 2021. An All-inclusive Guide to SaaS Churn: https://www.salesmate.io/wp-content/uploads/2019/01/Whitepaper-SaaS-Churn.pdf