This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Note: this analysis was performed using the open source software R and Rstudio.
Customer churn occurs when customers or subscribers stop doing business with a company or service, also known as customer attrition. It is also referred as loss of clients or customers. The industry in which churn rates are particularly useful may include the telecommunications industry, the healthcare industry, the insurance industry, etc. because most customers have multiple options from which to choose within a geographic location.
In this lab, we are going to analyze customer churn using telecom dataset available from Kaggle. We will introduce some basic exploratory analysis to help you understand the nature of customer churn.
Imagine that you have 10 customers who switch to your competitors (e.g., cancel your service’s subscription) and you originally signed 100 customers during that time period, your churn rate would be:
churn <- 10
original <- 100
Churn_rate <- churn/original
# How much is the Churn rate? See the result below:
Churn_rate
## [1] 0.1
In the industries listed above, the average churn rate is about 20% to 40%. Reducing your churn rate is important because your customers are critical for your business growth (Salesmate, 2021).
churn <- read.csv('WA_Fn-UseC_-Telco-Customer-Churn.csv', na.strings = c('','?'))
#install.packages("DT")
#install.packages("cowplot")
#install.packages(c("ggplot2","plyr"),dep=T)
#error - The following `from` values were not present in `x`
#solution - install plyr using the function above
library(plyr)
library(ggplot2)
library(DT)
library(cowplot)
datatable(churn, rownames = FALSE, filter="top", options = list(pageLength = 6, scrollX= T))
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
#we use the sapply function to remove all missing values
sapply(churn, function(x) sum(is.na(x)))
## customerID gender SeniorCitizen Partner
## 0 0 0 0
## Dependents tenure PhoneService MultipleLines
## 0 0 0 0
## InternetService OnlineSecurity OnlineBackup DeviceProtection
## 0 0 0 0
## TechSupport StreamingTV StreamingMovies Contract
## 0 0 0 0
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## 0 0 0 11
## Churn
## 0
#Exploratory analysis
head(churn)
## customerID gender SeniorCitizen Partner Dependents tenure PhoneService
## 1 7590-VHVEG Female 0 Yes No 1 No
## 2 5575-GNVDE Male 0 No No 34 Yes
## 3 3668-QPYBK Male 0 No No 2 Yes
## 4 7795-CFOCW Male 0 No No 45 No
## 5 9237-HQITU Female 0 No No 2 Yes
## 6 9305-CDSKC Female 0 No No 8 Yes
## MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection
## 1 No phone service DSL No Yes No
## 2 No DSL Yes No Yes
## 3 No DSL Yes Yes No
## 4 No phone service DSL Yes No Yes
## 5 No Fiber optic No No No
## 6 Yes Fiber optic No No Yes
## TechSupport StreamingTV StreamingMovies Contract PaperlessBilling
## 1 No No No Month-to-month Yes
## 2 No No No One year No
## 3 No No No Month-to-month Yes
## 4 Yes No No One year No
## 5 No No No Month-to-month Yes
## 6 No Yes Yes Month-to-month Yes
## PaymentMethod MonthlyCharges TotalCharges Churn
## 1 Electronic check 29.85 29.85 No
## 2 Mailed check 56.95 1889.50 No
## 3 Mailed check 53.85 108.15 Yes
## 4 Bank transfer (automatic) 42.30 1840.75 No
## 5 Electronic check 70.70 151.65 Yes
## 6 Electronic check 99.65 820.50 Yes
summary(churn)
## customerID gender SeniorCitizen Partner
## Length:7043 Length:7043 Min. :0.0000 Length:7043
## Class :character Class :character 1st Qu.:0.0000 Class :character
## Mode :character Mode :character Median :0.0000 Mode :character
## Mean :0.1621
## 3rd Qu.:0.0000
## Max. :1.0000
##
## Dependents tenure PhoneService MultipleLines
## Length:7043 Min. : 0.00 Length:7043 Length:7043
## Class :character 1st Qu.: 9.00 Class :character Class :character
## Mode :character Median :29.00 Mode :character Mode :character
## Mean :32.37
## 3rd Qu.:55.00
## Max. :72.00
##
## InternetService OnlineSecurity OnlineBackup DeviceProtection
## Length:7043 Length:7043 Length:7043 Length:7043
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## TechSupport StreamingTV StreamingMovies Contract
## Length:7043 Length:7043 Length:7043 Length:7043
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## Length:7043 Length:7043 Min. : 18.25 Min. : 18.8
## Class :character Class :character 1st Qu.: 35.50 1st Qu.: 401.4
## Mode :character Mode :character Median : 70.35 Median :1397.5
## Mean : 64.76 Mean :2283.3
## 3rd Qu.: 89.85 3rd Qu.:3794.7
## Max. :118.75 Max. :8684.8
## NA's :11
## Churn
## Length:7043
## Class :character
## Mode :character
##
##
##
##
bi1 <- ggplot(data = churn, aes(x = factor(Churn), y = tenure, fill = Churn)) +geom_boxplot()
#Explanation - High churn for those with lower tenures
bi2 <- ggplot(data = churn, aes(x = factor(Churn), y = MonthlyCharges, fill = Churn))+geom_boxplot()
#Explanation - Higher churn for those with higher than avg monthly charges
bi3 <- ggplot(data = churn, aes(x = factor(Churn), y = TotalCharges, fill = Churn))+geom_boxplot()
#Explanation - Lower churn for those with lower than avg total charges
plot_grid(bi1, bi2, bi3, labels = "AUTO")
## Warning: Removed 11 rows containing non-finite values (stat_boxplot).
#Partner plot
p1 <- ggplot(churn, aes(x = Partner, fill = Partner)) +
geom_bar() +
geom_text(aes(y = ..count.. -200,
label = paste0(round(prop.table(..count..),4) * 100, '%')),
stat = 'count',
position = position_dodge(.1),
size = 3)
## using default colors - specify the fill argument within the aes function to be equal to the grouping variable of Partner
#Dependents plot
p2 <- ggplot(churn, aes(x = Dependents, fill = Dependents)) +
geom_bar() +
geom_text(aes(y = ..count.. -200,
label = paste0(round(prop.table(..count..),4) * 100, '%')),
stat = 'count',
position = position_dodge(.1),
size = 3)
#install.packages("plyr")
#Senior citizen plot
churn$SeniorCitizen <- as.factor(mapvalues(churn$SeniorCitizen,
from=c("0","1"),
to=c("No", "Yes")))
p3 <- ggplot(churn, aes(x = SeniorCitizen, fill = SeniorCitizen)) +
geom_bar() +
geom_text(aes(y = ..count.. -200,
label = paste0(round(prop.table(..count..),4) * 100, '%')),
stat = 'count',
position = position_dodge(.1),
size = 3)
plot_grid(p1, p2, p3, labels = "AUTO")
Please find the variable explanations below:
https://www.kaggle.com/blastchar/telco-customer-churn
The raw data contains 7043 rows (customers) and 21 columns (features). customerID gender (female, male) SeniorCitizen (Whether the customer is a senior citizen or not (1, 0)) Partner (Whether the customer has a partner or not (Yes, No)) Dependents (Whether the customer has dependents or not (Yes, No)) tenure (Number of months the customer has stayed with the company) PhoneService (Whether the customer has a phone service or not (Yes, No)) MultipleLines (Whether the customer has multiple lines r not (Yes, No, No phone service) InternetService (Customer’s internet service provider (DSL, Fiber optic, No) OnlineSecurity (Whether the customer has online security or not (Yes, No, No internet service) OnlineBackup (Whether the customer has online backup or not (Yes, No, No internet service) DeviceProtection (Whether the customer has device protection or not (Yes, No, No internet service) TechSupport (Whether the customer has tech support or not (Yes, No, No internet service) streamingTV (Whether the customer has streaming TV or not (Yes, No, No internet service) streamingMovies (Whether the customer has streaming movies or not (Yes, No, No internet service) Contract (The contract term of the customer (Month-to-month, One year, Two year) PaperlessBilling (Whether the customer has paperless billing or not (Yes, No)) PaymentMethod (The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))) MonthlyCharges (The amount charged to the customer monthly — numeric) TotalCharges (The total amount charged to the customer — numeric) Churn ( Whether the customer churned or not (Yes or No)).
Data: telco-customer-churn, https://www.kaggle.com/blastchar/telco-customer-churn
Salesmate, 2021. An All-inclusive Guide to SaaS Churn: https://www.salesmate.io/wp-content/uploads/2019/01/Whitepaper-SaaS-Churn.pdf