Customer churn is the percentage of customers that stopped using a company’s product or service during a certain time. In this page, we will learn how to analyze customer churn using Rmarkdown. This dataset is one of the publicity available datasets from Telco Customer Churn at the following link : https://www.kaggle.com/blastchar/telco-customer-churn
Load Data :
## customerID gender SeniorCitizen Partner Dependents tenure PhoneService
## 1 7590-VHVEG Female 0 Yes No 1 No
## 2 5575-GNVDE Male 0 No No 34 Yes
## 3 3668-QPYBK Male 0 No No 2 Yes
## 4 7795-CFOCW Male 0 No No 45 No
## 5 9237-HQITU Female 0 No No 2 Yes
## 6 9305-CDSKC Female 0 No No 8 Yes
## MultipleLines InternetService OnlineSecurity OnlineBackup
## 1 No phone service DSL No Yes
## 2 No DSL Yes No
## 3 No DSL Yes Yes
## 4 No phone service DSL Yes No
## 5 No Fiber optic No No
## 6 Yes Fiber optic No No
## DeviceProtection TechSupport StreamingTV StreamingMovies Contract
## 1 No No No No Month-to-month
## 2 Yes No No No One year
## 3 No No No No Month-to-month
## 4 Yes Yes No No One year
## 5 No No No No Month-to-month
## 6 Yes No Yes Yes Month-to-month
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## 1 Yes Electronic check 29.85 29.85
## 2 No Mailed check 56.95 1889.50
## 3 Yes Mailed check 53.85 108.15
## 4 No Bank transfer (automatic) 42.30 1840.75
## 5 Yes Electronic check 70.70 151.65
## 6 Yes Electronic check 99.65 820.50
## Churn
## 1 No
## 2 No
## 3 Yes
## 4 No
## 5 Yes
## 6 Yes
Since we are going to analyze contract and payment method. We need to transformed them into numeric types.
telco[, c( "Contract", "PaymentMethod")] <- lapply(telco[, c( "Contract", "PaymentMethod")], as.numeric)
head(telco)## customerID gender SeniorCitizen Partner Dependents tenure PhoneService
## 1 7590-VHVEG Female 0 Yes No 1 No
## 2 5575-GNVDE Male 0 No No 34 Yes
## 3 3668-QPYBK Male 0 No No 2 Yes
## 4 7795-CFOCW Male 0 No No 45 No
## 5 9237-HQITU Female 0 No No 2 Yes
## 6 9305-CDSKC Female 0 No No 8 Yes
## MultipleLines InternetService OnlineSecurity OnlineBackup
## 1 No phone service DSL No Yes
## 2 No DSL Yes No
## 3 No DSL Yes Yes
## 4 No phone service DSL Yes No
## 5 No Fiber optic No No
## 6 Yes Fiber optic No No
## DeviceProtection TechSupport StreamingTV StreamingMovies Contract
## 1 No No No No 1
## 2 Yes No No No 2
## 3 No No No No 1
## 4 Yes Yes No No 2
## 5 No No No No 1
## 6 Yes No Yes Yes 1
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
## 1 Yes 3 29.85 29.85 No
## 2 No 4 56.95 1889.50 No
## 3 Yes 4 53.85 108.15 Yes
## 4 No 1 42.30 1840.75 No
## 5 Yes 3 70.70 151.65 Yes
## 6 Yes 3 99.65 820.50 Yes
First, let to know how many customers churn in this data :
## No Yes
## 5174 1869
From the output , we can see that is about 27% of the customers stopped using Telco services.
We start looking at the percentage by contract and payment method, by using aggregate we can summarise contract and payment method into data frame.
temp1 <- aggregate.data.frame(list("Contract" = telco$Contract, "Payment_Method" = telco$PaymentMethod), list(telco$Churn), mean)
temp1## Group.1 Contract Payment_Method
## 1 No 1.889254 2.505605
## 2 Yes 1.140182 2.764580
Running these library for data visualization :
## Loading required package: magrittr
By using melt we can convert an Object into a molten data frame.
plot1 <- ggplot(telco_melt, aes(reorder(Group.1, value), value))+
geom_col(aes(fill= variable), position = "dodge")+
labs(title = "Customer Churn vs Contract & Payment Method", subtitle = "Telco Customer Churn", x = "\n\n\n\n Customer Churn", y = "Percentage")+
theme(legend.position = "bottom", axis.text.x = element_text(hjust = 0.5, angle = 90) )+
facet_grid(~variable)
plot1As we can notice : Yes Customer Churn of Payment Method has higher percentage than No Customer Churn of Payment Method, while Yes Customer Churn of Internet Service has lower than No Customer Churn of Internet Service. This suggest that those customers stopped using Telco’s service is because of the payment method. By creating a different or new payment method, we can better understand customers to make getting paid easier.
We are going to make a new variable by using aggregate function.
temp2 <- aggregate.data.frame(list("Monthly_Charges" = telco$MonthlyCharges, "Total_Charges" = telco$TotalCharges), list(telco$PaymentMethod), mean)
temp2## Group.1 Monthly_Charges Total_Charges
## 1 1 67.19265 NA
## 2 2 66.51239 NA
## 3 3 76.25581 2090.868
## 4 4 43.91706 NA
When our data has NA types, we need to change that data into zero.
## [1] FALSE
## customerID gender SeniorCitizen Partner
## 0 0 0 0
## Dependents tenure PhoneService MultipleLines
## 0 0 0 0
## InternetService OnlineSecurity OnlineBackup DeviceProtection
## 0 0 0 0
## TechSupport StreamingTV StreamingMovies Contract
## 0 0 0 0
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## 0 0 0 0
## Churn
## 0
temp2 <- aggregate.data.frame(list("Monthly_Charges" = telco$MonthlyCharges, "Total_Charges" = telco$TotalCharges), list("Payment_Method" = telco$PaymentMethod), mean)
temp2## Payment_Method Monthly_Charges Total_Charges
## 1 1 67.19265 3075.311
## 2 2 66.51239 3069.378
## 3 3 76.25581 2090.868
## 4 4 43.91706 1049.251
Remember if we want to convert an object, use melt function
Let’s make the data visualization
plot2 <- ggplot(telco_melt2, aes(Payment_Method, value), value)+
geom_point(aes(size = value, fill = variable, color = variable), alpha=0.5)+
labs(title = "The Most Popular Payment Method ", subtitle = "Telco Customer", x = "\n\n\n\n Payment Method", y = "Value")+
theme(legend.position = "bottom", axis.text.x = element_text(hjust = 0.5, angle = 90) )
plot2As we can see : The most payment method that customer used is Bank Transfer Automatic. We can suggest to marketing team to make a promotion on their payment method using this method.