Project Title: Using Customer Behavior Data to Improve Customer Retention

NAME: ASWATHY GUNADEEP

EMAIL: aswathygunadeep@gmail.com

COLLEGE / COMPANY: NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

A telecommunications company is concerned about the number of customers leaving their landline business for cable competitors. They need to understand who is leaving.

The data set includes information about:

PRELIMINARY WORK

Reading the raw data into a dataframe

setwd("D:/desktop/Data Analytics internship-sameer mathur/project")
ash.df <- read.csv(paste("WA_Fn-UseC_-Telco-Customer-Churn.csv", sep=""))
attach(ash.df)
dim(ash.df)
## [1] 7043   21
View(ash.df)
str(ash.df)
## 'data.frame':    7043 obs. of  21 variables:
##  $ customerID      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
##  $ gender          : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
##  $ Dependents      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
##  $ MultipleLines   : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
##  $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
##  $ OnlineSecurity  : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
##  $ OnlineBackup    : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
##  $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
##  $ TechSupport     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
##  $ StreamingTV     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
##  $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
##  $ Contract        : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
##  $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
##  $ PaymentMethod   : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...
library(psych)
describe(ash.df)
##                   vars    n    mean      sd  median trimmed     mad   min
## customerID*          1 7043 3522.00 2033.28 3522.00 3522.00 2610.86  1.00
## gender*              2 7043    1.50    0.50    2.00    1.51    0.00  1.00
## SeniorCitizen        3 7043    0.16    0.37    0.00    0.08    0.00  0.00
## Partner*             4 7043    1.48    0.50    1.00    1.48    0.00  1.00
## Dependents*          5 7043    1.30    0.46    1.00    1.25    0.00  1.00
## tenure               6 7043   32.37   24.56   29.00   31.43   32.62  0.00
## PhoneService*        7 7043    1.90    0.30    2.00    2.00    0.00  1.00
## MultipleLines*       8 7043    1.94    0.95    2.00    1.93    1.48  1.00
## InternetService*     9 7043    1.87    0.74    2.00    1.84    1.48  1.00
## OnlineSecurity*     10 7043    1.79    0.86    2.00    1.74    1.48  1.00
## OnlineBackup*       11 7043    1.91    0.88    2.00    1.88    1.48  1.00
## DeviceProtection*   12 7043    1.90    0.88    2.00    1.88    1.48  1.00
## TechSupport*        13 7043    1.80    0.86    2.00    1.75    1.48  1.00
## StreamingTV*        14 7043    1.99    0.89    2.00    1.98    1.48  1.00
## StreamingMovies*    15 7043    1.99    0.89    2.00    1.99    1.48  1.00
## Contract*           16 7043    1.69    0.83    1.00    1.61    0.00  1.00
## PaperlessBilling*   17 7043    1.59    0.49    2.00    1.62    0.00  1.00
## PaymentMethod*      18 7043    2.57    1.07    3.00    2.59    1.48  1.00
## MonthlyCharges      19 7043   64.76   30.09   70.35   64.97   35.66 18.25
## TotalCharges        20 7032 2283.30 2266.77 1397.47 1970.14 1812.92 18.80
## Churn*              21 7043    1.27    0.44    1.00    1.21    0.00  1.00
##                       max  range  skew kurtosis    se
## customerID*       7043.00 7042.0  0.00    -1.20 24.23
## gender*              2.00    1.0 -0.02    -2.00  0.01
## SeniorCitizen        1.00    1.0  1.83     1.36  0.00
## Partner*             2.00    1.0  0.07    -2.00  0.01
## Dependents*          2.00    1.0  0.87    -1.23  0.01
## tenure              72.00   72.0  0.24    -1.39  0.29
## PhoneService*        2.00    1.0 -2.73     5.43  0.00
## MultipleLines*       3.00    2.0  0.12    -1.88  0.01
## InternetService*     3.00    2.0  0.21    -1.15  0.01
## OnlineSecurity*      3.00    2.0  0.42    -1.52  0.01
## OnlineBackup*        3.00    2.0  0.18    -1.68  0.01
## DeviceProtection*    3.00    2.0  0.19    -1.68  0.01
## TechSupport*         3.00    2.0  0.40    -1.54  0.01
## StreamingTV*         3.00    2.0  0.03    -1.72  0.01
## StreamingMovies*     3.00    2.0  0.01    -1.72  0.01
## Contract*            3.00    2.0  0.63    -1.27  0.01
## PaperlessBilling*    2.00    1.0 -0.38    -1.86  0.01
## PaymentMethod*       4.00    3.0 -0.17    -1.21  0.01
## MonthlyCharges     118.75  100.5 -0.22    -1.26  0.36
## TotalCharges      8684.80 8666.0  0.96    -0.23 27.03
## Churn*               2.00    1.0  1.06    -0.87  0.01
library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
some(ash.df)
##      customerID gender SeniorCitizen Partner Dependents tenure
## 359  9057-MSWCO   Male             1     Yes         No     27
## 1136 1779-PWPMG Female             1     Yes         No     72
## 1210 8268-YDIXR   Male             0     Yes         No     56
## 1673 4737-AQCPU   Male             0     Yes        Yes     72
## 2156 0524-IAVZO Female             0     Yes         No     30
## 2577 1074-AMIOH Female             0     Yes        Yes     53
## 3312 6137-MFAJN Female             0      No         No     48
## 3995 0406-BPDVR Female             1     Yes         No     54
## 5169 6000-APYLU   Male             0      No         No      9
## 5888 4316-XCSLJ   Male             0      No        Yes     17
##      PhoneService    MultipleLines InternetService OnlineSecurity
## 359            No No phone service             DSL            Yes
## 1136          Yes              Yes     Fiber optic            Yes
## 1210          Yes              Yes     Fiber optic            Yes
## 1673          Yes              Yes             DSL            Yes
## 2156          Yes              Yes     Fiber optic             No
## 2577          Yes               No     Fiber optic             No
## 3312           No No phone service             DSL            Yes
## 3995          Yes              Yes     Fiber optic             No
## 5169          Yes               No     Fiber optic             No
## 5888          Yes               No             DSL            Yes
##      OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies
## 359            No               No          No          No              No
## 1136          Yes              Yes         Yes         Yes             Yes
## 1210          Yes              Yes          No          No              No
## 1673          Yes              Yes         Yes          No              No
## 2156           No               No          No          No             Yes
## 2577          Yes              Yes         Yes         Yes             Yes
## 3312          Yes               No          No         Yes              No
## 3995          Yes               No          No         Yes             Yes
## 5169           No               No          No          No             Yes
## 5888           No               No          No          No              No
##            Contract PaperlessBilling             PaymentMethod
## 359  Month-to-month               No   Credit card (automatic)
## 1136       Two year              Yes Bank transfer (automatic)
## 1210       One year              Yes          Electronic check
## 1673       Two year               No   Credit card (automatic)
## 2156 Month-to-month              Yes          Electronic check
## 2577 Month-to-month              Yes Bank transfer (automatic)
## 3312 Month-to-month              Yes          Electronic check
## 3995       One year              Yes   Credit card (automatic)
## 5169 Month-to-month              Yes          Electronic check
## 5888 Month-to-month              Yes              Mailed check
##      MonthlyCharges TotalCharges Churn
## 359           30.75       805.10   Yes
## 1136         114.65      8333.95    No
## 1210          93.15      5253.95    No
## 1673          72.10      5016.65    No
## 2156          85.00      2624.25   Yes
## 2577         108.25      5935.10    No
## 3312          44.80      2104.55    No
## 3995         101.50      5373.10   Yes
## 5169          80.80       713.10   Yes
## 5888          50.30       846.80    No

VISUALIZATION

pie(table(ash.df$Churn),main="churn", col=c("skyblue","yellow"))

  1. Which customers are likely to leave?

The obvious guess is Total charges and Monthly charges and tenure.To make sure of this, we plot the corrgram, create correlation matrix and scatterplot matrix , excluding the the less important factor variables , having 2 levels-“Yes” and “No”.

CORRGRAM

library(corrgram)
par(mfrow=c(1,1))
corrgram(ash.df, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of dataset")

SCATTERPLOTMATRIX

scatterplotMatrix(~TotalCharges+MonthlyCharges+tenure | Churn, data=ash.df,main="scatterplotmatrix")

CORRELATION MATRIX

den.df <- subset(ash.df[,c(6,19,20)], ash.df$TotalCharges>=0 && ash.df$MonthlyCharges>=0)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
## 
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
## 
##     describe
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
data <- cor(den.df,use = "complete.obs")
round(data,2)
##                tenure MonthlyCharges TotalCharges
## tenure           1.00           0.25         0.83
## MonthlyCharges   0.25           1.00         0.65
## TotalCharges     0.83           0.65         1.00

The key drivers are tenure, contract, internet service, total charges and monthly charges.

  1. tenure drives churn?
par(mfrow=c(1,2))
d2 <- density(ash.df$tenure)
plot(d2, main="tenure")
polygon(d2, col="red", border="blue")
histogram(~ tenure | Churn,col="green")

This visualization shows that customers who have been customers for shorter periods are more likely to leave.

  1. Role of total charges and monthly charges in retention.
xyplot(TotalCharges ~ MonthlyCharges, data=den.df, main="total charges v/s monthly charges", type = c("p", "g"),xlab="monthly charges", ylab="total charges")

par(mfrow=c(1,2))
den.df <- subset(ash.df, ash.df$TotalCharges>=0)
d <- density(den.df$TotalCharges)
plot(d, main="Total charges")
polygon(d, col="red", border="blue")
den1.df <- subset(ash.df, ash.df$MonthlyCharges>=0)
d1 <- density(den1.df$MonthlyCharges)
plot(d1, main="Monthly charges")
polygon(d1, col="red", border="blue")

par(mfrow=c(2,1))
library(lattice)
histogram(~ TotalCharges | Churn, data=ash.df)

histogram(~ MonthlyCharges | Churn, data=ash.df, col="orange")

  1. role of contract
library(vcd)
## Loading required package: grid
tab1 <- xtabs(~ Churn + Contract)
tab1
##      Contract
## Churn Month-to-month One year Two year
##   No            2220     1307     1647
##   Yes           1655      166       48
mosaic(tab1, shade=TRUE, legend=TRUE, main="contract basis")

  1. role of internet service
tab2 <- xtabs(~ Churn + InternetService)
tab2
##      InternetService
## Churn  DSL Fiber optic   No
##   No  1962        1799 1413
##   Yes  459        1297  113
mosaic(tab2, shade=TRUE, legend=TRUE, main="internet service effects")

boxplot(TotalCharges ~ InternetService, data=den.df,main="total charges with type of internet service", col=c("skyblue","orange","green"),xlab="type of internet service",ylab="total charges")

library(ggplot2)
qplot(x = TotalCharges, y = tenure, data = den.df, color = factor(Churn))

boxplot(tenure ~ Contract, data=ash.df,main="tenure v/s contract", col=c("red","orange","yellow"),xlab="type of contract",ylab="tenure")

boxplot(tenure ~ OnlineSecurity, data=ash.df,main="tenure v/s online security", col=c("burlywood","darkolivegreen"),xlab="online security",ylab="tenure")

From the above analysis and visualizations, customers who leave tend to be ones who are on a month-to-month contract, have fiber optic internet service, and have been customers for shorter periods.

CHI-SQUARED TESTS:

chisq.test(Churn,TotalCharges)
## Warning in chisq.test(Churn, TotalCharges): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and TotalCharges
## X-squared = 6503.6, df = 6529, p-value = 0.5859
chisq.test(Churn,MonthlyCharges)
## Warning in chisq.test(Churn, MonthlyCharges): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and MonthlyCharges
## X-squared = 2123.6, df = 1584, p-value < 2.2e-16
chisq.test(Churn,tenure)
## Warning in chisq.test(Churn, tenure): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and tenure
## X-squared = 1065.3, df = 72, p-value < 2.2e-16
chisq.test(Churn,OnlineSecurity)
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and OnlineSecurity
## X-squared = 850, df = 2, p-value < 2.2e-16
chisq.test(Churn,Contract)
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and Contract
## X-squared = 1184.6, df = 2, p-value < 2.2e-16

Hence, since all the p-values are very less (except for totalcharges), our original hypothesis is correct.