Churn, with respect to a company, refers to the loss of customers from its contractual customer base. It is an important factor for any business with a subscriber-based service model, including mobile telephone networks. In this paper, churn measures how many people ended their subscriptions with the company in the previous month.
Churn matters to a company because it reflects the company’s service quality and customer satisfaction. It also indicates the strength of the company’s position in the market.
This paper investigates the factors that govern churn in a telecommunications company and analyses their effects on churn.
Our field of study concerns churn in a telecommunications company and how it is affected by factors such as gender, senior citizenship, partnership status, dependents, the services subscribed to, contract type, and monthly and total charges.
Our analysis of churn in the company throws light on how companies can hold their place in the market and how they can benefit from understanding the factors affecting churn.
Our specific objective is to determine which factors affect churn in a company the most and how they affect it.
This is important because no company wants its customer base to be in flux. Companies want to dominate their market, which is possible only if they keep churn low.
This paper provides insight into how churn is affected by different variables.
Hypothesis H1: Churn in the company depends on the gender of its customers, their age range (senior citizen or not), their partnership status, whether they have dependents, the services they have subscribed to, their contract type, and their total charges.
For this study, we collected data from the IBM training website. The data can be downloaded from (https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-Telco-Customer-Churn.csv?cm_mc_uid=60666461180615162966768&cm_mc_sid_50200000=1517419332&cm_mc_sid_52640000=1517419332). This dataset contains information about the customers, including their customer ID, gender, age range, partnership status, dependents, the services they have subscribed to, their contract types, their payment methods, and their total charges.
In order to test Hypothesis H1, we proposed the following model:
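Model: \[\log\frac{P(\mathrm{Churn})}{1 - P(\mathrm{Churn})} = \beta_0 + \beta_1\,\mathrm{TotalCharges} + \beta_2\,\mathrm{tenure} + \beta_3\,\mathrm{gender} + \beta_4\,\mathrm{PhoneService} + \beta_5\,\mathrm{SeniorCitizen} + \beta_6\,\mathrm{MultipleLines} + \beta_7\,\mathrm{InternetService} + \beta_8\,\mathrm{Contract}\]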
This analysis has to be done using logistic regression because Churn, the variable whose dependence is to be analysed, is a factor variable with two levels (Yes and No).
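For reference, the standard relationship between the linear predictor of a logistic regression and the churn probability can be written as below. The symbols \(p\) (probability that a customer churns), \(\eta\) (the linear combination of predictors) and \(x_j\) (a single predictor) are introduced here for illustration only.

```latex
\[
p = \frac{1}{1 + e^{-\eta}},
\qquad
\frac{p}{1 - p} = e^{\eta},
\qquad
\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]
```

Under this form, a one-unit increase in \(x_j\) multiplies the odds \(p/(1-p)\) by \(e^{\beta_j}\), holding the other predictors fixed.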
# read the data; strings are read as factors so that glm() can use Churn as the response
ibm.df <- read.csv("IBM.csv", stringsAsFactors = TRUE)
# divide the dataset into training rows and held-out test rows
train <- ibm.df[1:6950, ]
test <- ibm.df[6951:7043, ]
# logistic regression of Churn on the selected predictors
model <- glm(Churn ~ TotalCharges + tenure + gender + PhoneService + SeniorCitizen +
               MultipleLines + InternetService + Contract,
             data = train, family = binomial)
summary(model)
##
## Call:
## glm(formula = Churn ~ TotalCharges + tenure + gender + PhoneService +
## SeniorCitizen + MultipleLines + InternetService + Contract,
## family = binomial, data = train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.7307  -0.7093  -0.3050   0.8173   3.5096
##
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)
## (Intercept)                 3.483e-01  1.234e-01   2.821 0.004786 **
## TotalCharges                3.120e-04  6.316e-05   4.939 7.86e-07 ***
## tenure                     -6.203e-02  5.885e-03 -10.540  < 2e-16 ***
## genderMale                 -1.477e-02  6.417e-02  -0.230 0.817903
## PhoneServiceYes            -7.805e-01  1.295e-01  -6.026 1.68e-09 ***
## SeniorCitizenYes            3.299e-01  8.173e-02   4.036 5.43e-05 ***
## MultipleLinesYes            3.037e-01  7.862e-02   3.863 0.000112 ***
## InternetServiceFiber optic  1.083e+00  9.291e-02  11.653  < 2e-16 ***
## InternetServiceNo          -7.178e-01  1.277e-01  -5.619 1.92e-08 ***
## ContractOne year           -8.096e-01  1.050e-01  -7.708 1.28e-14 ***
## ContractTwo year           -1.675e+00  1.727e-01  -9.700  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8029.0 on 6938 degrees of freedom
## Residual deviance: 5888.3 on 6928 degrees of freedom
## (11 observations deleted due to missingness)
## AIC: 5910.3
##
## Number of Fisher Scoring iterations: 6
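The split above sets aside 93 rows "for testing purposes", but the fitted model is not scored on them in the text. The following is a minimal sketch of how a held-out set could be scored. It runs on simulated stand-in data (`sim.df`, `sim.train`, `sim.test` and `sim.model` are hypothetical names, and the simulated effect sizes are arbitrary), and the 0.5 cut-off is an assumption rather than a choice made in this paper.

```r
# Hypothetical stand-in data; the real analysis would use `train` and `test`.
set.seed(1)
n <- 500
sim.df <- data.frame(
  tenure   = sample(0:72, n, replace = TRUE),
  Contract = factor(sample(c("Month-to-month", "One year", "Two year"),
                           n, replace = TRUE))
)
# Simulated churn: short tenure and month-to-month contracts churn more often.
eta <- 0.5 - 0.05 * sim.df$tenure - 1.0 * (sim.df$Contract != "Month-to-month")
sim.df$Churn <- factor(ifelse(runif(n) < plogis(eta), "Yes", "No"),
                       levels = c("No", "Yes"))

sim.train <- sim.df[1:400, ]
sim.test  <- sim.df[401:500, ]

sim.model <- glm(Churn ~ tenure + Contract, data = sim.train, family = binomial)

# type = "response" returns the predicted probability of the second level ("Yes").
p.hat <- predict(sim.model, newdata = sim.test, type = "response")
pred  <- factor(ifelse(p.hat > 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Confusion matrix and overall accuracy on the held-out rows.
conf.mat <- table(Actual = sim.test$Churn, Predicted = pred)
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
print(conf.mat)
print(accuracy)
```

With the real data, the same two `predict()` and `table()` steps would apply to `model` and `test`; rows of `test` with a missing `TotalCharges` would yield `NA` predictions and would need to be handled first.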
We established the effect of total charges, phone service, internet service, etc. on churn with the simplest model, which we estimated using logistic regression.
We found empirical support for H1 for every predictor in the model except gender, whose coefficient is not statistically significant (p = 0.82). The odds of churn decrease as tenure increases. Similarly, the odds of churn decrease as the contract length increases from month-to-month to one year and two years. Subscribing to phone service also decreases the odds of churn. The remaining coefficients in the regression above can be read in the same way: a positive estimate raises the odds of churn and a negative estimate lowers them.
This paper was motivated by the need for research that could improve our understanding of how a company’s churn is influenced by various customer-related factors. We observe that the odds of churn can be reduced by moving customers onto longer contracts and by encouraging customers to subscribe to phone service. It is also important for the company to keep its charges down, as higher total charges are associated with higher odds of churn.
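The estimates in the regression output are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The sketch below copies a few estimates from the coefficient table; the vector names `log.odds` and `odds.ratio` are illustrative.

```r
# Estimates copied from the coefficient table (log-odds scale).
log.odds <- c(
  "tenure"                     = -6.203e-02,
  "PhoneServiceYes"            = -7.805e-01,
  "SeniorCitizenYes"           =  3.299e-01,
  "InternetServiceFiber optic" =  1.083e+00,
  "ContractOne year"           = -8.096e-01,
  "ContractTwo year"           = -1.675e+00
)

# exp() turns a log-odds coefficient into a multiplicative effect on the odds.
odds.ratio <- exp(log.odds)
print(round(odds.ratio, 3))
```

For example, the two-year contract estimate corresponds to an odds ratio of about 0.19 relative to a month-to-month contract, and each additional month of tenure multiplies the odds of churn by about 0.94, holding the other predictors fixed. With the fitted object available, `exp(coef(model))` gives the same quantities for all terms.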
# Summarize the Data
library(psych)
describe(ibm.df)
vars n mean sd median trimmed mad min
customerID* 1 7043 3522.00 2033.28 3522.00 3522.00 2610.86 1.00
gender* 2 7043 1.50 0.50 2.00 1.51 0.00 1.00
SeniorCitizen* 3 7043 1.16 0.37 1.00 1.08 0.00 1.00
Partner* 4 7043 1.48 0.50 1.00 1.48 0.00 1.00
Dependents* 5 7043 1.30 0.46 1.00 1.25 0.00 1.00
tenure 6 7043 32.37 24.56 29.00 31.43 32.62 0.00
PhoneService* 7 7043 1.90 0.30 2.00 2.00 0.00 1.00
MultipleLines* 8 7043 1.42 0.49 1.00 1.40 0.00 1.00
InternetService* 9 7043 1.87 0.74 2.00 1.84 1.48 1.00
OnlineSecurity* 10 7043 1.29 0.45 1.00 1.23 0.00 1.00
OnlineBackup* 11 7043 1.34 0.48 1.00 1.31 0.00 1.00
DeviceProtection* 12 7043 1.34 0.48 1.00 1.30 0.00 1.00
TechSupport* 13 7043 1.29 0.45 1.00 1.24 0.00 1.00
StreamingTV* 14 7043 1.38 0.49 1.00 1.36 0.00 1.00
StreamingMovies* 15 7043 1.39 0.49 1.00 1.36 0.00 1.00
Contract* 16 7043 1.69 0.83 1.00 1.61 0.00 1.00
PaperlessBilling* 17 7043 1.59 0.49 2.00 1.62 0.00 1.00
PaymentMethod* 18 7043 2.57 1.07 3.00 2.59 1.48 1.00
MonthlyCharges 19 7043 64.76 30.09 70.35 64.97 35.66 18.25
TotalCharges 20 7032 2283.30 2266.77 1397.47 1970.14 1812.92 18.80
Churn* 21 7043 1.27 0.44 1.00 1.21 0.00 1.00
max range skew kurtosis se
customerID* 7043.00 7042.0 0.00 -1.20 24.23
gender* 2.00 1.0 -0.02 -2.00 0.01
SeniorCitizen* 2.00 1.0 1.83 1.36 0.00
Partner* 2.00 1.0 0.07 -2.00 0.01
Dependents* 2.00 1.0 0.87 -1.23 0.01
tenure 72.00 72.0 0.24 -1.39 0.29
PhoneService* 2.00 1.0 -2.73 5.43 0.00
MultipleLines* 2.00 1.0 0.32 -1.90 0.01
InternetService* 3.00 2.0 0.21 -1.15 0.01
OnlineSecurity* 2.00 1.0 0.94 -1.11 0.01
OnlineBackup* 2.00 1.0 0.65 -1.57 0.01
DeviceProtection* 2.00 1.0 0.66 -1.57 0.01
TechSupport* 2.00 1.0 0.92 -1.15 0.01
StreamingTV* 2.00 1.0 0.48 -1.77 0.01
StreamingMovies* 2.00 1.0 0.46 -1.79 0.01
Contract* 3.00 2.0 0.63 -1.27 0.01
PaperlessBilling* 2.00 1.0 -0.38 -1.86 0.01
PaymentMethod* 4.00 3.0 -0.17 -1.21 0.01
MonthlyCharges 118.75 100.5 -0.22 -1.26 0.36
TotalCharges 8684.80 8666.0 0.96 -0.23 27.03
Churn* 2.00 1.0 1.06 -0.87 0.01
gender2<-xtabs(~ibm.df$Churn+ibm.df$gender)
gender2
## ibm.df$gender
## ibm.df$Churn Female Male
## No 2549 2625
## Yes 939 930
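As a rough check on whether churn and gender are associated in this table, a chi-squared test of independence could be run on the counts. The sketch below re-enters the four counts by hand; `gender.tab` and `gender.test` are illustrative names, and this test is an addition for illustration rather than part of the original analysis.

```r
# Counts copied from the Churn-by-gender table (filled column by column).
gender.tab <- matrix(
  c(2549, 939, 2625, 930),
  nrow = 2,
  dimnames = list(Churn = c("No", "Yes"), gender = c("Female", "Male"))
)

# Pearson chi-squared test of independence (Yates correction is applied by default for 2x2).
gender.test <- chisq.test(gender.tab)
print(gender.test$p.value)
```

A large p-value here would agree with the regression, in which the gender coefficient is not statistically significant.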
sc2<-xtabs(~ibm.df$Churn+ibm.df$SeniorCitizen)
sc2
## ibm.df$SeniorCitizen
## ibm.df$Churn No Yes
## No 4508 666
## Yes 1393 476
boxplot(ibm.df$tenure,horizontal = TRUE, main="Tenure of Subscribers",xlab="Months",col="grey")
hist(ibm.df$tenure,breaks = 30,main = "Frequency of Tenure Months",xlab = "Tenure",col="grey")
Average tenure of subscribers, by churn status
aggregate(tenure~Churn,data = ibm.df,mean)
## Churn tenure
## 1 No 37.56997
## 2 Yes 17.97913
xtabs(~ibm.df$Churn+ibm.df$PhoneService)
## ibm.df$PhoneService
## ibm.df$Churn No Yes
## No 512 4662
## Yes 170 1699
xtabs(~ibm.df$Churn+ibm.df$MultipleLines)
## ibm.df$MultipleLines
## ibm.df$Churn No Yes
## No 3053 2121
## Yes 1019 850
xtabs(~ibm.df$Churn+ibm.df$InternetService)
## ibm.df$InternetService
## ibm.df$Churn DSL Fiber optic No
## No 1962 1799 1413
## Yes 459 1297 113
xtabs(~ibm.df$Churn+ibm.df$Contract)
## ibm.df$Contract
## ibm.df$Churn Month-to-month One year Two year
## No 2220 1307 1647
## Yes 1655 166 48
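Raw counts make it hard to compare contract types of different sizes, so it can help to convert each column to a churn rate. The sketch below re-enters the counts from the table; `contract.tab` and `churn.rate` are illustrative names.

```r
# Counts copied from the Churn-by-Contract table (filled column by column).
contract.tab <- matrix(
  c(2220, 1655, 1307, 166, 1647, 48),
  nrow = 2,
  dimnames = list(Churn = c("No", "Yes"),
                  Contract = c("Month-to-month", "One year", "Two year"))
)

# Column proportions: share of customers within each contract type who churned.
churn.rate <- prop.table(contract.tab, margin = 2)["Yes", ]
print(round(churn.rate, 3))
```

This gives churn rates of roughly 43% for month-to-month, 11% for one-year and 3% for two-year contracts, in line with the negative contract coefficients in the regression.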
boxplot(ibm.df$TotalCharges,horizontal = TRUE, main="Total Charges of Subscribers",xlab="Amount",col="grey")
hist(ibm.df$TotalCharges,breaks = 30,main = "Frequency of Total Charges ",xlab = "Amount",col="grey")
Average total charge of subscribers, by churn status
aggregate(TotalCharges~Churn,data = ibm.df,mean)
## Churn TotalCharges
## 1 No 2555.344
## 2 Yes 1531.796
library(coefplot)
coefplot(model, intercept=FALSE)
# scatterplot() with a grouping formula comes from the car package
library(car)
scatterplot(MonthlyCharges ~ tenure | Churn, data = ibm.df, cex = 0.5)
scatterplot(TotalCharges ~ tenure | Churn, data = ibm.df, cex = 0.5)
aggregate(tenure~Contract,data = ibm.df,mean)
## Contract tenure
## 1 Month-to-month 18.03665
## 2 One year 42.04481
## 3 Two year 56.73510
boxplot(tenure~Contract,data = ibm.df,horizontal=TRUE,col="grey",main="Variation of Tenure with Contracts",xlab="Tenure")