Introduction

The idea is simply to study logistic regression modeling using what I am learning in this course. Logistic regression is explained here and here (chapter 5). The dataset is here.

As I read here, customer churn happens when a customer stops doing business with a company.

Loading packages

library(plyr)
library(corrplot)

## corrplot 0.84 loaded

library(ggplot2)
library(gridExtra)
library(ggthemes)
library(caret)

## Loading required package: lattice

Loading dataset

Loading the dataset to an object and learning about its structure with str:

churn <- read.csv('Telco-Customer-Churn.csv')
str(churn)

## 'data.frame':    7043 obs. of  21 variables:
##  $ customerID      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
##  $ gender          : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
##  $ Dependents      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
##  $ MultipleLines   : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
##  $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
##  $ OnlineSecurity  : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
##  $ OnlineBackup    : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
##  $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
##  $ TechSupport     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
##  $ StreamingTV     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
##  $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
##  $ Contract        : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
##  $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
##  $ PaymentMethod   : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...

Our dataset has 7043 customer records, each with 21 variables. The Churn column is our variable of interest: we want to know how the other variables affect it.

To start our analysis, we look for missing values in each column with the sapply function:

sapply(churn, function(x) sum(is.na(x)))

##       customerID           gender    SeniorCitizen          Partner 
##                0                0                0                0 
##       Dependents           tenure     PhoneService    MultipleLines 
##                0                0                0                0 
##  InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
##                0                0                0                0 
##      TechSupport      StreamingTV  StreamingMovies         Contract 
##                0                0                0                0 
## PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
##                0                0                0               11 
##            Churn 
##                0
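A more compact way to get the same per-column counts is colSums over is.na. This is only a sketch on a made-up data frame (the toy object and its columns are invented for illustration, they are not part of the churn data):

```r
# Toy data frame with one NA in column a and one NA in column b
toy <- data.frame(a = c(1, NA, 3),
                  b = c("x", "y", NA),
                  c = c(TRUE, FALSE, TRUE))

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Show only the columns that actually contain missing values
print(na_counts[na_counts > 0])
```

On a wide dataset the second print is handy, because it hides all the columns with zero missing values.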

We have 11 missing values in the TotalCharges column. That is a very small number compared with the number of records in our dataset, so we could simply throw those 11 records away. Well… I have never treated missing values any other way, so I’ll try something new. Searching for “R packages to treat missing values” on the internet, I found this page.

First, I’ll replace the missing values with the mean and with the median.

library(Hmisc)

## Loading required package: survival

## 
## Attaching package: 'survival'

## The following object is masked from 'package:caret':
## 
##     cluster

## Loading required package: Formula

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:plyr':
## 
##     is.discrete, summarize

## The following objects are masked from 'package:base':
## 
##     format.pval, units

objOriginal = churn$TotalCharges
objMean =  impute(objOriginal, mean)
objMedian = impute(objOriginal, median)
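To make clear what mean and median imputation do, here is a minimal base R sketch on a made-up vector (x is invented for illustration). It is not how Hmisc implements impute, only the same idea written by hand:

```r
# Toy vector with one missing value and one large outlier
x <- c(10, 20, NA, 40, 1000)

# Replace the NA with the mean / median of the observed values
x_mean   <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
x_median <- ifelse(is.na(x), median(x, na.rm = TRUE), x)

print(x_mean)
print(x_median)
```

The outlier pulls the mean far away from most of the data, while the median stays close to the typical values, so it is worth trying both before choosing one.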

Now, let’s put the imputed objMean and objMedian values into copies of the churn data frame.

objOriginal = churn
objOriginal$TotalCharges = objMean
objMean = objOriginal

objOriginal$TotalCharges = objMedian

objMedian = objOriginal
remove('objOriginal')

Let’s try kNN imputation from the DMwR package.

anyNA(churn[,!names(churn) %in% 'customerID'])

## [1] TRUE

(See? I haven’t thrown away the NA values.)

library(DMwR)

## Loading required package: grid

## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## 
## Attaching package: 'DMwR'

## The following object is masked from 'package:plyr':
## 
##     join

objkNN = knnImputation(churn[,!names(churn) %in% 'customerID'])
anyNA(objkNN)

## [1] FALSE
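The idea behind kNN imputation is to fill a missing value using the rows that look most similar to the incomplete one. Below is a heavily simplified sketch in base R. The knn_fill helper and the toy data are hypothetical, it measures distance on a single column and takes a plain mean of the neighbours, whereas knnImputation works on all the columns and, as far as I know, can weight the neighbours by distance:

```r
# Toy data: the third row is missing its y value
toy <- data.frame(x = c(1, 2, 3, 10, 11),
                  y = c(10, 20, NA, 100, 110))

# Hypothetical helper: fill NAs in `target` with the mean of the k nearest
# complete rows, where "nearest" is measured on the `by` column only
knn_fill <- function(df, target, by, k = 2) {
  complete <- df[!is.na(df[[target]]), ]
  for (i in which(is.na(df[[target]]))) {
    d <- abs(complete[[by]] - df[[by]][i])   # distance to every complete row
    nearest <- order(d)[seq_len(k)]          # positions of the k closest rows
    df[[target]][i] <- mean(complete[[target]][nearest])
  }
  df
}

filled <- knn_fill(toy, target = "y", by = "x", k = 2)
print(filled)
```

Here the two nearest neighbours of x = 3 are the rows with x = 2 and x = 1, so the missing y is filled with the mean of 20 and 10.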

Now let’s see the maximum and minimum time a customer has been using the company’s services:

max(churn$tenure); min(churn$tenure)

## [1] 72

## [1] 0

Creating a function to group customers into classes by length of tenure:

groupTenure <- function(tenure){
  if (tenure >= 0 & tenure <= 6){
    return('0 up to 6 months')
  }else if (tenure > 6 & tenure <= 12){
    return('6 up to 12 months')
  }else if (tenure > 12 & tenure <= 18){
    return(' 12 up to 18 months')
  }else if (tenure > 18 & tenure <= 24){
    return(' 18 up to 24 months')
  }else if (tenure > 24 & tenure <= 30){
    return(' 24 up to 30 months')
  }else if (tenure > 30 & tenure <= 36){
    return(' 30 up to 36 months')
  }else if (tenure > 36 & tenure <= 42){
    return(' 36 up to 42 months')
  }else if (tenure > 42 & tenure <= 48){
    return(' 42 up to 48 months')
  }else if (tenure > 48 & tenure <= 54){
    return(' 48 up to 54 months')
  }else if (tenure > 54 & tenure <= 60){
    return(' 54 up to 60 months')
  }else if (tenure > 60){
    return(' More than 60 months')
  }
}
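The same grouping can be done without a chain of if/else by using cut, which is vectorized and returns a factor directly. This is only a sketch: the tenure vector is made up, and the labels here have no leading spaces, so the factor levels come out in chronological order (with the function above, the levels are sorted alphabetically and the leading spaces change that order).

```r
# Toy tenure values, including the boundaries of some groups
tenure <- c(0, 6, 7, 12, 13, 60, 61, 72)

# Breaks every 6 months; right = TRUE means each interval is (lower, upper]
breaks <- c(-Inf, seq(6, 60, by = 6), Inf)

# Build the 11 labels: "0 up to 6 months", ..., "More than 60 months"
lower  <- seq(0, 54, by = 6)
labels <- c(paste(lower, "up to", lower + 6, "months"), "More than 60 months")

tenure_group <- cut(tenure, breaks = breaks, labels = labels, right = TRUE)
print(data.frame(tenure, tenure_group))
```

Because cut works on the whole vector at once, there is no need for sapply or as.factor afterwards.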

Now we apply the previous function to the churn object (and drop its rows with missing values).

churn$tenure_group =
  sapply(churn$tenure,groupTenure)

churn$tenure_group =
  as.factor(churn$tenure_group)
churn = na.omit(churn)

Recoding some values with a better name:

str(churn)

## 'data.frame':    7032 obs. of  22 variables:
##  $ customerID      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
##  $ gender          : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
##  $ Dependents      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
##  $ MultipleLines   : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
##  $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
##  $ OnlineSecurity  : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
##  $ OnlineBackup    : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
##  $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
##  $ TechSupport     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
##  $ StreamingTV     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
##  $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
##  $ Contract        : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
##  $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
##  $ PaymentMethod   : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...
##  $ tenure_group    : Factor w/ 11 levels " 12 up to 18 months",..: 10 4 10 6 10 11 2 11 3 9 ...
##  - attr(*, "na.action")= 'omit' Named int  489 754 937 1083 1341 3332 3827 4381 5219 6671 ...
##   ..- attr(*, "names")= chr  "489" "754" "937" "1083" ...

cols_recode1 <- c(10:15)
for(i in 1:ncol(churn[,cols_recode1])){
  churn[,cols_recode1][,i] <- as.factor(mapvalues
                              (churn[,cols_recode1][,i],
                                from =c("No internet service"),
                                to=c("No")))
}

churn$MultipleLines <- as.factor(mapvalues(churn$MultipleLines, 
                                           from=c("No phone service"),
                                           to=c("No")))
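For reference, the same kind of recoding can be done in base R by renaming a factor level: when two levels end up with the same name, R merges them. A small sketch on a made-up factor (f is invented for illustration):

```r
# Toy factor with the three values found in the internet add-on columns
f <- factor(c("Yes", "No", "No internet service", "Yes"))
print(levels(f))

# Rename one level; because "No" already exists, the two levels are merged
levels(f)[levels(f) == "No internet service"] <- "No"

print(levels(f))
print(table(f))
```

This avoids the dependency on plyr, at the cost of being a little less explicit than mapvalues.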

We will not use the columns customerID and tenure, so we remove them from all objects:

churn$customerID = NULL
churn$tenure = NULL

objkNN$customerID = NULL
objkNN$tenure = NULL

objMean$customerID = NULL
objMean$tenure = NULL

objMedian$customerID = NULL
objMedian$tenure = NULL

Exploratory Data Analysis

Let’s see the correlation between numerical variables:

numeric.var = sapply(churn, is.numeric)
corr.matrix = cor(churn[,numeric.var])
corrplot(corr.matrix, main="\nCorrelation between variables with churn object.", method="number")

Repeating the process, but with objkNN:

numerickNN.var = sapply(objkNN, is.numeric)
corrkNN.matrix = cor(objkNN[,numerickNN.var])
corrplot(corrkNN.matrix, main="\nCorrelation between variables in objkNN", method="number")

…Now with objMean:

numericMean.var = sapply(objMean, is.numeric)
corrMean.matrix = cor(objMean[,numericMean.var])
corrplot(corrMean.matrix, main="\nCorrelation between variables in objMean", method="number")

…And objMedian:

numericMedian.var = sapply(objMedian, is.numeric)
corrMedian.matrix = cor(objMedian[,numericMedian.var])
corrplot(corrMedian.matrix, main="\nCorrelation between variables in objMedian", method="number")

We can see that those 11 missing values have no practical effect on any correlation, so I’ll throw away the objects I created with the missing values replaced.

remove('objkNN','objMean','objMedian', 'numerickNN.var','numericMean.var','numericMedian.var')

In the churn object, MonthlyCharges and TotalCharges are correlated, so we can remove one of those two variables to avoid redundancy and reduce the risk of overfitting.

churn$MonthlyCharges = NULL

Looking at the frequency distributions of the categorical variables:

p1 <- ggplot(churn, aes(x=gender)) + ggtitle("Gender") + xlab("Gender") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p2 <- ggplot(churn, aes(x=SeniorCitizen)) + ggtitle("Senior Citizen") + xlab("Senior Citizen") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p3 <- ggplot(churn, aes(x=Partner)) + ggtitle("Partner") + xlab("Partner") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p4 <- ggplot(churn, aes(x=Dependents)) + ggtitle("Dependents") + xlab("Dependents") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()
grid.arrange(p1, p2, p3, p4, ncol=2)

p5 <- ggplot(churn, aes(x=PhoneService)) + ggtitle("Phone Service") + xlab("Phone Service") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p6 <- ggplot(churn, aes(x=MultipleLines)) + ggtitle("Multiple Lines") + xlab("Multiple Lines") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p7 <- ggplot(churn, aes(x=InternetService)) + ggtitle("Internet Service") + xlab("Internet Service") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p8 <- ggplot(churn, aes(x=OnlineSecurity)) + ggtitle("Online Security") + xlab("Online Security") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()
grid.arrange(p5, p6, p7, p8, ncol=2)

p9 <- ggplot(churn, aes(x=OnlineBackup)) + ggtitle("Online Backup") + xlab("Online Backup") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p10 <- ggplot(churn, aes(x=DeviceProtection)) + ggtitle("Device Protection") + xlab("Device Protection") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p11 <- ggplot(churn, aes(x=TechSupport)) + ggtitle("Tech Support") + xlab("Tech Support") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p12 <- ggplot(churn, aes(x=StreamingTV)) + ggtitle("Streaming TV") + xlab("Streaming TV") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()
grid.arrange(p9, p10, p11, p12, ncol=2)

p13 <- ggplot(churn, aes(x=StreamingMovies)) + ggtitle("Streaming Movies") + xlab("Streaming Movies") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p14 <- ggplot(churn, aes(x=Contract)) + ggtitle("Contract") + xlab("Contract") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p15 <- ggplot(churn, aes(x=PaperlessBilling)) + ggtitle("Paperless Billing") + xlab("Paperless Billing") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p16 <- ggplot(churn, aes(x=PaymentMethod)) + ggtitle("Payment Method") + xlab("Payment Method") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()

p17 <- ggplot(churn, aes(x=tenure_group)) + ggtitle("Tenure Group") + xlab("Tenure Group") +
  geom_bar(aes(y = 100*(..count..)/sum(..count..)), width = 0.5) + ylab("Percentage") + coord_flip() + theme_minimal()
grid.arrange(p13, p14, p15, p16, p17, ncol=2)

We can keep all the variables because their values look well spread across the categories.

Now we will do the logistic regression.

Logistic regression

Normal object (churn)

Let’s split off 75% of our data to train the model. We will use the remaining data to test it.

set.seed(4602232) # Arbitrary seed (hit your head on the numeric keypad); set before sampling so the split is reproducible.
train = createDataPartition(churn$Churn,p=0.75,list=FALSE)
toTrain = churn[train,]
toTest = churn[-train,]
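When the outcome is a factor, createDataPartition samples within each class, so the training and test sets keep roughly the same proportion of churners. The base R sketch below imitates that idea on a made-up outcome vector (y and the seed are invented for illustration, and this is not caret's actual implementation). Note that the seed is set before any random sampling, otherwise the split is not reproducible.

```r
set.seed(1)  # arbitrary seed, set BEFORE sampling

# Toy outcome: 80 "No" and 20 "Yes"
y <- factor(rep(c("No", "Yes"), times = c(80, 20)))

# Sample 75% of the row indices separately inside each class
idx <- unlist(lapply(split(seq_along(y), y), function(rows) {
  sample(rows, size = round(0.75 * length(rows)))
}), use.names = FALSE)

train_y <- y[idx]
test_y  <- y[-idx]

# Both sets keep the original 80/20 class balance
print(prop.table(table(train_y)))
print(prop.table(table(test_y)))
```

Keeping the class balance matters here because churners are the minority class, and a purely random split could leave the test set with too few of them.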

Let’s look at the dimensions of the training and test datasets to check that they make sense:

print('toTrain')

## [1] "toTrain"

dim(toTrain)

## [1] 5275   19

print('toTest')

## [1] "toTest"

dim(toTest)

## [1] 1757   19

They do.

Training logistic model

logModel <- glm(Churn ~ ., family=binomial(link="logit"), data=toTrain)
print(summary(logModel))

## 
## Call:
## glm(formula = Churn ~ ., family = binomial(link = "logit"), data = toTrain)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1040  -0.6865  -0.3013   0.6421   3.1900  
## 
## Coefficients:
##                                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                          -1.054e+00  2.093e-01  -5.036 4.75e-07 ***
## genderMale                           -2.507e-02  7.517e-02  -0.334 0.738726    
## SeniorCitizen                         2.124e-01  9.762e-02   2.175 0.029603 *  
## PartnerYes                            7.430e-03  9.029e-02   0.082 0.934412    
## DependentsYes                        -2.139e-01  1.038e-01  -2.060 0.039396 *  
## PhoneServiceYes                      -3.858e-01  1.536e-01  -2.512 0.012006 *  
## MultipleLinesYes                      2.061e-01  9.309e-02   2.214 0.026813 *  
## InternetServiceFiber optic            9.016e-01  1.158e-01   7.789 6.73e-15 ***
## InternetServiceNo                    -7.127e-01  1.585e-01  -4.496 6.91e-06 ***
## OnlineSecurityYes                    -2.644e-01  9.825e-02  -2.691 0.007120 ** 
## OnlineBackupYes                      -1.891e-01  9.011e-02  -2.099 0.035843 *  
## DeviceProtectionYes                   4.938e-02  9.228e-02   0.535 0.592551    
## TechSupportYes                       -3.937e-01  1.015e-01  -3.877 0.000106 ***
## StreamingTVYes                        3.772e-01  9.608e-02   3.926 8.65e-05 ***
## StreamingMoviesYes                    1.981e-01  9.613e-02   2.061 0.039308 *  
## ContractOne year                     -6.545e-01  1.231e-01  -5.318 1.05e-07 ***
## ContractTwo year                     -1.500e+00  2.085e-01  -7.195 6.23e-13 ***
## PaperlessBillingYes                   4.054e-01  8.640e-02   4.692 2.70e-06 ***
## PaymentMethodCredit card (automatic) -7.329e-02  1.306e-01  -0.561 0.574631    
## PaymentMethodElectronic check         3.016e-01  1.092e-01   2.761 0.005756 ** 
## PaymentMethodMailed check            -9.920e-02  1.344e-01  -0.738 0.460522    
## TotalCharges                          3.043e-05  7.829e-05   0.389 0.697548    
## tenure_group 18 up to 24 months      -3.485e-01  1.856e-01  -1.878 0.060418 .  
## tenure_group 24 up to 30 months      -6.007e-01  2.021e-01  -2.973 0.002951 ** 
## tenure_group 30 up to 36 months      -7.180e-01  2.241e-01  -3.204 0.001353 ** 
## tenure_group 36 up to 42 months      -6.436e-01  2.477e-01  -2.598 0.009375 ** 
## tenure_group 42 up to 48 months      -7.792e-01  2.881e-01  -2.704 0.006848 ** 
## tenure_group 48 up to 54 months      -6.664e-01  3.088e-01  -2.158 0.030931 *  
## tenure_group 54 up to 60 months      -1.339e+00  3.605e-01  -3.713 0.000205 ***
## tenure_group More than 60 months     -1.349e+00  4.122e-01  -3.272 0.001067 ** 
## tenure_group0 up to 6 months          9.497e-01  1.484e-01   6.399 1.57e-10 ***
## tenure_group6 up to 12 months         1.307e-01  1.597e-01   0.819 0.413002    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6108.6  on 5274  degrees of freedom
## Residual deviance: 4381.4  on 5243  degrees of freedom
## AIC: 4445.4
## 
## Number of Fisher Scoring iterations: 6
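The estimates in this table are on the log-odds scale. To turn a sum of coefficients into a probability we apply the logistic function, which in R is plogis. A small sketch using two estimates copied from the summary above. It assumes a customer at the reference level of every factor and ignores the tiny TotalCharges term, so the numbers are only illustrative:

```r
# Estimates copied by hand from the model summary
intercept <- -1.054    # (Intercept)
fiber     <-  0.9016   # InternetServiceFiber optic

# plogis(z) is the logistic function 1 / (1 + exp(-z))
baseline_prob <- plogis(intercept)
fiber_prob    <- plogis(intercept + fiber)

print(round(c(baseline = baseline_prob, fiber = fiber_prob), 3))
```

Under these assumptions the predicted churn probability goes from roughly 0.26 to roughly 0.46 when the internet service changes from DSL (the reference level) to fiber optic.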

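Let’s run an analysis of deviance to see how much each variable (especially the categorical ones) contributes to the model. With test="Chisq", anova adds the terms to the model one at a time and tests whether each one significantly reduces the residual deviance.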

anova(logModel, test="Chisq")

All variables with a Pr(>Chi) of the order of 10^{-16} are very important, and the ones with larger deviance values affect the model the most. The variables that can be negotiated or changed are the type of Contract, the Internet Service and TotalCharges.

Let’s see the accuracy of our model:
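To show how the analysis of deviance table is built, here is a sketch on the built-in mtcars dataset (not the churn data; the model am ~ wt + hp is chosen only because it is small). Each row reports how much the residual deviance drops when that term is added, and the drops add up to the difference between the null and the residual deviance:

```r
# Small logistic model on a built-in dataset
fit <- glm(am ~ wt + hp, family = binomial(link = "logit"), data = mtcars)

# Terms are added sequentially: NULL, then wt, then hp
dev_table <- anova(fit, test = "Chisq")
print(dev_table)

# The Deviance column sums to (null deviance - residual deviance)
total_drop <- sum(dev_table$Deviance, na.rm = TRUE)
print(total_drop)
print(fit$null.deviance - fit$deviance)
```

Because the terms are added in order, the deviance attributed to a variable depends on which variables came before it in the formula.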

toTest$Churn = as.character(toTest$Churn)
toTest$Churn[toTest$Churn=="No"] = "0"
toTest$Churn[toTest$Churn=="Yes"] = "1"
fitted.results = predict(logModel,newdata=toTest,type='response')
fitted.results = ifelse(fitted.results > 0.5,1,0)
misClasificError = mean(fitted.results != toTest$Churn)
print(paste('Logistic Regression Accuracy',1-misClasificError))

## [1] "Logistic Regression Accuracy 0.815594763801935"

About 81.6% accuracy.

Confusion matrix

We can use the confusion matrix to measure the quality of our model:

print("Confusion matrix for logistic regression"); table(toTest$Churn, fitted.results > 0.5)

## [1] "Confusion matrix for logistic regression"

##    
##     FALSE TRUE
##   0  1170  120
##   1   204  263

We got 1170 true negatives and 263 true positives: these are the cases our model predicted correctly. We can also see that our model is good at predicting no churn, that is, at identifying customers who do not cancel the service. We also have 120 false positives and 204 false negatives: the model predicted TRUE in 120 cases where the right answer was FALSE, and predicted FALSE in 204 cases where the right answer was TRUE.
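Accuracy alone hides how the errors are split between the two classes, so it helps to compute a few more metrics from the confusion matrix. A sketch with the four counts typed in by hand from the table above (the cm object is just a manual copy of that table):

```r
# Rows = actual class (0 = no churn, 1 = churn), columns = predicted churn
cm <- matrix(c(1170, 204, 120, 263), nrow = 2,
             dimnames = list(actual = c("0", "1"),
                             predicted = c("FALSE", "TRUE")))

TN <- cm["0", "FALSE"]; FP <- cm["0", "TRUE"]
FN <- cm["1", "FALSE"]; TP <- cm["1", "TRUE"]

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)   # share of actual churners we caught
specificity <- TN / (TN + FP)   # share of non-churners correctly identified
precision   <- TP / (TP + FP)   # share of predicted churners who really left

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, precision = precision), 3))
```

The specificity is about 0.91 but the sensitivity is only about 0.56, which confirms that the model is much better at recognising customers who stay than customers who leave.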

Odds Ratio

We can learn about the odds ratio here. The following table will help us decide where we can act to increase the chance that a customer keeps the service.

exp(cbind(OR=coef(logModel), confint(logModel)))

## Waiting for profiling to be done...

##                                             OR     2.5 %    97.5 %
## (Intercept)                          0.3485666 0.2307466 0.5242559
## genderMale                           0.9752398 0.8416353 1.1300991
## SeniorCitizen                        1.2365814 1.0210363 1.4971488
## PartnerYes                           1.0074578 0.8441333 1.2026877
## DependentsYes                        0.8074661 0.6582844 0.9890197
## PhoneServiceYes                      0.6799054 0.5033471 0.9193521
## MultipleLinesYes                     1.2289132 1.0241439 1.4752990
## InternetServiceFiber optic           2.4636094 1.9653659 3.0942886
## InternetServiceNo                    0.4903098 0.3584525 0.6675243
## OnlineSecurityYes                    0.7676591 0.6328103 0.9302488
## OnlineBackupYes                      0.8276907 0.6936626 0.9876248
## DeviceProtectionYes                  1.0506202 0.8769309 1.2592031
## TechSupportYes                       0.6745761 0.5524084 0.8226205
## StreamingTVYes                       1.4581823 1.2083908 1.7612629
## StreamingMoviesYes                   1.2191095 1.0099790 1.4723371
## ContractOne year                     0.5196883 0.4073263 0.6600465
## ContractTwo year                     0.2231120 0.1465528 0.3323294
## PaperlessBillingYes                  1.4999345 1.2667420 1.7775122
## PaymentMethodCredit card (automatic) 0.9293270 0.7192402 1.2003352
## PaymentMethodElectronic check        1.3520197 1.0922798 1.6762933
## PaymentMethodMailed check            0.9055658 0.6959494 1.1789348
## TotalCharges                         1.0000304 0.9998792 1.0001863
## tenure_group 18 up to 24 months      0.7057661 0.4898064 1.0142472
## tenure_group 24 up to 30 months      0.5484371 0.3680269 0.8130121
## tenure_group 30 up to 36 months      0.4877258 0.3131262 0.7540744
## tenure_group 36 up to 42 months      0.5253859 0.3215786 0.8497552
## tenure_group 42 up to 48 months      0.4587850 0.2587000 0.8011292
## tenure_group 48 up to 54 months      0.5135498 0.2778589 0.9330851
## tenure_group 54 up to 60 months      0.2621997 0.1277620 0.5254882
## tenure_group More than 60 months     0.2595718 0.1137324 0.5727452
## tenure_group0 up to 6 months         2.5849738 1.9351021 3.4631910
## tenure_group6 up to 12 months        1.1396405 0.8335383 1.5590831

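If a customer pays by automatic credit card, the odds of leaving are about 7% lower (OR 0.93) than with automatic bank transfer, the reference level, although the confidence interval (0.72 to 1.20) includes 1, so this effect is not statistically significant. And for each unit we increase TotalCharges, the odds of a customer cancelling the service increase by only about 0.003% (OR 1.00003), again with a confidence interval that includes 1.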
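An odds ratio can be read as a percentage change in the odds with (OR - 1) * 100. A sketch using four values copied by hand from the table above (the names in the or vector are my own short labels, not the row names of the table):

```r
# Odds ratios copied from the table
or <- c(credit_card   = 0.9293270,   # PaymentMethodCredit card (automatic)
        total_charges = 1.0000304,   # TotalCharges
        two_year      = 0.2231120,   # ContractTwo year
        fiber         = 2.4636094)   # InternetServiceFiber optic

# Percentage change in the odds of churn relative to the reference level
pct_change <- (or - 1) * 100
print(round(pct_change, 3))
```

Negative values mean lower odds of churn and positive values mean higher odds. A two-year contract, for example, lowers the odds by about 78% compared with a month-to-month contract, while fiber optic raises them by about 146% compared with DSL.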

Application of Logistic Regression in Customer Churn Analytics

Leandro Kellermann de Oliveira

31/12/2019