knitr::opts_chunk$set(echo = TRUE,fig.width=8,fig.height=3)#set plot size
library(tidyverse);library(ggplot2);library(readxl);
library(gridExtra)#for putting plots side by side
library(broom);library(pander)#for making table
Wall believes that estimating each customer’s probability of churning, and identifying the most significant drivers of churn, would let his team reach out to at-risk customers, improve QWE’s service, and prevent churn without resorting to costly discounts. His problem is therefore to predict the probability that a customer leaves before February 2012, based on the two preceding months of data on customer characteristics.
V.J. Aggarwal is to generate a list of the 100 customers who are most likely to churn and to analyze the three factors that matter most for churn. He pulled data for November and December 2011. Based on Wall’s intuition, he chose the variables ‘Customer age’, ‘CHI score’, ‘number of support cases’, ‘average support priority’ and ‘usage information’. He also added a column recording whether the customer left in the two months after December 1. They pay particular attention to customers who have used QWE’s service for between 6 and 14 months, and they target customers with a dropping CHI score because customers who are unsatisfied with the service are more likely to churn. They also pick service- and usage-related variables, because customers with frequent or high-priority service issues may have problems with QWE’s service and then churn, whereas a customer who is active on QWE’s services probably values them and is less likely to churn.
There are two main data limitations. First, the variables were chosen by intuition, so important factors may have been omitted and the analysis may be biased. Second, the data cover only two months. Churn before February may have been triggered before November, and there may be seasonal patterns or other factors specific to that period.
#results are hidden and provided in the appendix
qwe <- read_excel("UV6696-XLS-ENG.xlsx",sheet=2)#import data
summary(qwe)#provide summary of data
a. Read in data and provide a summary, with a brief explanation for the terms
This dataset, named qwe, comes from the November and December 2011 records in QWE’s SalesForce.com database. It is a sample of 6347 of QWE’s customers as of that date, with several characteristics relevant to customer attrition: customer longevity (Customer Age in months), CHI score, number of support cases, average support priority (SP), and usage information (logins, blog articles, views, and days since last login). In total, there are 13 numeric variables in the original data.
Explanation of terms:
i. Month 0: the value of a characteristic in December.
ii. 0-1: the change in the value of a characteristic from November to December. A negative value means the number decreased from November to December; a positive value means it increased.
iii. Churn: a customer called with a request to cancel his or her contract with QWE. Depending on the type of contract, a customer was free to leave either at the end of each month or at the end of Month 6 or Month 12 of its relationship with QWE.
iv. Support Cases: the number of customer requests routed to technical staff to resolve issues with using the system.
v. Support Priority: a high value denotes that the issue was serious, meaning the customer may have real problems with the service and may be more likely to drop it.
vi. Usage – logins, blog articles, views, days since last login: high values in logins, blog articles and views, together with a low value in days since last login, show that the customer logged in often, wrote more blog articles, was viewed by more users and logged in recently, meaning the customer likely sees value in QWE’s services and is therefore less likely to drop them.
b. Mention the discrete and continuous variables in your data. Change the data-types of variables if necessary.
discrete variable:
ID, Churn (1 = Yes, 0 = No)
continuous variables:
Customer Age (in months), CHI Score Month 0, CHI Score 0-1, Support Cases Month 0, Support Cases 0-1, Logins 0-1, Blog Articles 0-1, Views 0-1, Days Since Last Login 0-1, SP Month 0, SP 0-1
qwe$ID=as.factor(qwe$ID)#change ID into factor
qwe$`Churn (1 = Yes, 0 = No)`=as.factor( qwe$`Churn (1 = Yes, 0 = No)` )#change Churn (1 = Yes, 0 = No) into factor
a. Distribution of CHI score for December 2011 by different churn outcomes.
#draw a boxplot
p1<-qwe %>%
ggplot(aes(x=`Churn (1 = Yes, 0 = No)`,y=`CHI Score Month 0`,fill=`Churn (1 = Yes, 0 = No)`))+
geom_boxplot(alpha=.5, width=.3, position="identity")+
labs(title = "Distribution of CHI score \nof churn outcomes",x="Churn outcomes(0=stay,1=drop)",y = "CHI score in December 2011")+#add title and axis labels
theme(legend.position="none")#hide the side signal of fill parameter
#draw a density plot
p2<-qwe %>%
ggplot(aes(x=`CHI Score Month 0`,fill=`Churn (1 = Yes, 0 = No)`))+
geom_density(alpha=.3,position="identity")+
labs(title = "Density of CHI score \nby churn outcomes\n(blue=churn, red= stay)",x="CHI Score",y = "Density of CHI score")+#add title and axis labels
theme(legend.position="none")#hide the side signal of fill parameter
grid.arrange(p1, p2, ncol = 2)#put 2 plots together
b. Average churn rate by customer age
avg_churn<-qwe %>%
group_by(`Customer Age (in months)`) %>%
summarise(avg_churn=mean(as.numeric(as.character(`Churn (1 = Yes, 0 = No)`)))) #count the average churn rate of different customer age group
p3<-avg_churn %>% ggplot(aes(x=`Customer Age (in months)`,y=avg_churn))+
geom_bar(stat = "identity",fill='blue',alpha=.5)+
labs(title = "Average churn rate\nby customer age",x="Customer age in months",y = "Average churn rate")#add title and axis labels
c. Number of customers who churn by customer age
p4<-qwe %>%
filter(`Churn (1 = Yes, 0 = No)`==1) %>% #keep only churn out customers' data
ggplot(aes(x=`Customer Age (in months)`))+
geom_bar(fill="indianred",alpha=.8)+#change the color and transparency
labs(title = "Number of churn customers \nby customer age",x="Customer age in months",y = "Number of churn customers")#add title and axis labels
grid.arrange(p3,p4, ncol = 2)#place 2 plots side by side
From the first two plots, customers who stay with QWE have a higher CHI score on average and a larger maximum CHI score than those who churn, with a median around 90. The churn group’s CHI scores are concentrated at lower values, with a median around 55.
For customer age, the average churn rate has roughly two peaks, around 12 and 48 months. In terms of the total number of churned customers, a customer age of 12 months has the most.
Therefore, to actively prevent churn, we suggest that QWE Inc. focus on the segment where customers’ CHI score is relatively low, for example below 50 (the threshold depends on the current data), and where customer age is approaching 12 months.
churn_yes<-qwe %>% #filter the data according to churn
filter(`Churn (1 = Yes, 0 = No)`==1)
churn_no<-qwe %>%
filter(`Churn (1 = Yes, 0 = No)`==0)
#do t-test on 11 variables
t1=tidy(t.test(churn_yes$`Customer Age (in months)`,churn_no$`Customer Age (in months)`,paired = FALSE))
t2=tidy(t.test(churn_yes$`CHI Score Month 0`,churn_no$`CHI Score Month 0`,paired = FALSE))
t3=tidy(t.test(churn_yes$`CHI Score 0-1`,churn_no$`CHI Score 0-1`,paired = FALSE))
t4=tidy(t.test(churn_yes$`Support Cases Month 0`,churn_no$`Support Cases Month 0`,paired = FALSE))
t5=tidy(t.test(churn_yes$`Support Cases 0-1`,churn_no$`Support Cases 0-1`,paired = FALSE))
t6=tidy(t.test(churn_yes$`SP Month 0`,churn_no$`SP Month 0`,paired = FALSE))
t7=tidy(t.test(churn_yes$`SP 0-1`,churn_no$`SP 0-1`,paired = FALSE))
t8=tidy(t.test(churn_yes$`Logins 0-1`,churn_no$`Logins 0-1`,paired = FALSE))
t9=tidy(t.test(churn_yes$`Blog Articles 0-1`,churn_no$`Blog Articles 0-1`,paired = FALSE))
t10=tidy(t.test(churn_yes$`Views 0-1`,churn_no$`Views 0-1`,paired = FALSE))
t11=tidy(t.test(churn_yes$`Days Since Last Login 0-1`,churn_no$`Days Since Last Login 0-1`,paired = FALSE))
result=rbind(t1,t2,t3,t4,t5,t6,t7,t8,t9,t10,t11)#bind the rows of the t-test results
rname=data.frame(`Variable Name`=c("Customer Age (in months)","CHI Score Month 0","CHI Score 0-1","Support Cases Month 0","Support Cases 0-1","SP Month 0","SP 0-1","Logins 0-1","Blog Articles 0-1","Views 0-1","Days Since Last Login 0-1"),check.names=FALSE)#column of variable names, in the same order as t1..t11
result=cbind(rname,result)#column-bind variable names and test results
result<-result %>% select(`Variable Name`,estimate1,estimate2,p.value) %>% rename(Mean_Churned=estimate1,Mean_Unchurned=estimate2)#keep the group means and p-value, then rename
pandoc.table(result,style='simple')#create a table
##
##
## Variable Name               Mean_Churned   Mean_Unchurned   p.value
## --------------------------- -------------- ---------------- -----------
## Customer Age (in months)    15.35          13.82            0.003057
## CHI Score Month 0           63.27          88.61            2.097e-13
## CHI Score 0-1               -3.737         5.53             1.571e-08
## Support Cases Month 0       0.3715         0.7243           6.281e-08
## Support Cases 0-1           0.03715        -0.009296        0.5278
## SP Month 0                  0.4996         0.8296           4.381e-07
## SP 0-1                      -0.0167        0.03268          0.5218
## Logins 0-1                  8.062          16.14            0.0004037
## Blog Articles 0-1           -0.1022        0.1711           0.01158
## Views 0-1                   -95.77         106.6            0.05631
## Days Since Last Login 0-1   6.486          1.511            5.215e-05
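a. The type of test
All 11 tests are independent two-sample t-tests. Because t.test() is called with its default var.equal = FALSE, each one is Welch’s t-test, which does not assume equal variances in the two groups.
b. Null and alternative hypotheses for the test
Null hypothesis: the population mean for customers who churned equals the population mean for customers who did not.
Alternative hypothesis: the population mean for customers who churned differs from the population mean for customers who did not.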
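The eleven tests all follow the same pattern, so they can be written once and applied to each variable. The sketch below illustrates that idea on a small synthetic data frame; `toy`, its columns and the helper `welch_row` are hypothetical names made up for this example and are not part of the qwe data or the analysis above.

```r
# Synthetic stand-in for the qwe data (made-up values, NOT the real dataset)
set.seed(1)
toy <- data.frame(
  churn = factor(rep(c(0, 1), times = c(200, 40))),
  chi   = c(rnorm(200, mean = 90, sd = 60), rnorm(40, mean = 60, sd = 55)),
  age   = c(rpois(200, 14), rpois(40, 15))
)

# Hypothetical helper: Welch two-sample t-test for one variable,
# returning the two group means and the p-value as a one-row data frame
welch_row <- function(var, data, group = "churn") {
  yes <- data[[var]][data[[group]] == 1]   # churned customers
  no  <- data[[var]][data[[group]] == 0]   # customers who stayed
  tt  <- t.test(yes, no)                   # var.equal = FALSE by default (Welch)
  data.frame(variable       = var,
             mean_churned   = unname(tt$estimate[1]),
             mean_unchurned = unname(tt$estimate[2]),
             p_value        = tt$p.value)
}

vars <- c("chi", "age")
result_toy <- do.call(rbind, lapply(vars, welch_row, data = toy))
print(result_toy)
```

With the real data, the same helper could in principle be applied to the eleven column names instead of `vars`, which avoids keeping the list of tests and the list of row labels in sync by hand.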
c. Which variables differ significantly across the two groups? What is the managerial takeaway of these tests?
At the 5% significance level, Customer Age, CHI Score Month 0, CHI Score 0-1, Support Cases Month 0, SP Month 0, Logins 0-1, Blog Articles 0-1 and Days Since Last Login 0-1 differ significantly across the two groups. The first managerial takeaway is that QWE should pay attention to customer age. However, we cannot conclude from the difference in means that customers with a longer customer age are more likely to churn. Looking at the plot below, the density of customer age peaks around 15 months for churned customers and around 5 months for unchurned customers. Customers with lower CHI scores or recently dropping scores are likely to churn, so QWE should focus on improving its service and fixing problems for customers with low CHI. On support, prompt resolution of technical problems remains important, but the tests show that churned customers actually had fewer support cases and a lower average support priority in December than customers who stayed (0.37 vs. 0.72 cases; 0.50 vs. 0.83 priority), while the month-to-month changes in support cases and priority do not differ significantly. QWE should therefore not neglect customers who raise few or low-priority support requests. More generally, QWE should focus on inactive customers, because SP Month 0, Logins 0-1, Blog Articles 0-1 and Days Since Last Login 0-1 support the assumption that active customers are less likely to churn.
qwe %>% #plot density of Customer Age by churn outcomes
ggplot(aes(x=`Customer Age (in months)`,fill=`Churn (1 = Yes, 0 = No)`))+
geom_density(alpha=.3,position="identity")+
labs(title = "Density of Customer Age by churn outcomes\n(blue=churn, red=unchurn)",x="Customer Age in months",y = "Density of Customer Age")+#add title and axis labels
theme(legend.position="none")#hide the side signal of fill parameter
#run a logistic regression
#results are hidden and provided in the appendix
l1 <- glm(`Churn (1 = Yes, 0 = No)` ~.-ID, data = qwe, family="binomial")#set Churn as the dependent variable and all other variables except ID as independent variables
summary(l1)
What are the significant factors affecting customer churn?
From the regression results, 6 factors are significant at the 10% level (90% confidence): Customer Age (in months), CHI Score Month 0, CHI Score 0-1, Support Cases 0-1, Views 0-1 and Days Since Last Login 0-1. At the stricter 5% level (95% confidence), Support Cases 0-1 is no longer significant and the remaining 5 factors are still significant.
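The screening by significance level can be reproduced mechanically. The sketch below copies the p-values by hand from the appendix output of summary(l1) into a named vector (the intercept is left out) and filters them at the two thresholds. On the fitted model itself, the same column should be available as `coef(summary(l1))[, "Pr(>|z|)"]`.

```r
# p-values copied from the appendix output of summary(l1); intercept omitted
p_values <- c(
  "Customer Age (in months)"  = 0.01799,
  "CHI Score Month 0"         = 0.00014,
  "CHI Score 0-1"             = 3.29e-05,
  "Support Cases Month 0"     = 0.14643,
  "Support Cases 0-1"         = 0.05992,
  "SP Month 0"                = 0.87611,
  "SP 0-1"                    = 0.50830,
  "Logins 0-1"                = 0.89002,
  "Blog Articles 0-1"         = 0.98817,
  "Views 0-1"                 = 0.00700,
  "Days Since Last Login 0-1" = 5.81e-05
)

sig_10 <- names(p_values)[p_values < 0.10]  # significant at the 10% level
sig_05 <- names(p_values)[p_values < 0.05]  # significant at the 5% level

print(sig_10)
print(sig_05)
print(setdiff(sig_10, sig_05))  # factor that drops out at the stricter level
```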
#calculate the relative change in the odds of churn for a one-unit change in each factor
#results are hidden and provided in the appendix
exp(0.01271)-1#customer age
exp(-0.004657)-1#CHI Score Month 0
exp(-0.01027)-1#CHI Score 0-1
exp(-0.0001098)-1#Views 0-1
exp(0.01724)-1#Days Since Last Login 0-1
What should Aggarwal and Wall learn from this?
To understand the meaning of the coefficients, we calculate the relative change in the odds of churn for each factor that is significant at the 95% confidence level. Holding the other variables constant, a one-unit increase in customer age increases the odds by 1.28%, in CHI Score Month 0 decreases the odds by 0.46%, in CHI Score 0-1 decreases the odds by 1.02%, in Views 0-1 decreases the odds by 0.01%, and in Days Since Last Login 0-1 increases the odds by 1.74%.
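These percentages come from the identity that a one-unit increase in a predictor multiplies the odds by exp(beta), so the relative change is exp(beta) - 1. As a minimal sketch, the five coefficients can be typed in from the appendix output and converted in one step; the 10-unit version is an illustrative addition, since a single CHI point or a single view is a very small change.

```r
# Coefficients copied from the appendix output of summary(l1)
coefs <- c(
  "Customer Age (in months)"  =  0.01271,
  "CHI Score Month 0"         = -0.004657,
  "CHI Score 0-1"             = -0.01027,
  "Views 0-1"                 = -0.0001098,
  "Days Since Last Login 0-1" =  0.01724
)

# Percent change in the odds of churn for a one-unit increase
pct_change_odds <- 100 * (exp(coefs) - 1)
print(round(pct_change_odds, 2))

# A k-unit increase multiplies the odds by exp(k * beta); here k = 10
pct_change_10 <- 100 * (exp(10 * coefs) - 1)
print(round(pct_change_10, 2))
```

On the fitted model, `exp(coef(l1)) - 1` would give the same quantity for every term without retyping the estimates.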
Thus, to decrease churn, Aggarwal and Wall can target the group that is more likely to churn. This segment has the following characteristics:
i. longer customer age
ii. lower current CHI Score and/or a CHI Score that has been decreasing
iii. support requests that have been increasing (significant only at the 10% level)
iv. Views that have been decreasing
v. Days Since Last Login that has been increasing
a. Create segments based on customer age – “New” (0-6 months), “Medium” (7-12 months), “Old” (13+ months).
new<-qwe %>%
filter(`Customer Age (in months)`>=0 &`Customer Age (in months)`<=6)
medium<-qwe %>%
filter(`Customer Age (in months)`>6 &`Customer Age (in months)`<=12)
old<-qwe %>%
filter(`Customer Age (in months)`>12)
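The same three bins can also be expressed as a single factor with base R’s cut(), which makes the boundary handling explicit. The sketch below uses a handful of made-up ages (not taken from the data) chosen to sit on the bin edges.

```r
# Hypothetical customer ages in months, chosen to hit the bin boundaries
ages <- c(0, 6, 7, 12, 13, 67)

# Intervals are right-closed by default: (-Inf, 6], (6, 12], (12, Inf)
segment <- cut(ages,
               breaks = c(-Inf, 6, 12, Inf),
               labels = c("New", "Medium", "Old"))

print(data.frame(ages, segment))
print(table(segment))
```

If such a factor were stored as a column of qwe, note that a formula of the form `~ . - ID` would pick it up as a predictor, so it would need to be excluded in the same way as ID.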
b. Perform logistic regressions as part (5) above for each segment separately
#run logistic regression for new customers segment
#results are hidden and provided in the appendix
l_new <- glm(`Churn (1 = Yes, 0 = No)` ~ .- ID, data=new, family=binomial)
#run logistic regression for medium customers segment
l_medium <- glm(`Churn (1 = Yes, 0 = No)` ~ .- ID, data=medium, family=binomial)
#run logistic regression for old customers segment
l_old <- glm(`Churn (1 = Yes, 0 = No)` ~ .- ID, data=old, family=binomial)
summary(l_new);summary(l_medium);summary(l_old)
Interpret the regression results for each segment.
#calculate the relative change in the odds of churn for a one-unit change in each factor significant at the 95% confidence level
#results are hidden and provided in the appendix
#new segment
exp(0.3883)-1#customer age
exp(-0.0191)-1#CHI Score 0-1
exp(0.007239)-1#Login 0-1
For the new customer segment, Customer Age (in months), CHI Score 0-1 and Logins 0-1 are the 3 factors that significantly affect churn.
Holding the other variables constant, a one-unit increase in customer age increases the odds by 47.45%, in CHI Score 0-1 decreases the odds by 1.89%, and in Logins 0-1 increases the odds by 0.73%.
Thus, to decrease churn among new customers, Aggarwal and Wall can target the group that is more likely to churn, which has the following characteristics:
i. longer customer age (very important)
ii. CHI Score has been decreasing between months
iii. Number of logins has been increasing between months
#calculate the relative change in the odds of churn for a one-unit change in each factor
#results are hidden and provided in the appendix
#medium segment
exp(0.3187)-1#customer age
exp(-0.009842)-1#CHI Score Month 0
exp(-0.0001332)-1#Views 0-1
exp(0.01468)-1#Days Since Last Login 0-1
For the medium customer segment, Customer Age (in months), CHI Score Month 0, Views 0-1 and Days Since Last Login 0-1 are the 4 factors that significantly affect churn.
Holding the other variables constant, a one-unit increase in customer age increases the odds by 37.53%, in CHI Score Month 0 decreases the odds by 0.98%, in Views 0-1 decreases the odds by 0.013%, and in Days Since Last Login 0-1 increases the odds by 1.48%.
Thus, to decrease churn among medium customers, the group that is likely to churn has the following characteristics:
i. longer customer age (very important)
ii. CHI Score in the current month is lower
iii. Views has been decreasing between months
iv. Days Since Last Login has been increasing between months
#calculate the relative change in the odds of churn for a one-unit change in each factor
#results are hidden and provided in the appendix
#old segment
exp(-0.03984)-1#customer age
exp(-0.01146)-1#CHI Score Month 0
For the old customer segment, Customer Age (in months) and CHI Score Month 0 are the 2 factors that significantly affect churn.
Holding the other variables constant, a one-unit increase in customer age decreases the odds by 3.91%, and in CHI Score Month 0 decreases the odds by 1.14%.
Thus, to decrease churn among old customers, the group that is likely to churn has the following characteristics:
i. shorter customer age
ii. CHI Score in the current month is lower
Which factors differ across segments?
The factors that differ across segments are CHI Score 0-1, CHI Score Month 0, Logins 0-1, Views 0-1 and Days Since Last Login 0-1. CHI Score 0-1 is significant only in the new segment, whereas CHI Score Month 0 is significant in the medium and old segments. The negative effect of CHI Score Month 0 on the log(odds) is slightly larger in magnitude in the old segment (-0.0115) than in the medium segment (-0.0098). Logins 0-1 is significant only in the new segment. Days Since Last Login 0-1 and Views 0-1 are significant only in the medium segment.
Which variables consistently affect all segments? Do their magnitudes vary significantly across segments?
The variable that consistently affects all segments is Customer Age in months, and its magnitude varies across segments. In the new segment, the coefficient of customer age is 3.883e-01; in the medium segment, it is 3.187e-01; in the old segment, it is -3.984e-02. Only in the old segment does customer age have a negative effect on the log(odds), which is consistent with the assumption that customers with more than 12 months of subscription are more loyal. In the other two segments, customer age has a positive effect on the log(odds). The coefficient of customer age is larger in the new segment than in the medium segment: a one-month increase in Customer Age raises the relative odds by 47.45% - 37.53% = 9.92 percentage points more in the new segment than in the medium segment. This is a sizable gap in the point estimates, although the comparison here is descriptive and not a formal test of the difference.
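One rough way to check whether the customer age coefficients really differ between segments is an approximate z-test on the difference of two estimates, using the standard errors from the appendix. This is only a sketch under stated assumptions: it treats the segment regressions as independent (the segments contain different customers) and relies on the usual large-sample normal approximation. The helper `compare_coefs` is a hypothetical name introduced here, not part of the original analysis.

```r
# Hypothetical helper: approximate z-test for the difference between two
# coefficients estimated on independent samples
compare_coefs <- function(b1, se1, b2, se2) {
  z <- (b1 - b2) / sqrt(se1^2 + se2^2)
  c(diff = b1 - b2, z = z, p = 2 * pnorm(-abs(z)))
}

# Customer age estimates and standard errors copied from the appendix
new_vs_medium <- compare_coefs(0.3883, 0.1330,  0.3187,  0.06495)
medium_vs_old <- compare_coefs(0.3187, 0.06495, -0.03984, 0.01000)

print(round(rbind(new_vs_medium, medium_vs_old), 4))
```

Under this approximation the new-versus-medium gap gives a z statistic of about 0.47, which is well within sampling noise, while the medium-versus-old gap gives a z statistic of about 5.5. The change of sign for old customers is therefore the better supported difference.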
From the analysis above, we discovered that customer age is an extremely important factor related to churn. First, we found a significant difference in mean customer age between the churned and unchurned groups. Second, we looked carefully at the distribution of customer age before jumping to the conclusion that the unchurned group simply has a lower customer age. The number of customers does fall as customer age rises, and according to the chart in (4), the density of customer age peaks earlier in the unchurned group (around 5 months) than in the churned group (around 15 months). QWE should therefore focus on the key period around 12-15 months. Third, we segmented the data by customer age group and ran the logistic regression on each segment. The effect of customer age has a different direction for old customers than for new and medium customers. Had we not done a priori segmentation before exploring the factors, we would have reached the partially wrong conclusion that customers with a longer customer age are always more likely to churn, which is not true for old customers. Therefore, doing a priori segmentation before running the regression is critical. To choose a sensible segmentation, we can first plot churn (or another binary variable of interest) against the corresponding factor. As in our analysis, the plot named Average churn rate by customer age gives a rough guide for how to separate customer age into proper segments.
qwe <- read_excel("UV6696-XLS-ENG.xlsx",sheet=2)#import data
summary(qwe)#provide summary of data
## ID Customer Age (in months) Churn (1 = Yes, 0 = No)
## Min. : 1 Min. : 0.0 Min. :0.00000
## 1st Qu.:1588 1st Qu.: 5.0 1st Qu.:0.00000
## Median :3174 Median :11.0 Median :0.00000
## Mean :3174 Mean :13.9 Mean :0.05089
## 3rd Qu.:4760 3rd Qu.:20.0 3rd Qu.:0.00000
## Max. :6347 Max. :67.0 Max. :1.00000
## CHI Score Month 0 CHI Score 0-1 Support Cases Month 0
## Min. : 0.00 Min. :-125.000 Min. : 0.0000
## 1st Qu.: 24.50 1st Qu.: -8.000 1st Qu.: 0.0000
## Median : 87.00 Median : 0.000 Median : 0.0000
## Mean : 87.32 Mean : 5.059 Mean : 0.7063
## 3rd Qu.:139.00 3rd Qu.: 15.000 3rd Qu.: 1.0000
## Max. :298.00 Max. : 208.000 Max. :32.0000
## Support Cases 0-1 SP Month 0 SP 0-1
## Min. :-29.000000 Min. :0.0000 Min. :-4.00000
## 1st Qu.: 0.000000 1st Qu.:0.0000 1st Qu.: 0.00000
## Median : 0.000000 Median :0.0000 Median : 0.00000
## Mean : -0.006932 Mean :0.8128 Mean : 0.03017
## 3rd Qu.: 0.000000 3rd Qu.:2.6667 3rd Qu.: 0.00000
## Max. : 31.000000 Max. :4.0000 Max. : 4.00000
## Logins 0-1 Blog Articles 0-1 Views 0-1
## Min. :-293.00 Min. :-75.0000 Min. :-28322.00
## 1st Qu.: -1.00 1st Qu.: 0.0000 1st Qu.: -11.00
## Median : 2.00 Median : 0.0000 Median : 0.00
## Mean : 15.73 Mean : 0.1572 Mean : 96.31
## 3rd Qu.: 23.00 3rd Qu.: 0.0000 3rd Qu.: 27.00
## Max. : 865.00 Max. :217.0000 Max. :230414.00
## Days Since Last Login 0-1
## Min. :-648.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 1.765
## 3rd Qu.: 3.000
## Max. : 61.000
#run a logistic regression
l1 <- glm(`Churn (1 = Yes, 0 = No)` ~.-ID, data = qwe, family="binomial")#set Churn as the dependent variable and all other variables except ID as independent variables
summary(l1)
##
## Call:
## glm(formula = `Churn (1 = Yes, 0 = No)` ~ . - ID, family = "binomial",
## data = qwe)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0047 -0.3542 -0.2957 -0.2328 3.0660
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.763e+00 1.069e-01 -25.841 < 2e-16 ***
## `Customer Age (in months)` 1.271e-02 5.370e-03 2.366 0.01799 *
## `CHI Score Month 0` -4.657e-03 1.223e-03 -3.808 0.00014 ***
## `CHI Score 0-1` -1.027e-02 2.474e-03 -4.153 3.29e-05 ***
## `Support Cases Month 0` -1.524e-01 1.049e-01 -1.452 0.14643
## `Support Cases 0-1` 1.703e-01 9.050e-02 1.881 0.05992 .
## `SP Month 0` 1.593e-02 1.022e-01 0.156 0.87611
## `SP 0-1` -5.194e-02 7.852e-02 -0.661 0.50830
## `Logins 0-1` 2.893e-04 2.092e-03 0.138 0.89002
## `Blog Articles 0-1` 2.905e-04 1.960e-02 0.015 0.98817
## `Views 0-1` -1.098e-04 4.071e-05 -2.697 0.00700 **
## `Days Since Last Login 0-1` 1.724e-02 4.289e-03 4.020 5.81e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2553.1 on 6346 degrees of freedom
## Residual deviance: 2440.3 on 6335 degrees of freedom
## AIC: 2464.3
##
## Number of Fisher Scoring iterations: 7
#calculate the change of relative odds of churn rate by the change of different factors
exp(0.01271)-1#customer age
## [1] 0.01279112
exp(-0.004657)-1#CHI Score Month 0
## [1] -0.004646173
exp(-0.01027)-1#CHI Score 0-1
## [1] -0.01021744
exp(-0.0001098)-1#Views 0-1
## [1] -0.000109794
exp(0.01724)-1#Days Since Last Login 0-1
## [1] 0.01738947
b. Perform logistic regressions as part (5) above for each segment separately
#run logistic regression for new customers segment
l_new <- glm(`Churn (1 = Yes, 0 = No)` ~ .- ID, data=new, family=binomial)
#run logistic regression for medium customers segment
l_medium <- glm(`Churn (1 = Yes, 0 = No)` ~ .- ID, data=medium, family=binomial)
#run logistic regression for old customers segment
l_old <- glm(`Churn (1 = Yes, 0 = No)` ~ .- ID, data=old, family=binomial)
summary(l_new);summary(l_medium);summary(l_old)
##
## Call:
## glm(formula = `Churn (1 = Yes, 0 = No)` ~ . - ID, family = binomial,
## data = new)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6704 -0.2157 -0.1405 -0.1158 3.4027
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.390e+00 5.046e-01 -10.682 < 2e-16 ***
## `Customer Age (in months)` 3.883e-01 1.330e-01 2.919 0.00351 **
## `CHI Score Month 0` -5.195e-03 4.579e-03 -1.135 0.25655
## `CHI Score 0-1` -1.910e-02 6.280e-03 -3.041 0.00236 **
## `Support Cases Month 0` -1.869e-01 1.567e-01 -1.193 0.23303
## `Support Cases 0-1` 2.467e-01 1.455e-01 1.695 0.09001 .
## `SP Month 0` 3.607e-01 2.013e-01 1.791 0.07321 .
## `SP 0-1` -2.254e-01 1.571e-01 -1.435 0.15129
## `Logins 0-1` 7.239e-03 3.061e-03 2.365 0.01802 *
## `Blog Articles 0-1` -1.649e-03 2.789e-02 -0.059 0.95285
## `Views 0-1` 8.911e-05 1.522e-04 0.586 0.55813
## `Days Since Last Login 0-1` 3.552e-02 2.013e-02 1.765 0.07765 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 425.14 on 2050 degrees of freedom
## Residual deviance: 380.69 on 2039 degrees of freedom
## AIC: 404.69
##
## Number of Fisher Scoring iterations: 7
##
## Call:
## glm(formula = `Churn (1 = Yes, 0 = No)` ~ . - ID, family = binomial,
## data = medium)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1506 -0.4174 -0.2815 -0.1777 3.0893
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.819e+00 6.897e-01 -6.987 2.80e-12 ***
## `Customer Age (in months)` 3.187e-01 6.495e-02 4.907 9.25e-07 ***
## `CHI Score Month 0` -9.842e-03 2.301e-03 -4.277 1.89e-05 ***
## `CHI Score 0-1` -3.383e-03 4.669e-03 -0.724 0.4688
## `Support Cases Month 0` -1.772e-01 2.521e-01 -0.703 0.4823
## `Support Cases 0-1` 1.339e-01 1.824e-01 0.734 0.4628
## `SP Month 0` -1.828e-01 2.095e-01 -0.872 0.3830
## `SP 0-1` 2.001e-02 1.458e-01 0.137 0.8909
## `Logins 0-1` -9.685e-05 4.092e-03 -0.024 0.9811
## `Blog Articles 0-1` -4.751e-02 6.162e-02 -0.771 0.4407
## `Views 0-1` -1.332e-04 5.743e-05 -2.319 0.0204 *
## `Days Since Last Login 0-1` 1.468e-02 6.552e-03 2.240 0.0251 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 793.58 on 1512 degrees of freedom
## Residual deviance: 684.75 on 1501 degrees of freedom
## AIC: 708.75
##
## Number of Fisher Scoring iterations: 7
##
## Call:
## glm(formula = `Churn (1 = Yes, 0 = No)` ~ . - ID, family = binomial,
## data = old)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7415 -0.3938 -0.2910 -0.2070 3.1056
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.271e-01 2.650e-01 -2.744 0.00607 **
## `Customer Age (in months)` -3.984e-02 1.000e-02 -3.982 6.82e-05 ***
## `CHI Score Month 0` -1.146e-02 1.704e-03 -6.728 1.72e-11 ***
## `CHI Score 0-1` 2.699e-05 3.583e-03 0.008 0.99399
## `Support Cases Month 0` -9.970e-02 1.925e-01 -0.518 0.60460
## `Support Cases 0-1` 8.440e-02 1.531e-01 0.551 0.58137
## `SP Month 0` -3.853e-02 1.648e-01 -0.234 0.81520
## `SP 0-1` 4.755e-02 1.237e-01 0.384 0.70067
## `Logins 0-1` -3.835e-03 4.227e-03 -0.907 0.36423
## `Blog Articles 0-1` 1.376e-02 2.898e-02 0.475 0.63492
## `Views 0-1` -1.154e-04 7.307e-05 -1.580 0.11421
## `Days Since Last Login 0-1` 5.331e-04 3.284e-03 0.162 0.87105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1268.9 on 2782 degrees of freedom
## Residual deviance: 1167.7 on 2771 degrees of freedom
## AIC: 1191.7
##
## Number of Fisher Scoring iterations: 6
Interpret the regression results for each segment.
new segment
#calculate the change of relative odds of churn rate by the change of different factors at 95% confidence level
#new segment
exp(0.3883)-1#customer age
## [1] 0.4744721
exp(-0.0191)-1#CHI Score 0-1
## [1] -0.01891875
exp(0.007239)-1#Login 0-1
## [1] 0.007265265
medium segment
#calculate the change of relative odds of churn rate by the change of different factors
#medium segment
exp(0.3187)-1#customer age
## [1] 0.3753387
exp(-0.009842)-1#CHI Score Month 0
## [1] -0.009793726
exp(-0.0001332)-1#Views 0-1
## [1] -0.0001331911
exp(0.01468)-1#Days Since Last Login 0-1
## [1] 0.01478828
old segment
#calculate the change of relative odds of churn rate by the change of different factors
#old segment
exp(-0.03984)-1#customer age
## [1] -0.03905682
exp(-0.01146)-1#CHI Score Month 0
## [1] -0.01139458