Project Title: Using Customer Behavior Data to Improve Customer Retention

NAME: ASWATHY GUNADEEP

EMAIL: aswathygunadeep@gmail.com

COLLEGE / COMPANY: NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

A telecommunications company is concerned about the number of customers leaving their landline business for cable competitors. They need to understand who is leaving.

The data set includes information about:

PRELIMINARY WORK

setwd("C:/Users/user/Desktop/tarsha systems summer internship/basic script")
ash.df <- read.csv(paste("WA_Fn-UseC_-Telco-Customer-Churn.csv", sep=""))
attach(ash.df)
dim(ash.df)
## [1] 7043   21
str(ash.df)
## 'data.frame':    7043 obs. of  21 variables:
##  $ customerID      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
##  $ gender          : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
##  $ Dependents      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
##  $ MultipleLines   : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
##  $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
##  $ OnlineSecurity  : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
##  $ OnlineBackup    : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
##  $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
##  $ TechSupport     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
##  $ StreamingTV     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
##  $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
##  $ Contract        : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
##  $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
##  $ PaymentMethod   : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...
library(car)
## Loading required package: carData
some(ash.df)
##      customerID gender SeniorCitizen Partner Dependents tenure
## 604  2001-MCUUW   Male             0      No         No     46
## 1372 6567-HOOPW Female             1      No         No      1
## 1925 1845-CSBRZ Female             1     Yes        Yes     22
## 2044 6557-BZXLQ   Male             1      No         No     16
## 2340 4094-NSEDU Female             1      No         No     21
## 2412 6901-GOGZG   Male             0      No        Yes     60
## 3219 6869-FGJJC   Male             0      No         No     68
## 5951 2862-PFNIK   Male             0      No        Yes      1
## 6016 2325-NBPZG Female             0      No         No      3
## 6255 9933-QRGTX Female             0     Yes         No     60
##      PhoneService    MultipleLines InternetService OnlineSecurity
## 604           Yes              Yes     Fiber optic             No
## 1372          Yes               No     Fiber optic             No
## 1925          Yes              Yes     Fiber optic             No
## 2044          Yes               No     Fiber optic             No
## 2340          Yes              Yes     Fiber optic             No
## 2412          Yes              Yes             DSL            Yes
## 3219          Yes               No     Fiber optic             No
## 5951           No No phone service             DSL             No
## 6016          Yes               No     Fiber optic            Yes
## 6255          Yes               No     Fiber optic            Yes
##      OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies
## 604           Yes              Yes         Yes         Yes             Yes
## 1372           No               No          No         Yes              No
## 1925           No               No          No         Yes              No
## 2044           No               No          No          No              No
## 2340           No               No          No         Yes             Yes
## 2412           No              Yes         Yes         Yes             Yes
## 3219          Yes              Yes          No          No              No
## 5951           No               No          No          No              No
## 6016           No               No          No          No              No
## 6255           No               No         Yes         Yes             Yes
##            Contract PaperlessBilling             PaymentMethod
## 604        Two year               No          Electronic check
## 1372 Month-to-month              Yes          Electronic check
## 1925 Month-to-month               No Bank transfer (automatic)
## 2044 Month-to-month               No          Electronic check
## 2340 Month-to-month              Yes          Electronic check
## 2412       Two year              Yes          Electronic check
## 3219       One year               No   Credit card (automatic)
## 5951 Month-to-month              Yes          Electronic check
## 6016 Month-to-month              Yes          Electronic check
## 6255       Two year              Yes          Electronic check
##      MonthlyCharges TotalCharges Churn
## 604          108.65      4903.20    No
## 1372          79.20        79.20   Yes
## 1925          85.35      1961.60    No
## 2044          69.65      1043.30    No
## 2340          94.25      1973.75   Yes
## 2412          84.95      4984.85    No
## 3219          79.60      5461.45    No
## 5951          24.70        24.70    No
## 6016          74.55       233.65    No
## 6255          97.20      5611.75    No

VISUALIZATION

pie(table(ash.df$Churn),main="churn", col=c("skyblue","yellow"))

mytable <- with(ash.df, table(Churn))
prop.table(mytable)*100
## Churn
##       No      Yes 
## 73.46301 26.53699

Which customers are likely to leave?

The obvious guess is Total charges and Monthly charges and tenure. To make sure of this, we plot the corrgram, create correlation matrix and scatterplot matrix , excluding the the less important factor variables , having 2 levels-“Yes” and “No”.

CORRGRAM

library(corrgram)
par(mfrow=c(1,1))
corrgram(ash.df, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of dataset")

These are some of the possible conclusions that can be deduced from the above graph:

SCATTERPLOTMATRIX

library(car)
scatterplotMatrix(~TotalCharges+MonthlyCharges+tenure | Churn, data=ash.df,main="scatterplotmatrix", col=c("red","black"))

CORRELATION MATRIX

den.df <- subset(ash.df[,c(6,19,20)], ash.df$TotalCharges>=0 && ash.df$MonthlyCharges>=0)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
data <- cor(den.df,use = "complete.obs")
round(data,2)
##                tenure MonthlyCharges TotalCharges
## tenure           1.00           0.25         0.83
## MonthlyCharges   0.25           1.00         0.65
## TotalCharges     0.83           0.65         1.00

The correlation matrix also suggests that tenure affects total charges greatly.

I 1.Demographic variables such as gender and no. of senior citizens:

par(mfrow=c(1,2))
mytable1 <- with(ash.df, table(gender))
barchart(mytable1, col="grey", main="genderwise breakup")

prop.table(mytable1)*100
## gender
##   Female     Male 
## 49.52435 50.47565
mytable <- with(ash.df, table(SeniorCitizen))
barchart(mytable,main="no. of seniorcitizens",col="grey")

prop.table(mytable)*100
## SeniorCitizen
##        0        1 
## 83.78532 16.21468

There are more males than females.

  1. Tenure:
par(mfrow=c(1,1))
d <- density(ash.df$tenure)
plot(d, main="tenure", col="blue")
polygon(d, col="blue", border="turquoise")

Many customers use the company’s service for a shorter period.

  1. Distribution of phone service and multiple lines connection:
par(mfrow=c(1,2))
mytable1 <- with(ash.df, table(PhoneService))
histogram(ash.df$PhoneService, main="phone service", col=c("red", "orange"), xlab="phone service")

prop.table(mytable1)*100
## PhoneService
##        No       Yes 
##  9.683374 90.316626
mytable <- with(ash.df, table(MultipleLines))
histogram(ash.df$MultipleLines, main="multiple lines",col=c("red", "orange", "yellow"),xlab="multiple lines service")

prop.table(mytable)*100
## MultipleLines
##               No No phone service              Yes 
##        48.132898         9.683374        42.183729

Most customers use the company’s phone service and do not have multiple lines.

  1. Distribution of type of internet service:
pie(table(ash.df$InternetService),main="types of internet services", col=c("magenta","violet","pink"))

mytable <- with(ash.df, table(InternetService))
prop.table(mytable)*100
## InternetService
##         DSL Fiber optic          No 
##    34.37456    43.95854    21.66690

Almost 44% customers have fiber optic type of internet service.

  1. Online security and online backup:
par(mfrow=c(1,2))
mytable <- with(ash.df, table(OnlineSecurity))
prop.table(mytable)*100
## OnlineSecurity
##                  No No internet service                 Yes 
##            49.66634            21.66690            28.66676
barchart(mytable, main="online security split up", horizontal=FALSE, xlab="online security", col=c("burlywood", "darkolivegreen"))

mytable1 <- with(ash.df, table(OnlineBackup))
prop.table(mytable1)*100
## OnlineBackup
##                  No No internet service                 Yes 
##            43.84495            21.66690            34.48814
barchart(mytable1, main="online backup split up", horizontal=FALSE, xlab="online backup", col=c("burlywood","darkolivegreen"))

  1. Contract type:
par(mfrow=c(1,1))
mytable <- with(ash.df, table(Contract))
prop.table(mytable)*100
## Contract
## Month-to-month       One year       Two year 
##       55.01917       20.91438       24.06645
barchart(mytable, main="contract", horizontal=FALSE, col=c("skyblue","orange","green"), xlab="type of contract")

55% customers have availed month-to-month contract basis.

  1. Monthly charges and total charges
par(mfrow=c(1,2))
den.df <- subset(ash.df, ash.df$TotalCharges>=0)
d <- density(den.df$TotalCharges)
plot(d, main="Total charges")
polygon(d, col="red", border="blue")
den1.df <- subset(ash.df, ash.df$MonthlyCharges>=0)
d1 <- density(den1.df$MonthlyCharges)
plot(d1, main="Monthly charges")
polygon(d1, col="red", border="blue")

Monthly charges are more than total charges, which could be reason for customers to leave.

The key drivers are tenure, contract, internet service, total charges and monthly charges.

II 1. tenure drives churn?

par(mfrow=c(1,2))
histogram(~ tenure | Churn,col="green", main="tenure with churn")

This shows that customers who tend to leave are people who have been customers for shorter periods.

  1. Role of total charges and monthly charges in retention.
xyplot(TotalCharges ~ MonthlyCharges, data=den.df, main="total charges v/s monthly charges", type = c("p", "g"),xlab="monthly charges", ylab="total charges")

par(mfrow=c(2,1))
library(lattice)
histogram(~ TotalCharges | Churn, data=ash.df)

histogram(~ MonthlyCharges | Churn, data=ash.df, col="orange")

As stated above, due to high monthly charges, churn percentage is more in that case.

  1. Role of contract type
library(vcd)
## Loading required package: grid
par(mfrow=c(1,1))
tab1 <- xtabs(~ Churn + Contract)
tab1
##      Contract
## Churn Month-to-month One year Two year
##   No            2220     1307     1647
##   Yes           1655      166       48
mosaic(tab1, shade=TRUE, legend=TRUE, main="contract basis")

Customers who tend to leave are on a month-to-month contract basis.

  1. role of type of internet service
par(mfrow=c(1,1))
tab2 <- xtabs(~ Churn + InternetService)
tab2
##      InternetService
## Churn  DSL Fiber optic   No
##   No  1962        1799 1413
##   Yes  459        1297  113
mosaic(tab2, shade=TRUE, legend=TRUE, main="internet service effects")

Customers who tend to leave are havinf fiber optic type of internet connection.

  1. total charges v/s internet services
boxplot(TotalCharges ~ InternetService, data=den.df,main="total charges with type of internet service", col=c("blue","red","yellow"),xlab="type of internet service",ylab="total charges")

Total charges are more for fiber optic type of internet service.

  1. tenure v/s contract
boxplot(tenure ~ Contract, data=ash.df,main="tenure v/s contract", col=c("green","orange","pink"),xlab="type of contract",ylab="tenure")

  1. tenure v/s online security
boxplot(tenure ~ OnlineSecurity, data=ash.df,main="tenure v/s online security", col=c("darkcyan","yellowgreen","darkseagreen1"),xlab="online security",ylab="tenure")

Evidently, customers prefer to use the company’s services for a longer time if they are assured online security.

  1. total charges v/s tenure
plot(x=ash.df$tenure, y=ash.df$TotalCharges, main="total charges v/s tenure", col="red", cex=0.6, xlab="tenure", ylab="total charges")

From the above analysis and visualizations, customers who leave tend to be ones who are on a month-to-month contract, have fiber optic internet service, and have been customers for shorter periods.

III

library(ggplot2)
par(mfrow=c(1,2))
qplot(x = TotalCharges, y = tenure, data = ash.df, color = factor(Churn))
## Warning: Removed 11 rows containing missing values (geom_point).

qplot(x = TotalCharges, y = MonthlyCharges, data = ash.df, color = factor(Churn))
## Warning: Removed 11 rows containing missing values (geom_point).

CHI-SQUARED TESTS:

chisq.test(Churn,TotalCharges)
## Warning in chisq.test(Churn, TotalCharges): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and TotalCharges
## X-squared = 6503.6, df = 6529, p-value = 0.5859
chisq.test(Churn,MonthlyCharges)
## Warning in chisq.test(Churn, MonthlyCharges): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and MonthlyCharges
## X-squared = 2123.6, df = 1584, p-value < 2.2e-16
chisq.test(Churn,tenure)
## Warning in chisq.test(Churn, tenure): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and tenure
## X-squared = 1065.3, df = 72, p-value < 2.2e-16
chisq.test(Churn,Contract)
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and Contract
## X-squared = 1184.6, df = 2, p-value < 2.2e-16

Hence, since all the p-values are very less (except for totalcharges), our original hypothesis is correct and NULL hypothesis of independence is rejected.

t-Tests NULL hypothesis: There is no difference in total charges for people abandoning and still using the company’s services.

t.test(TotalCharges ~ Churn, data=ash.df)
## 
##  Welch Two Sample t-test
## 
## data:  TotalCharges by Churn
## t = 18.801, df = 4042.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   916.8121 1130.2840
## sample estimates:
##  mean in group No mean in group Yes 
##          2555.344          1531.796

Based on the above output of the t-test, we can reject the hypothesis that the total charges of people leaving and still using the company’s services are the same. (p<0.001)

Regression models:

Simple linear regression (1)

sfit1 <- lm(TotalCharges ~ tenure, data=ash.df)
summary(sfit1)
## 
## Call:
## lm(formula = TotalCharges ~ tenure, data = ash.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3997.2  -416.1   157.1   526.1  3382.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -189.529     25.254  -7.505 6.91e-14 ***
## tenure        76.271      0.621 122.810  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1278 on 7030 degrees of freedom
##   (11 observations deleted due to missingness)
## Multiple R-squared:  0.6821, Adjusted R-squared:  0.682 
## F-statistic: 1.508e+04 on 1 and 7030 DF,  p-value: < 2.2e-16
plot(y=TotalCharges, x=tenure, main="total charges v/s tenure", col="orange")
abline(sfit1)

The regression coefficient (76.27) is significantly dfferent from zero (p < 0.001). The multiple R-squared indciates that the model accounts for 68% of the variance in tenure. Although, the residual standard error is quite high for this model to predict total charges. This is not a good model.

Simple linear regression (2)

sfit2 <- lm(MonthlyCharges ~ tenure, data=ash.df)
summary(sfit2)
## 
## Call:
## lm(formula = MonthlyCharges ~ tenure, data = ash.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.498 -27.251   6.245  24.943  54.376 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 54.92978    0.57476   95.57   <2e-16 ***
## tenure       0.30372    0.01415   21.47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.15 on 7041 degrees of freedom
## Multiple R-squared:  0.06145,    Adjusted R-squared:  0.06132 
## F-statistic:   461 on 1 and 7041 DF,  p-value: < 2.2e-16
plot(y=MonthlyCharges, x=tenure, main="monthly charges v/s tenure", col="red")
abline(sfit2)

The regression coefficient (0.303) is dfferent from zero (p < 0.001). The multiple R-squared indciates that the model accounts for 6% of the variance in tenure. Although, the residual standard error is quite high for this model to predict monthly charges. This is not at all a good model.

Logistic regression

model <- glm(Churn ~ TotalCharges + OnlineSecurity + OnlineBackup + tenure + MonthlyCharges + gender + Contract + PhoneService + MultipleLines + InternetService, family=binomial(link='logit'),data=ash.df)
summary(model)
## 
## Call:
## glm(formula = Churn ~ TotalCharges + OnlineSecurity + OnlineBackup + 
##     tenure + MonthlyCharges + gender + Contract + PhoneService + 
##     MultipleLines + InternetService, family = binomial(link = "logit"), 
##     data = ash.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7072  -0.7073  -0.2926   0.8059   3.4724  
## 
## Coefficients: (3 not defined because of singularities)
##                                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)                        1.733e-01  2.027e-01   0.855  0.39250
## TotalCharges                       3.057e-04  7.022e-05   4.353 1.34e-05
## OnlineSecurityNo internet service -5.870e-01  1.873e-01  -3.133  0.00173
## OnlineSecurityYes                 -5.550e-01  8.490e-02  -6.537 6.29e-11
## OnlineBackupNo internet service           NA         NA      NA       NA
## OnlineBackupYes                   -2.356e-01  7.834e-02  -3.007  0.00264
## tenure                            -5.915e-02  6.135e-03  -9.641  < 2e-16
## MonthlyCharges                     1.075e-02  4.263e-03   2.521  0.01171
## genderMale                        -3.006e-02  6.397e-02  -0.470  0.63839
## ContractOne year                  -8.254e-01  1.056e-01  -7.817 5.39e-15
## ContractTwo year                  -1.682e+00  1.729e-01  -9.726  < 2e-16
## PhoneServiceYes                   -9.549e-01  1.463e-01  -6.527 6.71e-11
## MultipleLinesNo phone service             NA         NA      NA       NA
## MultipleLinesYes                   2.627e-01  8.099e-02   3.244  0.00118
## InternetServiceFiber optic         7.417e-01  1.390e-01   5.338 9.41e-08
## InternetServiceNo                         NA         NA      NA       NA
##                                      
## (Intercept)                          
## TotalCharges                      ***
## OnlineSecurityNo internet service ** 
## OnlineSecurityYes                 ***
## OnlineBackupNo internet service      
## OnlineBackupYes                   ** 
## tenure                            ***
## MonthlyCharges                    *  
## genderMale                           
## ContractOne year                  ***
## ContractTwo year                  ***
## PhoneServiceYes                   ***
## MultipleLinesNo phone service        
## MultipleLinesYes                  ** 
## InternetServiceFiber optic        ***
## InternetServiceNo                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8143.4  on 7031  degrees of freedom
## Residual deviance: 5938.0  on 7019  degrees of freedom
##   (11 observations deleted due to missingness)
## AIC: 5964
## 
## Number of Fisher Scoring iterations: 6

It is evident that MonthlyCharges and gender is not signiificant at all. As for the statistically significant variables, tenure has the lowest p-value suggesting a strong association of the period for which the customers used the company’s services with the churn prediction.

As a last step, we are going to plot the ROC curve and calculate the AUC (area under the curve) which are typical performance measurements for a binary classifier. The ROC(receiving operating characteristic) is a curve generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings while the AUC is the area under the ROC curve. As a rule of thumb, a model with good predictive ability should have an AUC closer to 1 (1 is ideal) than to 0.5.

library(ROCR)
## Loading required package: gplots
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
library(gplots)
fitted.results <- predict(model,newdata=subset(ash.df,select=c(10,20,6,16,7,9)),type='response')
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type =
## ifelse(type == : prediction from a rank-deficient fit may be misleading
pr <- prediction(fitted.results, ash.df$Churn)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
plot(prf)

auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
auc
## [1] 0.8418911

Forecasting future values:

library(forecast)
ash1.df <- subset(ash.df[c(6,19,20)])
myts <- ts(ash1.df)
forecast(myts, 3)
## Warning in ets(object, lambda = lambda, allow.multiplicative.trend =
## allow.multiplicative.trend, : Missing values encountered. Using longest
## contiguous portion of time series
## tenure
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 7044       32.38652 0.9130221 63.86002 -15.74804 80.52109
## 7045       32.38652 0.9130219 63.86002 -15.74804 80.52109
## 7046       32.38652 0.9130218 63.86002 -15.74804 80.52109
## 
## MonthlyCharges
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 7044        64.7517 26.24427 103.2591 5.859675 123.6437
## 7045        64.7517 26.24427 103.2591 5.859675 123.6437
## 7046        64.7517 26.24427 103.2591 5.859674 123.6437
## 
## TotalCharges
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 3332        2254.99 -651.9964 5161.977 -2190.862 6700.843
## 3333        2254.99 -651.9964 5161.977 -2190.862 6700.843
## 3334        2254.99 -651.9965 5161.977 -2190.862 6700.843
plot(forecast(myts, 3))
## Warning in ets(object, lambda = lambda, allow.multiplicative.trend =
## allow.multiplicative.trend, : Missing values encountered. Using longest
## contiguous portion of time series

From the above analysis and visualizations, customers who leave tend to be ones who are on a month-to-month contract, have fiber optic internet service, and have been customers for shorter periods.