Project Title: Implementation of CRISP-DM Model on churn prediction dataset

NAME: ASWATHY GUNADEEP

EMAIL: aswathygunadeep@gmail.com

COLLEGE / COMPANY: NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA

According to CRISP-DM, a given datamining project has a life cycle consisting of six phases. The phase-sequence is adaptive.That is, the next phase in the sequence often depends on the outcomes associated with the previous phase.

CRISP-DM: The Six Phases:

  1. Business/Research Understanding Phase

  2. Data Understanding Phase

  3. Data Preparation Phase

  4. Modeling Phase

  5. Evaluation Phase

  6. Deployment Phase

A breakup of the main CRISP-DM activities in each phase is given below.

I PROBLEM UNDERTSANDING PHASE

setwd("C:/Users/user/Desktop/tarsha systems summer internship/CRISP-DM model")
ash.df <- read.csv(paste("WA_Fn-UseC_-Telco-Customer-Churn.csv", sep=""))
attach(ash.df)
dim(ash.df)
## [1] 7043   21
library(car)
## Loading required package: carData
some(ash.df)
##      customerID gender SeniorCitizen Partner Dependents tenure
## 266  0123-CRBRT Female             0     Yes        Yes     61
## 1154 8777-PVYGU Female             0     Yes         No     52
## 3266 9722-UJOJR   Male             0     Yes        Yes     29
## 3424 2947-DOMLJ   Male             0      No        Yes      1
## 3518 0748-RDGGM   Male             0     Yes         No     70
## 4314 6288-LBEAR Female             0      No         No     23
## 4598 3440-JPSCL Female             0      No         No      6
## 6094 7110-BDTWG Female             0     Yes         No     71
## 6372 1535-VTJOQ Female             0      No         No     24
## 6554 4891-NLUBA Female             0     Yes        Yes     61
##      PhoneService    MultipleLines InternetService      OnlineSecurity
## 266           Yes              Yes             DSL                 Yes
## 1154          Yes               No             DSL                 Yes
## 3266          Yes               No     Fiber optic                  No
## 3424          Yes               No             DSL                 Yes
## 3518          Yes               No     Fiber optic                 Yes
## 4314          Yes               No     Fiber optic                  No
## 4598          Yes               No     Fiber optic                  No
## 6094           No No phone service             DSL                  No
## 6372          Yes              Yes              No No internet service
## 6554           No No phone service             DSL                  No
##             OnlineBackup    DeviceProtection         TechSupport
## 266                  Yes                 Yes                 Yes
## 1154                  No                 Yes                  No
## 3266                  No                  No                  No
## 3424                  No                  No                  No
## 3518                 Yes                 Yes                 Yes
## 4314                 Yes                  No                 Yes
## 4598                  No                 Yes                 Yes
## 6094                  No                 Yes                 Yes
## 6372 No internet service No internet service No internet service
## 6554                 Yes                 Yes                 Yes
##              StreamingTV     StreamingMovies       Contract
## 266                  Yes                 Yes       Two year
## 1154                 Yes                  No       One year
## 3266                  No                  No Month-to-month
## 3424                  No                  No Month-to-month
## 3518                 Yes                 Yes       One year
## 4314                  No                  No Month-to-month
## 4598                 Yes                 Yes Month-to-month
## 6094                  No                 Yes       Two year
## 6372 No internet service No internet service       Two year
## 6554                 Yes                 Yes       Two year
##      PaperlessBilling             PaymentMethod MonthlyCharges
## 266                No              Mailed check          88.10
## 1154              Yes              Mailed check          64.30
## 3266              Yes          Electronic check          70.75
## 3424               No              Mailed check          50.60
## 3518              Yes Bank transfer (automatic)         109.50
## 4314              Yes              Mailed check          79.10
## 4598              Yes              Mailed check          99.95
## 6094              Yes          Electronic check          47.05
## 6372               No   Credit card (automatic)          25.15
## 6554              Yes Bank transfer (automatic)          61.45
##      TotalCharges Churn
## 266       5526.75    No
## 1154      3410.60    No
## 3266      1974.80   Yes
## 3424        50.60   Yes
## 3518      7534.65   Yes
## 4314      1783.75    No
## 4598       547.65   Yes
## 6094      3263.60    No
## 6372       553.00    No
## 6554      3751.15    No
str(ash.df)
## 'data.frame':    7043 obs. of  21 variables:
##  $ customerID      : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
##  $ gender          : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
##  $ Dependents      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
##  $ MultipleLines   : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
##  $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
##  $ OnlineSecurity  : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
##  $ OnlineBackup    : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
##  $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
##  $ TechSupport     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
##  $ StreamingTV     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
##  $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
##  $ Contract        : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
##  $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
##  $ PaymentMethod   : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...

Problem Statement:

pie(table(ash.df$Churn),main="churn", col=c("skyblue","yellow"))

mytable <- with(ash.df, table(Churn))
prop.table(mytable)*100
## Churn
##       No      Yes 
## 73.46301 26.53699

Which customers are likely to leave?

The obvious guess is Total charges and Monthly charges and tenure. To make sure of this, we plot the corrgram, create correlation matrix and scatterplot matrix , excluding the the less important factor variables , having 2 levels-“Yes” and “No”.

II DATA UNDERSTANDING PHASE

CORRGRAM

library(corrgram)
par(mfrow=c(1,1))
corrgram(ash.df, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of dataset")

These are some of the possible conclusions that can be deduced from the above graph:

SCATTERPLOTMATRIX

library(car)
scatterplotMatrix(~TotalCharges+MonthlyCharges+tenure | Churn, data=ash.df,main="scatterplotmatrix", col=c("red","black"))

CORRELATION MATRIX

den.df <- subset(ash.df[,c(6,19,20)], ash.df$TotalCharges>=0 && ash.df$MonthlyCharges>=0)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
data <- cor(den.df,use = "complete.obs")
round(data,2)
##                tenure MonthlyCharges TotalCharges
## tenure           1.00           0.25         0.83
## MonthlyCharges   0.25           1.00         0.65
## TotalCharges     0.83           0.65         1.00

The correlation matrix also suggests that tenure affects total charges greatly.

1.Demographic variables such as gender and no. of senior citizens:

par(mfrow=c(1,2))
mytable1 <- with(ash.df, table(gender))
barchart(mytable1, col="grey", main="genderwise breakup")

prop.table(mytable1)*100
## gender
##   Female     Male 
## 49.52435 50.47565
mytable <- with(ash.df, table(SeniorCitizen))
barchart(mytable,main="no. of seniorcitizens",col="grey")

prop.table(mytable)*100
## SeniorCitizen
##        0        1 
## 83.78532 16.21468

There are more males than females.

  1. Tenure:
par(mfrow=c(1,1))
d <- density(ash.df$tenure)
plot(d, main="tenure", col="blue")
polygon(d, col="blue", border="turquoise")

Many customers use the company’s service for a shorter period.

  1. Distribution of phone service and multiple lines connection:
par(mfrow=c(1,2))
mytable1 <- with(ash.df, table(PhoneService))
histogram(ash.df$PhoneService, main="phone service", col=c("red", "orange"), xlab="phone service")

prop.table(mytable1)*100
## PhoneService
##        No       Yes 
##  9.683374 90.316626
mytable <- with(ash.df, table(MultipleLines))
histogram(ash.df$MultipleLines, main="multiple lines",col=c("red", "orange", "yellow"),xlab="multiple lines service")

prop.table(mytable)*100
## MultipleLines
##               No No phone service              Yes 
##        48.132898         9.683374        42.183729

Most customers use the company’s phone service and do not have multiple lines.

  1. Distribution of type of internet service:
pie(table(ash.df$InternetService),main="types of internet services", col=c("magenta","violet","pink"))

mytable <- with(ash.df, table(InternetService))
prop.table(mytable)*100
## InternetService
##         DSL Fiber optic          No 
##    34.37456    43.95854    21.66690

Almost 44% customers have fiber optic type of internet service.

  1. Online security and online backup:
par(mfrow=c(1,2))
mytable <- with(ash.df, table(OnlineSecurity))
prop.table(mytable)*100
## OnlineSecurity
##                  No No internet service                 Yes 
##            49.66634            21.66690            28.66676
barchart(mytable, main="online security split up", horizontal=FALSE, xlab="online security", col=c("burlywood", "darkolivegreen"))

mytable1 <- with(ash.df, table(OnlineBackup))
prop.table(mytable1)*100
## OnlineBackup
##                  No No internet service                 Yes 
##            43.84495            21.66690            34.48814
barchart(mytable1, main="online backup split up", horizontal=FALSE, xlab="online backup", col=c("burlywood","darkolivegreen"))

  1. Contract type:
par(mfrow=c(1,1))
mytable <- with(ash.df, table(Contract))
prop.table(mytable)*100
## Contract
## Month-to-month       One year       Two year 
##       55.01917       20.91438       24.06645
barchart(mytable, main="contract", horizontal=FALSE, col=c("skyblue","orange","green"), xlab="type of contract")

55% customers have availed month-to-month contract basis.

  1. Monthly charges and total charges:
par(mfrow=c(1,2))
den.df <- subset(ash.df, ash.df$TotalCharges>=0)
d <- density(den.df$TotalCharges)
plot(d, main="Total charges")
polygon(d, col="red", border="blue")
den1.df <- subset(ash.df, ash.df$MonthlyCharges>=0)
d1 <- density(den1.df$MonthlyCharges)
plot(d1, main="Monthly charges")
polygon(d1, col="red", border="blue")

Monthly charges are more than total charges, which could be reason for customers to leave.

The key drivers are tenure, contract, internet service, total charges and monthly charges.

III DATA PREPARATION PHASE

  1. tenure drives churn?
par(mfrow=c(1,2))
histogram(~ tenure | Churn,col="green", main="tenure with churn")

This shows that customers who tend to leave are people who have been customers for shorter periods.

  1. Role of total charges and monthly charges in retention.
xyplot(TotalCharges ~ MonthlyCharges, data=den.df, main="total charges v/s monthly charges", type = c("p", "g"),xlab="monthly charges", ylab="total charges")

par(mfrow=c(2,1))
library(lattice)
histogram(~ TotalCharges | Churn, data=ash.df)

histogram(~ MonthlyCharges | Churn, data=ash.df, col="orange")

As stated above, due to high monthly charges, churn percentage is more in that case.

  1. Role of contract type
library(vcd)
## Loading required package: grid
par(mfrow=c(1,1))
tab1 <- xtabs(~ Churn + Contract)
tab1
##      Contract
## Churn Month-to-month One year Two year
##   No            2220     1307     1647
##   Yes           1655      166       48
mosaic(tab1, shade=TRUE, legend=TRUE, main="contract basis")

Customers who tend to leave are on a month-to-month contract basis.

  1. role of type of internet service
par(mfrow=c(1,1))
tab2 <- xtabs(~ Churn + InternetService)
tab2
##      InternetService
## Churn  DSL Fiber optic   No
##   No  1962        1799 1413
##   Yes  459        1297  113
mosaic(tab2, shade=TRUE, legend=TRUE, main="internet service effects")

Customers who tend to leave are havinf fiber optic type of internet connection.

  1. total charges v/s internet services
boxplot(TotalCharges ~ InternetService, data=den.df,main="total charges with type of internet service", col=c("blue","red","yellow"),xlab="type of internet service",ylab="total charges")

Total charges are more for fiber optic type of internet service.

  1. tenure v/s contract
boxplot(tenure ~ Contract, data=ash.df,main="tenure v/s contract", col=c("green","orange","pink"),xlab="type of contract",ylab="tenure")

  1. tenure v/s online security
boxplot(tenure ~ OnlineSecurity, data=ash.df,main="tenure v/s online security", col=c("darkcyan","yellowgreen","darkseagreen1"),xlab="online security",ylab="tenure")

Evidently, customers prefer to use the company’s services for a longer time if they are assured online security.

  1. total charges v/s tenure
plot(x=ash.df$tenure, y=ash.df$TotalCharges, main="total charges v/s tenure", col="red", cex=0.6, xlab="tenure", ylab="total charges")

From the above analysis and visualizations, customers who leave tend to be ones who are on a month-to-month contract, have fiber optic internet service, and have been customers for shorter periods.

library(ggplot2)
par(mfrow=c(1,2))
qplot(x = TotalCharges, y = tenure, data = ash.df, color = factor(Churn))
## Warning: Removed 11 rows containing missing values (geom_point).

qplot(x = TotalCharges, y = MonthlyCharges, data = ash.df, color = factor(Churn))
## Warning: Removed 11 rows containing missing values (geom_point).

IV MODELING PHASE

CHI-SQUARED TESTS:

chisq.test(Churn,TotalCharges)
## Warning in chisq.test(Churn, TotalCharges): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and TotalCharges
## X-squared = 6503.6, df = 6529, p-value = 0.5859
chisq.test(Churn,MonthlyCharges)
## Warning in chisq.test(Churn, MonthlyCharges): Chi-squared approximation may
## be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and MonthlyCharges
## X-squared = 2123.6, df = 1584, p-value < 2.2e-16
chisq.test(Churn,tenure)
## Warning in chisq.test(Churn, tenure): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and tenure
## X-squared = 1065.3, df = 72, p-value < 2.2e-16
chisq.test(Churn,Contract)
## 
##  Pearson's Chi-squared test
## 
## data:  Churn and Contract
## X-squared = 1184.6, df = 2, p-value < 2.2e-16

Hence, since all the p-values are very less (except for totalcharges), our original hypothesis is correct and NULL hypothesis of independence is rejected.

t-Tests NULL hypothesis: There is no difference in total charges for people abandoning and still using the company’s services.

t.test(TotalCharges ~ Churn, data=ash.df)
## 
##  Welch Two Sample t-test
## 
## data:  TotalCharges by Churn
## t = 18.801, df = 4042.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   916.8121 1130.2840
## sample estimates:
##  mean in group No mean in group Yes 
##          2555.344          1531.796

Based on the above output of the t-test, we can reject the hypothesis that the total charges of people leaving and still using the company’s services are the same. (p<0.001)

Regression models:

Simple linear regression (1)

sfit1 <- lm(TotalCharges ~ tenure, data=ash.df)
summary(sfit1)
## 
## Call:
## lm(formula = TotalCharges ~ tenure, data = ash.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3997.2  -416.1   157.1   526.1  3382.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -189.529     25.254  -7.505 6.91e-14 ***
## tenure        76.271      0.621 122.810  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1278 on 7030 degrees of freedom
##   (11 observations deleted due to missingness)
## Multiple R-squared:  0.6821, Adjusted R-squared:  0.682 
## F-statistic: 1.508e+04 on 1 and 7030 DF,  p-value: < 2.2e-16
plot(y=TotalCharges, x=tenure, main="total charges v/s tenure", col="orange")
abline(sfit1)

The regression coefficient (76.27) is significantly dfferent from zero (p < 0.001). The multiple R-squared indciates that the model accounts for 68% of the variance in tenure. Although, the residual standard error is quite high for this model to predict total charges. This is not a good model.

Simple linear regression (2)

sfit2 <- lm(MonthlyCharges ~ tenure, data=ash.df)
summary(sfit2)
## 
## Call:
## lm(formula = MonthlyCharges ~ tenure, data = ash.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.498 -27.251   6.245  24.943  54.376 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 54.92978    0.57476   95.57   <2e-16 ***
## tenure       0.30372    0.01415   21.47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.15 on 7041 degrees of freedom
## Multiple R-squared:  0.06145,    Adjusted R-squared:  0.06132 
## F-statistic:   461 on 1 and 7041 DF,  p-value: < 2.2e-16
plot(y=MonthlyCharges, x=tenure, main="monthly charges v/s tenure", col="red")
abline(sfit2)

The regression coefficient (0.303) is dfferent from zero (p < 0.001). The multiple R-squared indciates that the model accounts for 6% of the variance in tenure. Although, the residual standard error is quite high for this model to predict monthly charges. This is not at all a good model.

Logistic regression

model <- glm(Churn ~ TotalCharges + OnlineSecurity + OnlineBackup + tenure + MonthlyCharges + gender + Contract + PhoneService + MultipleLines + InternetService, family=binomial(link='logit'),data=ash.df)
summary(model)
## 
## Call:
## glm(formula = Churn ~ TotalCharges + OnlineSecurity + OnlineBackup + 
##     tenure + MonthlyCharges + gender + Contract + PhoneService + 
##     MultipleLines + InternetService, family = binomial(link = "logit"), 
##     data = ash.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7072  -0.7073  -0.2926   0.8059   3.4724  
## 
## Coefficients: (3 not defined because of singularities)
##                                     Estimate Std. Error z value Pr(>|z|)
## (Intercept)                        1.733e-01  2.027e-01   0.855  0.39250
## TotalCharges                       3.057e-04  7.022e-05   4.353 1.34e-05
## OnlineSecurityNo internet service -5.870e-01  1.873e-01  -3.133  0.00173
## OnlineSecurityYes                 -5.550e-01  8.490e-02  -6.537 6.29e-11
## OnlineBackupNo internet service           NA         NA      NA       NA
## OnlineBackupYes                   -2.356e-01  7.834e-02  -3.007  0.00264
## tenure                            -5.915e-02  6.135e-03  -9.641  < 2e-16
## MonthlyCharges                     1.075e-02  4.263e-03   2.521  0.01171
## genderMale                        -3.006e-02  6.397e-02  -0.470  0.63839
## ContractOne year                  -8.254e-01  1.056e-01  -7.817 5.39e-15
## ContractTwo year                  -1.682e+00  1.729e-01  -9.726  < 2e-16
## PhoneServiceYes                   -9.549e-01  1.463e-01  -6.527 6.71e-11
## MultipleLinesNo phone service             NA         NA      NA       NA
## MultipleLinesYes                   2.627e-01  8.099e-02   3.244  0.00118
## InternetServiceFiber optic         7.417e-01  1.390e-01   5.338 9.41e-08
## InternetServiceNo                         NA         NA      NA       NA
##                                      
## (Intercept)                          
## TotalCharges                      ***
## OnlineSecurityNo internet service ** 
## OnlineSecurityYes                 ***
## OnlineBackupNo internet service      
## OnlineBackupYes                   ** 
## tenure                            ***
## MonthlyCharges                    *  
## genderMale                           
## ContractOne year                  ***
## ContractTwo year                  ***
## PhoneServiceYes                   ***
## MultipleLinesNo phone service        
## MultipleLinesYes                  ** 
## InternetServiceFiber optic        ***
## InternetServiceNo                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8143.4  on 7031  degrees of freedom
## Residual deviance: 5938.0  on 7019  degrees of freedom
##   (11 observations deleted due to missingness)
## AIC: 5964
## 
## Number of Fisher Scoring iterations: 6

V EVALUATION PHASE

As a last step, we are going to plot the ROC curve and calculate the AUC (area under the curve) which are typical performance measurements for a binary classifier. The ROC(receiving operating characteristic) is a curve generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings while the AUC is the area under the ROC curve. As a rule of thumb, a model with good predictive ability should have an AUC closer to 1 (1 is ideal) than to 0.5.

library(ROCR)
## Loading required package: gplots
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
library(gplots)
fitted.results <- predict(model,newdata=subset(ash.df,select=c(10,20,6,16,7,9)),type='response')
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type =
## ifelse(type == : prediction from a rank-deficient fit may be misleading
pr <- prediction(fitted.results, ash.df$Churn)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
plot(prf)

auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
auc
## [1] 0.8418911

The results of the regression model states that it is evident that MonthlyCharges and gender is not signiificant at all. As for the statistically significant variables, tenure has the lowest p-value suggesting a strong association of the period for which the customers used the company’s services with the churn prediction.

Forecasting future values:

library(forecast)
ash1.df <- subset(ash.df[c(6,19,20)])
myts <- ts(ash1.df)
forecast(myts, 3)
## Warning in ets(object, lambda = lambda, allow.multiplicative.trend =
## allow.multiplicative.trend, : Missing values encountered. Using longest
## contiguous portion of time series
## tenure
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 7044       32.38652 0.9130221 63.86002 -15.74804 80.52109
## 7045       32.38652 0.9130219 63.86002 -15.74804 80.52109
## 7046       32.38652 0.9130218 63.86002 -15.74804 80.52109
## 
## MonthlyCharges
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 7044        64.7517 26.24427 103.2591 5.859675 123.6437
## 7045        64.7517 26.24427 103.2591 5.859675 123.6437
## 7046        64.7517 26.24427 103.2591 5.859674 123.6437
## 
## TotalCharges
##      Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 3332        2254.99 -651.9964 5161.977 -2190.862 6700.843
## 3333        2254.99 -651.9964 5161.977 -2190.862 6700.843
## 3334        2254.99 -651.9965 5161.977 -2190.862 6700.843
plot(forecast(myts, 3))
## Warning in ets(object, lambda = lambda, allow.multiplicative.trend =
## allow.multiplicative.trend, : Missing values encountered. Using longest
## contiguous portion of time series

From the above analysis and visualizations, customers who leave tend to be ones who are on a month-to-month contract, have fiber optic internet service, and have been customers for shorter periods.

VI DEPLOYMENT PHASE