Predicting Customer Churn: A Comparative Study of Semi-Supervised and One-Class Methods

Author

Saurabh C Srivastava

Published

March 5, 2025

This analysis uses the Telco Customer Churn dataset, which records whether each of 7,043 customers churned, along with demographic attributes, account details such as contract type and monthly charges, and subscribed services. To simulate a semi-supervised learning scenario, I treated the churn indicator as a binary (Yes/No) factor and masked 95% of its values as missing. The objective is to compare the performance of semi-supervised learning algorithms with that of one-class learning algorithms in predicting customer churn.

Exploring Semi-Supervised Machine Learning

This analysis began by loading the necessary R packages and the ‘Telco Customer Churn’ dataset. Exploratory data analysis (EDA) and data preparation followed, in two primary steps:

  • Imputation: the 11 missing values in TotalCharges were replaced with the column mean using the mice package (method = "mean"), and a second count of missing values confirmed that none remained; and

  • Renaming the dependent variable ‘Churn’ to ‘Class’, removing the ‘customerID’ column, and converting ‘Class’ to a factor to prepare the data for machine learning.
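Because mice's "mean" method fills each missing entry with the mean of the observed values in that column, the imputation and verification steps above can be sketched in base R without mice. The data frame below is a hypothetical five-row excerpt with two TotalCharges values deliberately set to NA, and impute_mean() is a helper written for this illustration, not a mice function.

```r
# Hypothetical excerpt: two TotalCharges values are set to NA for illustration
toy <- data.frame(
  MonthlyCharges = c(29.85, 56.95, 53.85, 42.30, 70.70),
  TotalCharges   = c(29.85, 1889.50, NA, 1840.75, NA)
)

# Replace the NAs in one numeric column with the mean of its observed values
impute_mean <- function(df, col) {
  col_mean <- mean(df[[col]], na.rm = TRUE)
  df[[col]][is.na(df[[col]])] <- col_mean
  df
}

toy_imputed <- impute_mean(toy, "TotalCharges")

# Verification step, mirroring colSums(is.na(...)) in the analysis
colSums(is.na(toy_imputed))
```

Since mean imputation ignores every other column, the five imputations mice produces by default are identical, and mice::complete(impute) simply returns the first of them.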

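To simulate a semi-supervised learning scenario, 95% of the ‘Class’ labels (6,691 of 7,043) were masked using RSSL's add_missinglabels_mar() function, leaving 352 labeled observations (5%) for model training. A seed (set.seed(12345)) was set before fitting the models; because it is called after add_missinglabels_mar(), the set of masked rows will differ between runs unless a seed was set earlier in the session, so the seed should be moved ahead of the masking step to make the whole pipeline reproducible.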
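The masking step and the role of the seed can be illustrated with a base-R sketch. mask_labels() is a hypothetical helper that hides a fixed share of labels completely at random; it is not RSSL's add_missinglabels_mar(), although rounding the masked count up happens to match the 6,691 masked labels reported below. Calling set.seed() immediately before sample() is what makes the mask repeatable.

```r
# Hypothetical helper: hide ceiling(prob * n) labels chosen uniformly at random.
# Note: set.seed() here resets the global random number generator state.
mask_labels <- function(labels, prob, seed) {
  set.seed(seed)                                # fix the RNG before sampling
  n_mask <- ceiling(length(labels) * prob)      # number of labels to hide
  idx    <- sample(seq_along(labels), n_mask)   # rows chosen at random
  labels[idx] <- NA
  labels
}

# Class counts taken from the data: 5174 "No" and 1869 "Yes"
y  <- factor(rep(c("No", "Yes"), times = c(5174, 1869)))
m1 <- mask_labels(y, prob = 0.95, seed = 12345)
m2 <- mask_labels(y, prob = 0.95, seed = 12345)

sum(is.na(m1))    # number of masked labels
identical(m1, m2) # same seed, same mask
```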

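Finally, three semi-supervised models (Laplacian SVM, Self-Learning with a linear SVM via svmlin, and a Self-Learning Nearest Mean Classifier) were benchmarked on the Telco dataset against a supervised baseline: an RBF-kernel SVM trained on the labeled observations only. Each model's predictions for all 7,043 customers were compared with the true churn labels.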
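The second and third models below wrap a supervised classifier in RSSL's SelfLearning(). The general idea can be sketched in base R: fit on the labeled rows, assign pseudo-labels to the unlabeled rows, refit on all rows, and repeat until the pseudo-labels stop changing. The helpers nmc_fit(), nmc_predict(), and self_learning_nmc() are hypothetical and illustrate the loop with a nearest mean classifier; this is a simplified sketch, not RSSL's implementation, and the data are synthetic.

```r
# Nearest mean classifier: one mean vector per class (rows = classes)
nmc_fit <- function(X, y) {
  do.call(rbind, lapply(levels(y), function(k) colMeans(X[y == k, , drop = FALSE])))
}

# Assign each row of X to the class whose mean is closest (squared Euclidean)
nmc_predict <- function(means, X, lev) {
  d <- vapply(seq_len(nrow(means)),
              function(k) rowSums(sweep(X, 2, means[k, ])^2),
              numeric(nrow(X)))
  d <- matrix(d, nrow = nrow(X))
  factor(lev[max.col(-d, ties.method = "first")], levels = lev)
}

# Self-learning: pseudo-label the unlabeled rows, refit on all rows,
# and repeat until the pseudo-labels stop changing (or max_iter is reached)
self_learning_nmc <- function(X, y, max_iter = 100) {
  lev <- levels(y)
  lab <- !is.na(y)
  means <- nmc_fit(X[lab, , drop = FALSE], y[lab])        # supervised start
  y_u <- nmc_predict(means, X[!lab, , drop = FALSE], lev)
  for (i in seq_len(max_iter)) {
    y_all <- y
    y_all[!lab] <- y_u                                     # fill in pseudo-labels
    means <- nmc_fit(X, y_all)                             # refit on every row
    y_new <- nmc_predict(means, X[!lab, , drop = FALSE], lev)
    if (identical(y_new, y_u)) break                       # converged
    y_u <- y_new
  }
  means
}

# Synthetic usage: two well-separated clusters, 6 of 100 points labeled
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
truth <- factor(rep(c("No", "Yes"), each = 50))
y <- truth
y[-c(1:3, 51:53)] <- NA
means <- self_learning_nmc(X, y)
pred  <- nmc_predict(means, X, levels(truth))
mean(pred == truth)
```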

pacman::p_load(scales, tidyverse)
pacman::p_load(RSSL, caret, plyr)
pacman::p_load(lattice, magrittr, useful)
pacman::p_load(MASS, ssc, GGally, e1071, mice)

data_semi = read.csv("Telco-Customer-Churn.csv", header = TRUE)
head(data_semi,3)
  customerID gender SeniorCitizen Partner Dependents tenure PhoneService
1 7590-VHVEG Female             0     Yes         No      1           No
2 5575-GNVDE   Male             0      No         No     34          Yes
3 3668-QPYBK   Male             0      No         No      2          Yes
     MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection
1 No phone service             DSL             No          Yes               No
2               No             DSL            Yes           No              Yes
3               No             DSL            Yes          Yes               No
  TechSupport StreamingTV StreamingMovies       Contract PaperlessBilling
1          No          No              No Month-to-month              Yes
2          No          No              No       One year               No
3          No          No              No Month-to-month              Yes
     PaymentMethod MonthlyCharges TotalCharges Churn
1 Electronic check          29.85        29.85    No
2     Mailed check          56.95      1889.50    No
3     Mailed check          53.85       108.15   Yes
dim(data_semi)
[1] 7043   21
colSums(is.na(data_semi))
      customerID           gender    SeniorCitizen          Partner 
               0                0                0                0 
      Dependents           tenure     PhoneService    MultipleLines 
               0                0                0                0 
 InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
               0                0                0                0 
     TechSupport      StreamingTV  StreamingMovies         Contract 
               0                0                0                0 
PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
               0                0                0               11 
           Churn 
               0 
impute = mice(data = data_semi, method = "mean")

 iter imp variable
  1   1  TotalCharges
  1   2  TotalCharges
  1   3  TotalCharges
  1   4  TotalCharges
  1   5  TotalCharges
  2   1  TotalCharges
  2   2  TotalCharges
  2   3  TotalCharges
  2   4  TotalCharges
  2   5  TotalCharges
  3   1  TotalCharges
  3   2  TotalCharges
  3   3  TotalCharges
  3   4  TotalCharges
  3   5  TotalCharges
  4   1  TotalCharges
  4   2  TotalCharges
  4   3  TotalCharges
  4   4  TotalCharges
  4   5  TotalCharges
  5   1  TotalCharges
  5   2  TotalCharges
  5   3  TotalCharges
  5   4  TotalCharges
  5   5  TotalCharges
Warning: Number of logged events: 17
data_semi = mice::complete(impute)
colSums(is.na(data_semi))
      customerID           gender    SeniorCitizen          Partner 
               0                0                0                0 
      Dependents           tenure     PhoneService    MultipleLines 
               0                0                0                0 
 InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
               0                0                0                0 
     TechSupport      StreamingTV  StreamingMovies         Contract 
               0                0                0                0 
PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
               0                0                0                0 
           Churn 
               0 
data_semi %<>% 
  dplyr::rename(Class = Churn) %>% 
  dplyr::select(-customerID) %>% 
  dplyr::mutate(Class = as.factor(Class))

table(data_semi$Class) # before masking labels

  No  Yes 
5174 1869 
data_semi_na_df <- data_semi %>% add_missinglabels_mar(Class~.,prob=0.95) 
colSums(is.na(data_semi_na_df))
           Class           gender    SeniorCitizen          Partner 
            6691                0                0                0 
      Dependents           tenure     PhoneService    MultipleLines 
               0                0                0                0 
 InternetService   OnlineSecurity     OnlineBackup DeviceProtection 
               0                0                0                0 
     TechSupport      StreamingTV  StreamingMovies         Contract 
               0                0                0                0 
PaperlessBilling    PaymentMethod   MonthlyCharges     TotalCharges 
               0                0                0                0 
table(data_semi_na_df$Class) # after masking: only the labeled rows are counted

 No Yes 
264  88 
set.seed(12345)
## 1st Model: Laplacian SVM
c_lapsvm <- LaplacianSVM(Class ~ ., data_semi_na_df,
                         scale = FALSE, kernel = kernlab::rbfdot(0.05),
                         lambda = 0.0001, gamma = 10)
        
pred_c_lapsvm = predict(c_lapsvm, data_semi_na_df)

table(pred_c_lapsvm, data_semi$Class)
             
pred_c_lapsvm   No  Yes
          No  5052 1530
          Yes  122  339
accuracy <- mean(pred_c_lapsvm == data_semi$Class)
print(paste("Accuracy on all customers (true labels):", round(accuracy, 4)))
[1] "Accuracy on all customers (true labels): 0.7654"
caret::confusionMatrix(pred_c_lapsvm, as.factor(data_semi$Class))
Confusion Matrix and Statistics

          Reference
Prediction   No  Yes
       No  5052 1530
       Yes  122  339
                                          
               Accuracy : 0.7654          
                 95% CI : (0.7554, 0.7753)
    No Information Rate : 0.7346          
    P-Value [Acc > NIR] : 1.634e-09       
                                          
                  Kappa : 0.2078          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.9764          
            Specificity : 0.1814          
         Pos Pred Value : 0.7675          
         Neg Pred Value : 0.7354          
             Prevalence : 0.7346          
         Detection Rate : 0.7173          
   Detection Prevalence : 0.9345          
      Balanced Accuracy : 0.5789          
                                          
       'Positive' Class : No              
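LaplacianSVM differs from a plain SVM by adding a manifold-regularization penalty built from a graph over both labeled and unlabeled points. For predictions f on the graph's nodes, the term f'Lf, with L = D - W the graph Laplacian, equals the sum over edges of (f_i - f_j)^2, so it is small when neighbouring points receive similar predictions. The base-R sketch below computes that penalty. The symmetric k-nearest-neighbour graph with k = 3, the helpers knn_laplacian() and smoothness(), and the synthetic data are assumptions for illustration; the graph RSSL builds internally is controlled by its own arguments.

```r
# Unnormalized graph Laplacian L = D - W of a symmetric k-nearest-neighbour graph
knn_laplacian <- function(X, k = 3) {
  D2 <- as.matrix(dist(X))^2                      # squared Euclidean distances
  n  <- nrow(X)
  W  <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- setdiff(order(D2[i, ]), i)[seq_len(k)]  # k nearest points other than i
    W[i, nn] <- 1
  }
  W <- pmax(W, t(W))                              # edge if either point lists the other
  diag(rowSums(W)) - W
}

# Smoothness penalty f' L f = sum over edges of (f_i - f_j)^2
smoothness <- function(f, L) drop(t(f) %*% L %*% f)

# Synthetic data: two tight, far-apart clusters of 10 points each
set.seed(2)
X <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))
L <- knn_laplacian(X, k = 3)

f_smooth <- rep(c(-1, 1), each = 10)    # constant within each cluster
f_rough  <- rep(c(-1, 1), times = 10)   # alternates inside each cluster
c(smooth = smoothness(f_smooth, L), rough = smoothness(f_rough, L))
```

Labelings that follow the cluster structure cost nothing, while labelings that cut through a cluster are penalized; this is the mechanism by which unlabeled points can reshape the decision boundary.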
                                          
## 2nd Model: SelfLearning SVMLin
c_slsvm <- SelfLearning(Class ~., data_semi_na_df, 
                        method = svmlin
                        )
        
pred_c_slsvm = predict(c_slsvm, data_semi_na_df)

table(pred_c_slsvm, data_semi$Class)
            
pred_c_slsvm   No  Yes
         No  4654 1019
         Yes  520  850
accuracy <- mean(pred_c_slsvm == data_semi$Class)
print(paste("Accuracy on all customers (true labels):", round(accuracy, 4)))
[1] "Accuracy on all customers (true labels): 0.7815"
caret::confusionMatrix(pred_c_slsvm, as.factor(data_semi$Class))
Confusion Matrix and Statistics

          Reference
Prediction   No  Yes
       No  4654 1019
       Yes  520  850
                                          
               Accuracy : 0.7815          
                 95% CI : (0.7716, 0.7911)
    No Information Rate : 0.7346          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.3873          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.8995          
            Specificity : 0.4548          
         Pos Pred Value : 0.8204          
         Neg Pred Value : 0.6204          
             Prevalence : 0.7346          
         Detection Rate : 0.6608          
   Detection Prevalence : 0.8055          
      Balanced Accuracy : 0.6771          
                                          
       'Positive' Class : No              
                                          
## 3rd Model: SelfLearning Nearest Mean Classifier
c_slnmc <- SelfLearning(Class ~., data_semi_na_df, 
                        method=NearestMeanClassifier
                        )
        
pred_c_slnmc = predict(c_slnmc, data_semi_na_df)

table(pred_c_slnmc, data_semi$Class)
            
pred_c_slnmc   No  Yes
         No  1884  360
         Yes 3290 1509
accuracy <- mean(pred_c_slnmc == data_semi$Class)
print(paste("Accuracy on all customers (true labels):", round(accuracy, 4)))
[1] "Accuracy on all customers (true labels): 0.4818"
caret::confusionMatrix(pred_c_slnmc, as.factor(data_semi$Class))
Confusion Matrix and Statistics

          Reference
Prediction   No  Yes
       No  1884  360
       Yes 3290 1509
                                        
               Accuracy : 0.4818        
                 95% CI : (0.47, 0.4935)
    No Information Rate : 0.7346        
    P-Value [Acc > NIR] : 1             
                                        
                  Kappa : 0.1143        
                                        
 Mcnemar's Test P-Value : <2e-16        
                                        
            Sensitivity : 0.3641        
            Specificity : 0.8074        
         Pos Pred Value : 0.8396        
         Neg Pred Value : 0.3144        
             Prevalence : 0.7346        
         Detection Rate : 0.2675        
   Detection Prevalence : 0.3186        
      Balanced Accuracy : 0.5858        
                                        
       'Positive' Class : No            
                                        
## 4th Model: Supervised SVM baseline (labeled data only, no self-learning)
c_svm <- SVM(Class ~ ., data_semi_na_df,
             scale = FALSE,
             kernel = kernlab::rbfdot(0.05),
             C = 2500)
pred_c_svm = predict(c_svm, data_semi_na_df)

table(pred_c_svm)
pred_c_svm
  No  Yes 
6582  461 
table(pred_c_svm, data_semi$Class)
          
pred_c_svm   No  Yes
       No  5052 1530
       Yes  122  339
accuracy <- mean(pred_c_svm == data_semi$Class)
print(paste("Accuracy on all customers (true labels):", round(accuracy, 4)))
[1] "Accuracy on all customers (true labels): 0.7654"
caret::confusionMatrix(pred_c_svm, as.factor(data_semi$Class))
Confusion Matrix and Statistics

          Reference
Prediction   No  Yes
       No  5052 1530
       Yes  122  339
                                          
               Accuracy : 0.7654          
                 95% CI : (0.7554, 0.7753)
    No Information Rate : 0.7346          
    P-Value [Acc > NIR] : 1.634e-09       
                                          
                  Kappa : 0.2078          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.9764          
            Specificity : 0.1814          
         Pos Pred Value : 0.7675          
         Neg Pred Value : 0.7354          
             Prevalence : 0.7346          
         Detection Rate : 0.7173          
   Detection Prevalence : 0.9345          
      Balanced Accuracy : 0.5789          
                                          
       'Positive' Class : No              
                                          

Model Performance Summary

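The table below summarizes each model's accuracy and balanced accuracy (the average of the recall on churners and on non-churners), computed against the true churn labels of all 7,043 customers. For reference, always predicting "No" would already be 73.46% accurate (caret's No Information Rate).

Model                                             Accuracy (%)   Balanced accuracy (%)
Laplacian SVM                                     76.54          57.89
Self-Learning SVM (svmlin, linear kernel)         78.15          67.71
Self-Learning Nearest Mean Classifier             48.18          58.58
Supervised SVM (RBF kernel, labeled data only)    76.54          57.89

Laplacian SVM and the supervised SVM produced identical confusion matrices, so with these settings the unlabeled data did not change the Laplacian SVM's predictions.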
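caret::confusionMatrix() treated "No" as the positive class (see the 'Positive' Class line in each output), so its Sensitivity describes retained customers, while recall on churners appears as Specificity. This base-R sketch recomputes the churn-focused metrics directly from the Self-Learning SVMLin counts printed above.

```r
# Counts from the Self-Learning SVMLin confusion matrix
# (rows = predicted class, columns = actual class)
cm <- matrix(c(4654, 520, 1019, 850), nrow = 2,
             dimnames = list(Predicted = c("No", "Yes"),
                             Actual    = c("No", "Yes")))

accuracy        <- sum(diag(cm)) / sum(cm)
recall_churn    <- cm["Yes", "Yes"] / sum(cm[, "Yes"])  # churners correctly flagged
recall_stay     <- cm["No", "No"] / sum(cm[, "No"])     # non-churners correctly kept
precision_churn <- cm["Yes", "Yes"] / sum(cm["Yes", ])  # flagged customers who churned
balanced_acc    <- (recall_churn + recall_stay) / 2
no_info_rate    <- max(colSums(cm)) / sum(cm)           # accuracy of always predicting "No"

round(c(accuracy = accuracy, recall_churn = recall_churn,
        precision_churn = precision_churn, balanced = balanced_acc,
        no_info = no_info_rate), 4)
```

These values reproduce caret's Accuracy (0.7815), Specificity (0.4548), Neg Pred Value (0.6204), Balanced Accuracy (0.6771), and No Information Rate (0.7346). Passing positive = "Yes" to caret::confusionMatrix() would report churn as the positive class directly.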

Exploring One-Class Classification

# Run from the folder that contains Telco-Customer-Churn.csv (use setwd() to change it if needed)
        
pacman::p_load(scales, tidyverse)
pacman::p_load(RSSL, caret, plyr)
pacman::p_load(lattice, magrittr, useful)
pacman::p_load(MASS, ssc, GGally, e1071, mice, mltools, data.table)
        
telco_oc_df = read.csv("Telco-Customer-Churn.csv", header = TRUE)
topleft(telco_oc_df)
  customerID gender SeniorCitizen Partner Dependents
1 7590-VHVEG Female             0     Yes         No
2 5575-GNVDE   Male             0      No         No
3 3668-QPYBK   Male             0      No         No
4 7795-CFOCW   Male             0      No         No
5 9237-HQITU Female             0      No         No
telco_oc_df %<>% 
  dplyr::rename(Class = Churn) %>% 
  dplyr::select(-customerID)
        
        
# Convert character columns to factors
char_cols <- c("gender", "Partner", "Dependents", "PhoneService", 
               "MultipleLines", "InternetService", "OnlineSecurity", 
               "OnlineBackup", "DeviceProtection", "TechSupport", 
               "StreamingTV", "StreamingMovies", "Contract", 
               "PaperlessBilling", "PaymentMethod", "Class")
        
telco_oc_df[char_cols] <- lapply(telco_oc_df[char_cols], as.factor)
        
# Convert SeniorCitizen to factor as well.
telco_oc_df$SeniorCitizen <- as.factor(telco_oc_df$SeniorCitizen)
        
telco_oc_df$TotalCharges <- as.numeric(as.character(telco_oc_df$TotalCharges))
        
str(telco_oc_df)
'data.frame':   7043 obs. of  20 variables:
 $ gender          : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
 $ SeniorCitizen   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ Partner         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
 $ Dependents      : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
 $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
 $ PhoneService    : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
 $ MultipleLines   : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
 $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
 $ OnlineSecurity  : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
 $ OnlineBackup    : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
 $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
 $ TechSupport     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
 $ StreamingTV     : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
 $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
 $ Contract        : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
 $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
 $ PaymentMethod   : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
 $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
 $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
 $ Class           : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...
save_actual = telco_oc_df$Class
table(save_actual)
save_actual
  No  Yes 
5174 1869 
table(telco_oc_df$Class)

  No  Yes 
5174 1869 
# Replace the label with a single placeholder class "N" so the one-class model
# sees no churn information (the true labels are kept in save_actual)
telco_oc_df$Class <- "N"
table(telco_oc_df$Class)

   N 
7043 
head(telco_oc_df)
  gender SeniorCitizen Partner Dependents tenure PhoneService    MultipleLines
1 Female             0     Yes         No      1           No No phone service
2   Male             0      No         No     34          Yes               No
3   Male             0      No         No      2          Yes               No
4   Male             0      No         No     45           No No phone service
5 Female             0      No         No      2          Yes               No
6 Female             0      No         No      8          Yes              Yes
  InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport
1             DSL             No          Yes               No          No
2             DSL            Yes           No              Yes          No
3             DSL            Yes          Yes               No          No
4             DSL            Yes           No              Yes         Yes
5     Fiber optic             No           No               No          No
6     Fiber optic             No           No              Yes          No
  StreamingTV StreamingMovies       Contract PaperlessBilling
1          No              No Month-to-month              Yes
2          No              No       One year               No
3          No              No Month-to-month              Yes
4          No              No       One year               No
5          No              No Month-to-month              Yes
6         Yes             Yes Month-to-month              Yes
              PaymentMethod MonthlyCharges TotalCharges Class
1          Electronic check          29.85        29.85     N
2              Mailed check          56.95      1889.50     N
3              Mailed check          53.85       108.15     N
4 Bank transfer (automatic)          42.30      1840.75     N
5          Electronic check          70.70       151.65     N
6          Electronic check          99.65       820.50     N
dim(telco_oc_df)
[1] 7043   20
colSums(is.na(telco_oc_df))
          gender    SeniorCitizen          Partner       Dependents 
               0                0                0                0 
          tenure     PhoneService    MultipleLines  InternetService 
               0                0                0                0 
  OnlineSecurity     OnlineBackup DeviceProtection      TechSupport 
               0                0                0                0 
     StreamingTV  StreamingMovies         Contract PaperlessBilling 
               0                0                0                0 
   PaymentMethod   MonthlyCharges     TotalCharges            Class 
               0                0               11                0 
impute = mice(data = telco_oc_df, method = "mean")

 iter imp variable
  1   1  TotalCharges
  1   2  TotalCharges
  1   3  TotalCharges
  1   4  TotalCharges
  1   5  TotalCharges
  2   1  TotalCharges
  2   2  TotalCharges
  2   3  TotalCharges
  2   4  TotalCharges
  2   5  TotalCharges
  3   1  TotalCharges
  3   2  TotalCharges
  3   3  TotalCharges
  3   4  TotalCharges
  3   5  TotalCharges
  4   1  TotalCharges
  4   2  TotalCharges
  4   3  TotalCharges
  4   4  TotalCharges
  4   5  TotalCharges
  5   1  TotalCharges
  5   2  TotalCharges
  5   3  TotalCharges
  5   4  TotalCharges
  5   5  TotalCharges
Warning: Number of logged events: 26
complete_df = mice::complete(impute)
head(complete_df)
  gender SeniorCitizen Partner Dependents tenure PhoneService    MultipleLines
1 Female             0     Yes         No      1           No No phone service
2   Male             0      No         No     34          Yes               No
3   Male             0      No         No      2          Yes               No
4   Male             0      No         No     45           No No phone service
5 Female             0      No         No      2          Yes               No
6 Female             0      No         No      8          Yes              Yes
  InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport
1             DSL             No          Yes               No          No
2             DSL            Yes           No              Yes          No
3             DSL            Yes          Yes               No          No
4             DSL            Yes           No              Yes         Yes
5     Fiber optic             No           No               No          No
6     Fiber optic             No           No              Yes          No
  StreamingTV StreamingMovies       Contract PaperlessBilling
1          No              No Month-to-month              Yes
2          No              No       One year               No
3          No              No Month-to-month              Yes
4          No              No       One year               No
5          No              No Month-to-month              Yes
6         Yes             Yes Month-to-month              Yes
              PaymentMethod MonthlyCharges TotalCharges Class
1          Electronic check          29.85        29.85     N
2              Mailed check          56.95      1889.50     N
3              Mailed check          53.85       108.15     N
4 Bank transfer (automatic)          42.30      1840.75     N
5          Electronic check          70.70       151.65     N
6          Electronic check          99.65       820.50     N
colSums(is.na(complete_df))
          gender    SeniorCitizen          Partner       Dependents 
               0                0                0                0 
          tenure     PhoneService    MultipleLines  InternetService 
               0                0                0                0 
  OnlineSecurity     OnlineBackup DeviceProtection      TechSupport 
               0                0                0                0 
     StreamingTV  StreamingMovies         Contract PaperlessBilling 
               0                0                0                0 
   PaymentMethod   MonthlyCharges     TotalCharges            Class 
               0                0                0                0 
complete_df <- mltools::one_hot(as.data.table(complete_df))
str(complete_df)
Classes 'data.table' and 'data.frame':  7043 obs. of  47 variables:
 $ gender_Female                          : int  1 0 0 0 1 1 0 1 1 0 ...
 $ gender_Male                            : int  0 1 1 1 0 0 1 0 0 1 ...
 $ SeniorCitizen_0                        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ SeniorCitizen_1                        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Partner_No                             : int  0 1 1 1 1 1 1 1 0 1 ...
 $ Partner_Yes                            : int  1 0 0 0 0 0 0 0 1 0 ...
 $ Dependents_No                          : int  1 1 1 1 1 1 0 1 1 0 ...
 $ Dependents_Yes                         : int  0 0 0 0 0 0 1 0 0 1 ...
 $ tenure                                 : int  1 34 2 45 2 8 22 10 28 62 ...
 $ PhoneService_No                        : int  1 0 0 1 0 0 0 1 0 0 ...
 $ PhoneService_Yes                       : int  0 1 1 0 1 1 1 0 1 1 ...
 $ MultipleLines_No                       : int  0 1 1 0 1 0 0 0 0 1 ...
 $ MultipleLines_No phone service         : int  1 0 0 1 0 0 0 1 0 0 ...
 $ MultipleLines_Yes                      : int  0 0 0 0 0 1 1 0 1 0 ...
 $ InternetService_DSL                    : int  1 1 1 1 0 0 0 1 0 1 ...
 $ InternetService_Fiber optic            : int  0 0 0 0 1 1 1 0 1 0 ...
 $ InternetService_No                     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ OnlineSecurity_No                      : int  1 0 0 0 1 1 1 0 1 0 ...
 $ OnlineSecurity_No internet service     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ OnlineSecurity_Yes                     : int  0 1 1 1 0 0 0 1 0 1 ...
 $ OnlineBackup_No                        : int  0 1 0 1 1 1 0 1 1 0 ...
 $ OnlineBackup_No internet service       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ OnlineBackup_Yes                       : int  1 0 1 0 0 0 1 0 0 1 ...
 $ DeviceProtection_No                    : int  1 0 1 0 1 0 1 1 0 1 ...
 $ DeviceProtection_No internet service   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ DeviceProtection_Yes                   : int  0 1 0 1 0 1 0 0 1 0 ...
 $ TechSupport_No                         : int  1 1 1 0 1 1 1 1 0 1 ...
 $ TechSupport_No internet service        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ TechSupport_Yes                        : int  0 0 0 1 0 0 0 0 1 0 ...
 $ StreamingTV_No                         : int  1 1 1 1 1 0 0 1 0 1 ...
 $ StreamingTV_No internet service        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ StreamingTV_Yes                        : int  0 0 0 0 0 1 1 0 1 0 ...
 $ StreamingMovies_No                     : int  1 1 1 1 1 0 1 1 0 1 ...
 $ StreamingMovies_No internet service    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ StreamingMovies_Yes                    : int  0 0 0 0 0 1 0 0 1 0 ...
 $ Contract_Month-to-month                : int  1 0 1 0 1 1 1 1 1 0 ...
 $ Contract_One year                      : int  0 1 0 1 0 0 0 0 0 1 ...
 $ Contract_Two year                      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PaperlessBilling_No                    : int  0 1 0 1 0 0 0 1 0 1 ...
 $ PaperlessBilling_Yes                   : int  1 0 1 0 1 1 1 0 1 0 ...
 $ PaymentMethod_Bank transfer (automatic): int  0 0 0 1 0 0 0 0 0 1 ...
 $ PaymentMethod_Credit card (automatic)  : int  0 0 0 0 0 0 1 0 0 0 ...
 $ PaymentMethod_Electronic check         : int  1 0 0 0 1 1 0 0 1 0 ...
 $ PaymentMethod_Mailed check             : int  0 1 1 0 0 0 0 1 0 0 ...
 $ MonthlyCharges                         : num  29.9 57 53.9 42.3 70.7 ...
 $ TotalCharges                           : num  29.9 1889.5 108.2 1840.8 151.7 ...
 $ Class                                  : chr  "N" "N" "N" "N" ...
 - attr(*, ".internal.selfref")=<externalptr> 
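mltools::one_hot() replaces each factor with one 0/1 column per level, which is why the 19 predictors become 46 numeric columns (plus Class) and why e1071 reports a default gamma of 1/46 = 0.0217 for the model that follows. The same full encoding can be obtained with base R's model.matrix(), as in this sketch on a hypothetical three-column data frame; contrasts = FALSE is what keeps every level instead of dropping a reference level.

```r
# Hypothetical data frame with two factors and one numeric column
toy <- data.frame(
  Contract         = factor(c("Month-to-month", "One year", "Two year", "Month-to-month")),
  PaperlessBilling = factor(c("Yes", "No", "Yes", "No")),
  MonthlyCharges   = c(29.85, 56.95, 53.85, 42.30)
)

# One identity "contrast" matrix per factor keeps all k levels (not k - 1)
full_contrasts <- lapply(Filter(is.factor, toy), contrasts, contrasts = FALSE)

# "- 1" drops the intercept column; numeric columns pass through unchanged
X <- model.matrix(~ . - 1, data = toy, contrasts.arg = full_contrasts)
X
```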
x = complete_df[ , -c("Class")]
y = complete_df$Class

model_svm3 <- svm(x, y = NULL,type = "one-classification", kernel = "linear") 
model_svm3

Call:
svm.default(x = x, y = NULL, type = "one-classification", kernel = "linear")


Parameters:
   SVM-Type:  one-classification 
 SVM-Kernel:  linear 
      gamma:  0.02173913 
         nu:  0.5 

Number of Support Vectors:  3674
summary(model_svm3)

Call:
svm.default(x = x, y = NULL, type = "one-classification", kernel = "linear")


Parameters:
   SVM-Type:  one-classification 
 SVM-Kernel:  linear 
      gamma:  0.02173913 
         nu:  0.5 

Number of Support Vectors:  3674




Number of Classes: 1
svm_predict3 = predict(model_svm3, x, decision.values = TRUE)
table(svm_predict3)
svm_predict3
FALSE  TRUE 
 3514  3529 
# predict() returns TRUE for inliers; label outliers (FALSE) as predicted churners
svm_predict3 = ifelse(svm_predict3, "No", "Yes")
        
table(svm_predict3,save_actual)
            save_actual
svm_predict3   No  Yes
         No  2485 1044
         Yes 2689  825
mean(svm_predict3 == save_actual)
[1] 0.4699702
Model                                             Accuracy (%)   Balanced accuracy (%)
One-Class SVM (linear kernel, nu = 0.5)           47.00          46.08

Analysis Outcomes and Recommendations

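Because Self-Learning with a linear SVM achieved the highest accuracy (78.15%) and balanced accuracy (67.71%) of all the models tested, it is the most promising approach for churn prediction in this scenario, although its accuracy is only about 4.7 percentage points above the 73.46% obtained by always predicting "No". Hyperparameter optimization and feature engineering could be explored to improve it further. The One-Class SVM's accuracy (47.00%) fell well below that baseline. This is consistent with how it was set up: it was trained on all customers, churners included, and with e1071's default nu = 0.5 the model allows up to roughly half of the training points to be treated as outliers, matching the near-even split of 3,514 outliers and 3,529 inliers in its predictions. Training on non-churners only, lowering nu, experimenting with other kernels, or using alternative one-class classification algorithms are recommended next steps.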
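As one example of an alternative, the sketch below implements a simple distance-to-centroid novelty detector in base R. This is a swapped-in technique, not the one-class SVM used above: it is fit on the "normal" class only, and its outlier_frac argument plays a role similar to nu by fixing the share of training points flagged as outliers. Applied to this problem, the normal class would be the non-churners (save_actual == "No") and flagged customers would be treated as likely churners. The helpers fit_centroid_oc() and predict_centroid_oc() and the synthetic data are hypothetical.

```r
# Fit on "normal" rows only: standardize, then set the distance threshold at the
# (1 - outlier_frac) quantile of the training distances from the centroid
fit_centroid_oc <- function(X_normal, outlier_frac = 0.05) {
  center <- colMeans(X_normal)
  scale  <- apply(X_normal, 2, sd)
  scale[scale == 0] <- 1                               # constant columns: avoid 0/0
  Z       <- sweep(sweep(X_normal, 2, center), 2, scale, "/")
  d_train <- sqrt(rowSums(Z^2))
  list(center = center, scale = scale,
       threshold = quantile(d_train, 1 - outlier_frac, names = FALSE))
}

# TRUE = inlier (resembles the training class), FALSE = outlier
predict_centroid_oc <- function(model, X) {
  Z <- sweep(sweep(X, 2, model$center), 2, model$scale, "/")
  sqrt(rowSums(Z^2)) <= model$threshold
}

# Synthetic check: 500 "normal" points and 50 shifted points in 3 dimensions
set.seed(3)
X_norm <- matrix(rnorm(1500), ncol = 3)
X_out  <- matrix(rnorm(150, mean = 5), ncol = 3)
oc     <- fit_centroid_oc(X_norm, outlier_frac = 0.05)

c(inlier_rate_normal  = mean(predict_centroid_oc(oc, X_norm)),
  inlier_rate_shifted = mean(predict_centroid_oc(oc, X_out)))
```

By construction, about 5% of the training class is flagged, so the quantity worth measuring on the churn data is how many actual churners fall outside the boundary.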