A company, Telco, wants to understand how its services affect customer churn, that is, customers who cancel their subscription. The variable Churn is the target variable (Y), and the other variables in the dataset are the predictors (X). I will model this with classification, a family of prediction methods used when the target variable is categorical. A classification model estimates the probability that an observation belongs to each target class. The classification models I will use include:
The result the company will weigh for both models is recall (sensitivity): the share of all actual positives (here, Churn = "Yes") that the model correctly predicts as positive. Recall is the priority because the company wants to act before a customer churns.
## 'data.frame': 7043 obs. of 21 variables:
## $ customerID : Factor w/ 7043 levels "0002-ORFBO","0003-MKNFE",..: 5376 3963 2565 5536 6512 6552 1003 4771 5605 4535 ...
## $ gender : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 1 2 1 1 2 ...
## $ SeniorCitizen : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Partner : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 1 2 1 ...
## $ Dependents : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 2 1 1 2 ...
## $ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
## $ PhoneService : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 1 2 2 ...
## $ MultipleLines : Factor w/ 3 levels "No","No phone service",..: 2 1 1 2 1 3 3 2 3 1 ...
## $ InternetService : Factor w/ 3 levels "DSL","Fiber optic",..: 1 1 1 1 2 2 2 1 2 1 ...
## $ OnlineSecurity : Factor w/ 3 levels "No","No internet service",..: 1 3 3 3 1 1 1 3 1 3 ...
## $ OnlineBackup : Factor w/ 3 levels "No","No internet service",..: 3 1 3 1 1 1 3 1 1 3 ...
## $ DeviceProtection: Factor w/ 3 levels "No","No internet service",..: 1 3 1 3 1 3 1 1 3 1 ...
## $ TechSupport : Factor w/ 3 levels "No","No internet service",..: 1 1 1 3 1 1 1 1 3 1 ...
## $ StreamingTV : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 3 1 3 1 ...
## $ StreamingMovies : Factor w/ 3 levels "No","No internet service",..: 1 1 1 1 1 3 1 1 3 1 ...
## $ Contract : Factor w/ 3 levels "Month-to-month",..: 1 2 1 2 1 1 1 1 1 2 ...
## $ PaperlessBilling: Factor w/ 2 levels "No","Yes": 2 1 2 1 2 2 2 1 2 1 ...
## $ PaymentMethod : Factor w/ 4 levels "Bank transfer (automatic)",..: 3 4 4 1 3 3 2 4 3 1 ...
## $ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
## $ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
## $ Churn : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 2 1 1 2 1 ...
This is data from the Telco company on customers who churn, that is, stop subscribing. It contains 7043 customers described by 21 variables:
customerID: customer ID
gender: whether the customer is male or female
SeniorCitizen: whether the customer is a senior citizen or not (1, 0)
Partner: whether the customer has a partner or not (Yes, No)
Dependents: whether the customer has dependents or not (Yes, No)
tenure: number of months the customer has stayed subscribed
PhoneService: whether the customer has phone service or not (Yes, No)
MultipleLines: whether the customer has multiple lines or not (Yes, No, No phone service)
InternetService: the customer's internet service (DSL, Fiber optic, No)
OnlineSecurity: whether the customer has online security or not (Yes, No, No internet service)
OnlineBackup: whether the customer has online backup or not (Yes, No, No internet service)
DeviceProtection: whether the customer has device protection or not (Yes, No, No internet service)
TechSupport: whether the customer has tech support or not (Yes, No, No internet service)
StreamingTV: whether the customer has streaming TV or not (Yes, No, No internet service)
StreamingMovies: whether the customer has streaming movies or not (Yes, No, No internet service)
Contract: the contract term (Month-to-month, One year, Two year)
PaperlessBilling: whether the customer has paperless billing or not (Yes, No)
PaymentMethod: the customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
MonthlyCharges: the customer's monthly charges
TotalCharges: the customer's total charges
Churn: whether the customer stopped subscribing or not (Yes, No)
Check for missing values
## gender SeniorCitizen Partner Dependents
## 0 0 0 0
## tenure PhoneService MultipleLines InternetService
## 0 0 0 0
## OnlineSecurity OnlineBackup DeviceProtection TechSupport
## 0 0 0 0
## StreamingTV StreamingMovies Contract PaperlessBilling
## 0 0 0 0
## PaymentMethod MonthlyCharges TotalCharges Churn
## 0 0 11 0
## [1] FALSE
## gender SeniorCitizen Partner Dependents
## 0 0 0 0
## tenure PhoneService MultipleLines InternetService
## 0 0 0 0
## OnlineSecurity OnlineBackup DeviceProtection TechSupport
## 0 0 0 0
## StreamingTV StreamingMovies Contract PaperlessBilling
## 0 0 0 0
## PaymentMethod MonthlyCharges TotalCharges Churn
## 0 0 0 0
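The chunk that produced the counts above is not shown. The sketch below is one way such a check could be written in base R, assuming the rows with missing values are simply dropped (the second table, with no missing values left, is consistent with that). The data frame `toy` and its values are illustrative stand-ins, not the telco data.

```r
# Toy data frame standing in for the telco data (values are illustrative only)
toy <- data.frame(
  tenure       = c(1, 34, 0, 45),
  TotalCharges = c(29.85, 1889.50, NA, 1840.75)
)

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Drop rows that contain any missing value, then confirm none remain
toy_clean <- na.omit(toy)
print(anyNA(toy_clean))
```

Dropping rows is reasonable when only a handful are affected, as with the 11 missing TotalCharges values out of 7043 here; imputing them would be an alternative.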
Check the proportion of the target classes
##
## No Yes
## 0.7346301 0.2653699
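Split the data into training and test sets:
set.seed(100)
# Randomly pick 80% of the row indices for training
index <- sample(nrow(telco), 0.8*nrow(telco))
telco_train <- telco[index,]
# The remaining 20% become the test set
telco_test <- telco[-index,]
Check the class balance of the training data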
##
## No Yes
## 0.7362442 0.2637558
Check the class balance of the test data
##
## No Yes
## 0.728176 0.271824
Downsample the training data
Check the class proportions of the training data after downsampling
##
## No Yes
## 0.5 0.5
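The downsampling chunk itself is not shown above. As a rough illustration of the idea, the base R sketch below keeps every row of the minority class and a random sample of equal size from the majority class. It is a stand-in, not the author's code (a later section uses `downSample()` from caret), and `toy_train` is an invented data set.

```r
set.seed(100)

# Toy imbalanced training set (illustrative values, not the telco data)
toy_train <- data.frame(
  tenure = 1:10,
  Churn  = factor(c(rep("No", 7), rep("Yes", 3)), levels = c("No", "Yes"))
)

# Size of the smallest class
n_min <- min(table(toy_train$Churn))

# For each class, keep a random sample of n_min row indices
keep <- unlist(lapply(
  split(seq_len(nrow(toy_train)), toy_train$Churn),
  function(idx) idx[sample.int(length(idx), n_min)]
))

toy_down <- toy_train[keep, ]
print(prop.table(table(toy_down$Churn)))
```

Downsampling discards majority-class rows, which is why the training set shrinks after this step.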
model_telco <- glm(Churn ~ gender + SeniorCitizen + Partner + Dependents + tenure + PhoneService + InternetService + Contract + MonthlyCharges + TotalCharges + PaymentMethod, data = telco_train_2, family = "binomial")
summary(model_telco)
##
## Call:
## glm(formula = Churn ~ gender + SeniorCitizen + Partner + Dependents +
## tenure + PhoneService + InternetService + Contract + MonthlyCharges +
## TotalCharges + PaymentMethod, family = "binomial", data = telco_train_2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1256 -0.7605 0.1851 0.7542 2.9949
##
## Coefficients:
## Estimate Std. Error z value
## (Intercept) 1.04473980 0.32060723 3.259
## genderMale 0.06350581 0.09132830 0.695
## SeniorCitizen 0.26938872 0.12098609 2.227
## PartnerYes -0.04460717 0.10816879 -0.412
## DependentsYes -0.19998643 0.12182312 -1.642
## tenure -0.05278971 0.00758485 -6.960
## PhoneServiceYes -0.78465642 0.20630064 -3.803
## InternetServiceFiber optic 0.96905094 0.19587281 4.947
## InternetServiceNo -0.27472308 0.25841015 -1.063
## ContractOne year -0.85499768 0.14026577 -6.096
## ContractTwo year -1.72918198 0.21284906 -8.124
## MonthlyCharges 0.00288432 0.00587100 0.491
## TotalCharges 0.00028394 0.00008801 3.226
## PaymentMethodCredit card (automatic) -0.24167290 0.15282496 -1.581
## PaymentMethodElectronic check 0.40287014 0.13216294 3.048
## PaymentMethodMailed check -0.01964534 0.16004289 -0.123
## Pr(>|z|)
## (Intercept) 0.001120 **
## genderMale 0.486831
## SeniorCitizen 0.025973 *
## PartnerYes 0.680057
## DependentsYes 0.100670
## tenure 0.000000000003405452 ***
## PhoneServiceYes 0.000143 ***
## InternetServiceFiber optic 0.000000752314197564 ***
## InternetServiceNo 0.287724
## ContractOne year 0.000000001090584195 ***
## ContractTwo year 0.000000000000000451 ***
## MonthlyCharges 0.623226
## TotalCharges 0.001255 **
## PaymentMethodCredit card (automatic) 0.113793
## PaymentMethodElectronic check 0.002302 **
## PaymentMethodMailed check 0.902305
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4120.1 on 2971 degrees of freedom
## Residual deviance: 2909.9 on 2956 degrees of freedom
## AIC: 2941.9
##
## Number of Fisher Scoring iterations: 5
## [1] 0.9441592
From this odds ratio we can say that each additional month of tenure multiplies the odds of churning by about 0.94, a decrease of roughly 5.6%, holding the other predictors fixed. Customers with longer tenure are therefore less likely to churn than customers with shorter tenure.
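To make the reading of an odds ratio concrete, the short sketch below starts from the value printed above (0.9441592) and converts it into a percent change in the odds. The 12-month horizon is an arbitrary choice for illustration, and the variable names are hypothetical.

```r
# Odds ratio for tenure as printed in the output above
odds_ratio <- 0.9441592

# An odds ratio is exp(coefficient), so the log-odds coefficient it implies is:
beta <- log(odds_ratio)

# Percent change in the odds of churn per additional month of tenure
pct_change <- (odds_ratio - 1) * 100
print(round(pct_change, 2))

# Multiplicative effect on the odds after 12 additional months
odds_12 <- exp(12 * beta)
print(round(odds_12, 3))
```

An odds ratio below 1 means the odds shrink as the predictor grows, and the effect compounds multiplicatively over several months.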
Prediction
## gender SeniorCitizen Partner Dependents tenure PhoneService
## 3 Male 0 No No 2 Yes
## 8 Female 0 No No 10 No
## 11 Male 0 Yes Yes 13 Yes
## 24 Female 0 Yes No 58 Yes
## 36 Female 0 Yes Yes 72 Yes
## 43 Female 0 Yes Yes 17 Yes
## MultipleLines InternetService OnlineSecurity
## 3 No DSL Yes
## 8 No phone service DSL Yes
## 11 No DSL Yes
## 24 Yes DSL No
## 36 Yes Fiber optic Yes
## 43 No No No internet service
## OnlineBackup DeviceProtection TechSupport
## 3 Yes No No
## 8 No No No
## 11 No No No
## 24 Yes No Yes
## 36 Yes No Yes
## 43 No internet service No internet service No internet service
## StreamingTV StreamingMovies Contract PaperlessBilling
## 3 No No Month-to-month Yes
## 8 No No Month-to-month No
## 11 No No Month-to-month Yes
## 24 No No Two year Yes
## 36 Yes No Two year No
## 43 No internet service No internet service One year No
## PaymentMethod MonthlyCharges TotalCharges Churn prob_Churn
## 3 Mailed check 53.85 108.15 Yes 0.59493216
## 8 Mailed check 29.75 301.90 No 0.66121784
## 11 Mailed check 49.95 587.45 No 0.42163962
## 24 Credit card (automatic) 59.90 3505.10 No 0.02535274
## 36 Bank transfer (automatic) 99.90 7251.70 No 0.09989944
## 43 Mailed check 20.75 418.25 No 0.13555670
Choose a threshold by looking at the distribution of the predicted churn probabilities
Try a threshold of 0.5
# Label a customer "Yes" when the predicted churn probability exceeds 0.5
telco_test$pred_Churn <- as.factor(ifelse(telco_test$prob_Churn > 0.5 , "Yes", "No"))
head(telco_test$pred_Churn)
## [1] Yes Yes No No No No
## Levels: No Yes
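Since recall depends on the chosen threshold, it can help to compare a few candidates. The sketch below does this using only the six test rows printed in the prediction output above, so the numbers are illustrative and say nothing about the full test set. The helper `recall_at()` and the candidate thresholds are hypothetical.

```r
# Six test rows shown in the prediction output above
actual <- factor(c("Yes", "No", "No", "No", "No", "No"), levels = c("No", "Yes"))
prob   <- c(0.59493216, 0.66121784, 0.42163962, 0.02535274, 0.09989944, 0.13555670)

# Recall at a given threshold: true positives / all actual positives
recall_at <- function(threshold) {
  pred <- ifelse(prob > threshold, "Yes", "No")
  sum(pred == "Yes" & actual == "Yes") / sum(actual == "Yes")
}

thresholds <- c(0.3, 0.5, 0.7)
recalls <- sapply(thresholds, recall_at)
print(data.frame(threshold = thresholds, recall = recalls))
```

Lowering the threshold can only keep or raise recall, at the cost of more false positives, so the choice depends on how expensive it is to contact a customer who would not have churned.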
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 744 73
## Yes 282 310
##
## Accuracy : 0.748
## 95% CI : (0.7245, 0.7705)
## No Information Rate : 0.7282
## P-Value [Acc > NIR] : 0.04897
##
## Kappa : 0.4565
##
## Mcnemar's Test P-Value : < 0.0000000000000002
##
## Sensitivity : 0.8094
## Specificity : 0.7251
## Pos Pred Value : 0.5236
## Neg Pred Value : 0.9106
## Prevalence : 0.2718
## Detection Rate : 0.2200
## Detection Prevalence : 0.4202
## Balanced Accuracy : 0.7673
##
## 'Positive' Class : Yes
##
The recall (sensitivity) obtained is 0.81.
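Recall can be checked by hand from the four counts in the confusion matrix above. The sketch below rebuilds that matrix in base R and computes recall, specificity, and precision from it. The object names are hypothetical.

```r
# Counts from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(c(744, 282, 73, 310), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

# Recall (sensitivity): correctly predicted churners / all actual churners
recall <- cm["Yes", "Yes"] / sum(cm[, "Yes"])

# Specificity: correctly predicted non-churners / all actual non-churners
specificity <- cm["No", "No"] / sum(cm[, "No"])

# Precision (positive predictive value): correct "Yes" / all predicted "Yes"
precision <- cm["Yes", "Yes"] / sum(cm["Yes", ])

print(round(c(recall = recall, specificity = specificity, precision = precision), 4))
```

These values agree with the Sensitivity, Specificity, and Pos Pred Value lines of the printed statistics.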
K-Nearest Neighbor (KNN) is a classification method based on nearest neighbors. KNN assigns an observation to the majority class among its k nearest neighbors, where nearness is measured by distance.
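The majority-vote idea can be sketched in a few lines of base R. This is a minimal illustration on invented points using Euclidean distance, not the implementation used later in this report (which calls `knn()`); the function `knn_predict()` is hypothetical.

```r
# Toy training points with two numeric features (illustrative values only)
train_x <- matrix(c(1, 1,
                    1, 2,
                    2, 1,
                    6, 6,
                    7, 6,
                    6, 7), ncol = 2, byrow = TRUE)
train_y <- c("No", "No", "No", "Yes", "Yes", "Yes")

knn_predict <- function(new_point, k) {
  # Euclidean distance from the new point to every training point
  d <- sqrt(rowSums(sweep(train_x, 2, new_point)^2))
  # Labels of the k closest training points
  nearest <- train_y[order(d)[seq_len(k)]]
  # Majority vote among those labels
  names(which.max(table(nearest)))
}

print(knn_predict(c(2, 2), k = 3))
print(knn_predict(c(6, 5), k = 3))
```

Because the method relies on distances, predictors on very different scales (for example tenure and TotalCharges) are usually standardized first.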
## gender SeniorCitizen Partner Dependents tenure PhoneService
## 1 Female 0 Yes No 1 No
## 2 Male 0 No No 34 Yes
## 3 Male 0 No No 2 Yes
## 4 Male 0 No No 45 No
## 5 Female 0 No No 2 Yes
## 6 Female 0 No No 8 Yes
## MultipleLines InternetService OnlineSecurity OnlineBackup
## 1 No phone service DSL No Yes
## 2 No DSL Yes No
## 3 No DSL Yes Yes
## 4 No phone service DSL Yes No
## 5 No Fiber optic No No
## 6 Yes Fiber optic No No
## DeviceProtection TechSupport StreamingTV StreamingMovies Contract
## 1 No No No No Month-to-month
## 2 Yes No No No One year
## 3 No No No No Month-to-month
## 4 Yes Yes No No One year
## 5 No No No No Month-to-month
## 6 Yes No Yes Yes Month-to-month
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## 1 Yes Electronic check 29.85 29.85
## 2 No Mailed check 56.95 1889.50
## 3 Yes Mailed check 53.85 108.15
## 4 No Bank transfer (automatic) 42.30 1840.75
## 5 Yes Electronic check 70.70 151.65
## 6 Yes Electronic check 99.65 820.50
## Churn
## 1 No
## 2 No
## 3 Yes
## 4 No
## 5 Yes
## 6 Yes
## gender SeniorCitizen Partner Dependents
## 0 0 0 0
## tenure PhoneService MultipleLines InternetService
## 0 0 0 0
## OnlineSecurity OnlineBackup DeviceProtection TechSupport
## 0 0 0 0
## StreamingTV StreamingMovies Contract PaperlessBilling
## 0 0 0 0
## PaymentMethod MonthlyCharges TotalCharges Churn
## 0 0 0 0
Any variable of type factor is converted into dummy variables. Dummy variables are indicator variables built from a categorical variable; n-1 dummy variables are created, where n is the number of categories in that categorical variable.
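The n-1 rule can be seen with base R's `model.matrix()`, used here only as an illustration of the encoding and assuming R's default treatment contrasts. The data frame `toy` is invented; the level names mirror the Contract variable.

```r
# Toy factor with three categories, mirroring the Contract variable
toy <- data.frame(
  Contract = factor(c("Month-to-month", "One year", "Two year", "One year"))
)

# model.matrix() uses treatment contrasts by default: the first level is the
# reference, so 3 categories give 2 dummy columns (plus an intercept column)
mm <- model.matrix(~ Contract, data = toy)

# Drop the intercept to keep only the n - 1 dummy columns
dummies <- mm[, -1, drop = FALSE]
print(dummies)
```

A row with zeros in every dummy column belongs to the reference category, here Month-to-month, so no information is lost by dropping one column.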
# fullRank = TRUE drops one level per factor, giving n-1 dummy columns
dummy <- dummyVars(~gender + SeniorCitizen + Partner + Dependents + tenure + PhoneService + InternetService + Contract + MonthlyCharges + TotalCharges + PaymentMethod, telco_2, fullRank = TRUE)
class(dummy)
## [1] "dummyVars"
Convert the dummy object into a data frame
## [1] "data.frame"
Combine the dummy.df data with the Churn variable
## Churn gender.Male SeniorCitizen Partner.Yes Dependents.Yes tenure
## 1 No 0 0 1 0 1
## 2 No 1 0 0 0 34
## 3 Yes 1 0 0 0 2
## 4 No 1 0 0 0 45
## 5 Yes 0 0 0 0 2
## 6 Yes 0 0 0 0 8
## PhoneService.Yes InternetService.Fiber.optic InternetService.No
## 1 0 0 0
## 2 1 0 0
## 3 1 0 0
## 4 0 0 0
## 5 1 1 0
## 6 1 1 0
## Contract.One.year Contract.Two.year MonthlyCharges TotalCharges
## 1 0 0 29.85 29.85
## 2 1 0 56.95 1889.50
## 3 0 0 53.85 108.15
## 4 1 0 42.30 1840.75
## 5 0 0 70.70 151.65
## 6 0 0 99.65 820.50
## PaymentMethod.Credit.card..automatic. PaymentMethod.Electronic.check
## 1 0 1
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 1
## 6 0 1
## PaymentMethod.Mailed.check
## 1 0
## 2 1
## 3 1
## 4 0
## 5 0
## 6 0
## Observations: 7,043
## Variables: 16
## $ Churn <fct> No, No, Yes, No, Yes, Ye...
## $ gender.Male <dbl> 0, 1, 1, 1, 0, 0, 1, 0, ...
## $ SeniorCitizen <dbl> 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ Partner.Yes <dbl> 1, 0, 0, 0, 0, 0, 0, 0, ...
## $ Dependents.Yes <dbl> 0, 0, 0, 0, 0, 0, 1, 0, ...
## $ tenure <dbl> 1, 34, 2, 45, 2, 8, 22, ...
## $ PhoneService.Yes <dbl> 0, 1, 1, 0, 1, 1, 1, 0, ...
## $ InternetService.Fiber.optic <dbl> 0, 0, 0, 0, 1, 1, 1, 0, ...
## $ InternetService.No <dbl> 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ Contract.One.year <dbl> 0, 1, 0, 1, 0, 0, 0, 0, ...
## $ Contract.Two.year <dbl> 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ MonthlyCharges <dbl> 29.85, 56.95, 53.85, 42....
## $ TotalCharges <dbl> 29.85, 1889.50, 108.15, ...
## $ PaymentMethod.Credit.card..automatic. <dbl> 0, 0, 0, 0, 0, 0, 1, 0, ...
## $ PaymentMethod.Electronic.check <dbl> 1, 0, 0, 0, 1, 1, 0, 0, ...
## $ PaymentMethod.Mailed.check <dbl> 0, 1, 1, 0, 0, 0, 0, 1, ...
set.seed(123)
index <- sample(nrow(telco_2_new_n), 0.8*nrow(telco_2_new_n))
knn_churn_train <- telco_2_new_n[index,]
knn_churn_test <- telco_2_new_n[-index,]
## Observations: 5,634
## Variables: 16
## $ Churn <fct> No, Yes, No, No, Yes, No...
## $ gender.Male <dbl[,1]> <matrix[25 x 1]>
## $ SeniorCitizen <dbl[,1]> <matrix[25 x 1]>
## $ Partner.Yes <dbl[,1]> <matrix[25 x 1]>
## $ Dependents.Yes <dbl[,1]> <matrix[25 x 1]>
## $ tenure <dbl[,1]> <matrix[25 x 1]>
## $ PhoneService.Yes <dbl[,1]> <matrix[25 x 1]>
## $ InternetService.Fiber.optic <dbl[,1]> <matrix[25 x 1]>
## $ InternetService.No <dbl[,1]> <matrix[25 x 1]>
## $ Contract.One.year <dbl[,1]> <matrix[25 x 1]>
## $ Contract.Two.year <dbl[,1]> <matrix[25 x 1]>
## $ MonthlyCharges <dbl[,1]> <matrix[25 x 1]>
## $ TotalCharges <dbl[,1]> <matrix[25 x 1]>
## $ PaymentMethod.Credit.card..automatic. <dbl[,1]> <matrix[25 x 1]>
## $ PaymentMethod.Electronic.check <dbl[,1]> <matrix[25 x 1]>
## $ PaymentMethod.Mailed.check <dbl[,1]> <matrix[25 x 1]>
Check the class balance of the training data
##
## No Yes
## 0.7305644 0.2694356
Balance the target variable using upsampling
# Column 1 is Churn; the remaining columns are the predictors
knn_churn_train_n <- upSample(x = knn_churn_train[, -1], y = knn_churn_train[,1], yname = "Churn")
str(knn_churn_train_n)
## 'data.frame': 8232 obs. of 16 variables:
## $ gender.Male : num 0.99 -1.01 0.99 -1.01 -1.01 ...
## $ SeniorCitizen : num -0.44 -0.44 -0.44 -0.44 -0.44 ...
## $ Partner.Yes : num -0.967 1.034 -0.967 -0.967 1.034 ...
## $ Dependents.Yes : num 1.529 1.529 -0.654 -0.654 -0.654 ...
## $ tenure : num -1.277 -0.382 0.799 0.107 0.962 ...
## $ PhoneService.Yes : num 0.327 -3.054 0.327 0.327 0.327 ...
## $ InternetService.Fiber.optic : num -0.886 -0.886 1.129 1.129 -0.886 ...
## $ InternetService.No : num -0.526 -0.526 -0.526 -0.526 1.901 ...
## $ Contract.One.year : num -0.514 -0.514 1.944 -0.514 1.944 ...
## $ Contract.Two.year : num -0.563 1.776 -0.563 1.776 -0.563 ...
## $ MonthlyCharges : num 0.189 -1.002 0.88 1.61 -1.345 ...
## $ TotalCharges : num -0.975 -0.647 1.085 0.721 -0.449 ...
## $ PaymentMethod.Credit.card..automatic.: num -0.525 -0.525 -0.525 -0.525 -0.525 ...
## $ PaymentMethod.Electronic.check : num -0.711 -0.711 1.406 -0.711 1.406 ...
## $ PaymentMethod.Mailed.check : num -0.545 1.835 -0.545 1.835 -0.545 ...
## $ Churn : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
Check the balance of the target variable in the training data
##
## No Yes
## 0.5 0.5
## [1] 90.73037
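The value printed above equals the square root of the number of rows in the upsampled training set (8232). A common rule of thumb is to take k near that square root. The sketch below shows the arithmetic; rounding to the nearest odd integer is an assumption added here, since the k actually used for this model is not shown.

```r
# Number of rows in the upsampled training set, from the str() output above
n_train <- 8232

# Rule of thumb: k is roughly the square root of the training-set size
k_raw <- sqrt(n_train)
print(k_raw)

# Round to an integer; with two classes an odd k avoids tied votes
k <- round(k_raw)
if (k %% 2 == 0) k <- k + 1
print(k)
```

This is only a starting point; comparing recall over several values of k on held-out data would give a better-founded choice.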
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 723 59
## Yes 335 292
##
## Accuracy : 0.7204
## 95% CI : (0.6961, 0.7437)
## No Information Rate : 0.7509
## P-Value [Acc > NIR] : 0.996
##
## Kappa : 0.4081
##
## Mcnemar's Test P-Value : <0.0000000000000002
##
## Sensitivity : 0.8319
## Specificity : 0.6834
## Pos Pred Value : 0.4657
## Neg Pred Value : 0.9246
## Prevalence : 0.2491
## Detection Rate : 0.2072
## Detection Prevalence : 0.4450
## Balanced Accuracy : 0.7576
##
## 'Positive' Class : Yes
##
The recall (sensitivity) obtained is 0.83.
## gender SeniorCitizen Partner Dependents tenure PhoneService
## 1 Female 0 Yes No 1 No
## 2 Male 0 No No 34 Yes
## 3 Male 0 No No 2 Yes
## 4 Male 0 No No 45 No
## 5 Female 0 No No 2 Yes
## 6 Female 0 No No 8 Yes
## MultipleLines InternetService OnlineSecurity OnlineBackup
## 1 No phone service DSL No Yes
## 2 No DSL Yes No
## 3 No DSL Yes Yes
## 4 No phone service DSL Yes No
## 5 No Fiber optic No No
## 6 Yes Fiber optic No No
## DeviceProtection TechSupport StreamingTV StreamingMovies Contract
## 1 No No No No Month-to-month
## 2 Yes No No No One year
## 3 No No No No Month-to-month
## 4 Yes Yes No No One year
## 5 No No No No Month-to-month
## 6 Yes No Yes Yes Month-to-month
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges
## 1 Yes Electronic check 29.85 29.85
## 2 No Mailed check 56.95 1889.50
## 3 Yes Mailed check 53.85 108.15
## 4 No Bank transfer (automatic) 42.30 1840.75
## 5 Yes Electronic check 70.70 151.65
## 6 Yes Electronic check 99.65 820.50
## Churn
## 1 No
## 2 No
## 3 Yes
## 4 No
## 5 Yes
## 6 Yes
KNN without the categorical variables (numeric predictors only)
set.seed(123)
index_nc <- sample(nrow(telco_noncat_new), 0.8* nrow(telco_noncat_new))
knn_train_nc <- telco_noncat_new[index_nc,]
knn_test_nc <- telco_noncat_new[-index_nc,]
Balance the target variable
# Column 4 is Churn; columns 1 to 3 are the numeric predictors
knn_train_nc_n <- downSample(x = knn_train_nc[, -4], y = knn_train_nc[,4], yname = "Churn")
str(knn_train_nc_n)
## 'data.frame': 3036 obs. of 4 variables:
## $ tenure : num -0.626 -1.277 0.718 -0.707 0.188 ...
## $ MonthlyCharges: num 0.201 -1.503 0.619 1.349 -1.493 ...
## $ TotalCharges : num -0.473 -0.997 0.809 -0.318 -0.689 ...
## $ Churn : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
Check the proportion of the target variable
##
## No Yes
## 0.5 0.5
## [1] 55.09991
k = 55
pred_knn_nc <- knn(train = knn_train_nc_n[,-4], cl = knn_train_nc_n[,4], test = knn_test_nc[,-4], k = 55)
table(pred_knn_nc)
## pred_knn_nc
## No Yes
## 830 579
## Confusion Matrix and Statistics
##
## Reference
## Prediction No Yes
## No 755 75
## Yes 303 276
##
## Accuracy : 0.7317
## 95% CI : (0.7078, 0.7547)
## No Information Rate : 0.7509
## P-Value [Acc > NIR] : 0.954
##
## Kappa : 0.4108
##
## Mcnemar's Test P-Value : <0.0000000000000002
##
## Sensitivity : 0.7863
## Specificity : 0.7136
## Pos Pred Value : 0.4767
## Neg Pred Value : 0.9096
## Prevalence : 0.2491
## Detection Rate : 0.1959
## Detection Prevalence : 0.4109
## Balanced Accuracy : 0.7500
##
## 'Positive' Class : Yes
##
The recall (sensitivity) obtained is 0.79.
Recall results:
We can therefore conclude that:
The KNN model that includes the categorical variables gives the highest recall, above the Logistic Regression model.
With KNN, including the categorical variables yields a higher recall than using the numeric variables only.
The decisions the Telco company can take next are therefore: