Input data
data <- read.csv("~/ANALISIS MULTIVARIAT/telco_churn_clean.csv")
data <- data %>% mutate(across(where(is.character), as.factor))
head(data, 5)
## customerID Gender SeniorCitizen Partner Dependents Tenure Contract
## 1 5575-GNVDE Male 0 No No 34 One year
## 2 3668-QPYBK Male 0 No No 2 Month-to-month
## 3 7795-CFOCW Male 0 No No 45 One year
## 4 9237-HQITU Female 0 No No 2 Month-to-month
## 5 9305-CDSKC Female 0 No No 8 Month-to-month
## PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
## 1 No Mailed check 56.95 1889.50 0
## 2 Yes Mailed check 53.85 108.15 1
## 3 No Bank transfer (automatic) 42.30 1840.75 0
## 4 Yes Electronic check 70.70 151.65 1
## 5 Yes Electronic check 99.65 820.50 1
## DataQuota
## 1 2.536710
## 2 3.757840
## 3 5.821249
## 4 2.417958
## 5 2.417978
terdapat 8 variabel bertipe character yang di ubah menjadi faktor dari 13 variabel awal
Eksplorasi
summary(data)
## customerID Gender SeniorCitizen Partner Dependents
## 0002-ORFBO: 1 Female:3410 Min. :0.0000 No :3573 No :4845
## 0003-MKNFE: 1 Male :3488 1st Qu.:0.0000 Yes:3325 Yes:2053
## 0004-TLHLJ: 1 Median :0.0000
## 0011-IGKFF: 1 Mean :0.1619
## 0013-EXCHZ: 1 3rd Qu.:0.0000
## 0013-MHZWF: 1 Max. :1.0000
## (Other) :6892
## Tenure Contract PaperlessBilling
## Min. : 1.00 Month-to-month:3803 No :2810
## 1st Qu.: 9.00 One year :1442 Yes:4088
## Median :29.00 Two year :1653
## Mean :32.43
## 3rd Qu.:55.00
## Max. :72.00
##
## PaymentMethod MonthlyCharges TotalCharges
## Bank transfer (automatic):1514 Min. : 18.25 Min. : 18.8
## Credit card (automatic) :1494 1st Qu.: 35.55 1st Qu.: 401.1
## Electronic check :2309 Median : 70.40 Median :1400.7
## Mailed check :1581 Mean : 64.84 Mean :2286.3
## 3rd Qu.: 89.90 3rd Qu.:3807.8
## Max. :118.75 Max. :8684.8
##
## Churn DataQuota
## Min. :0.0000 Min. : 0.5376
## 1st Qu.:0.0000 1st Qu.: 1.9376
## Median :0.0000 Median : 2.7091
## Mean :0.2646 Mean : 3.0741
## 3rd Qu.:1.0000 3rd Qu.: 3.7976
## Max. :1.0000 Max. :19.3583
##
Correlation
data_num <- data %>% select(where(is.numeric))
head(data_num,5)
## SeniorCitizen Tenure MonthlyCharges TotalCharges Churn DataQuota
## 1 0 34 56.95 1889.50 0 2.536710
## 2 0 2 53.85 108.15 1 3.757840
## 3 0 45 42.30 1840.75 0 5.821249
## 4 0 2 70.70 151.65 1 2.417958
## 5 0 8 99.65 820.50 1 2.417978
r = cor(data_num)
corrplot(r, method = "number",type = "lower")
library(PerformanceAnalytics)
chart.Correlation(data_num, histogram=TRUE, pch="+")
KMO
KMOS(data_num)
##
## Kaiser-Meyer-Olkin Statistics
##
## Call: KMOS(x = data_num)
##
## Measures of Sampling Adequacy (MSA):
## SeniorCitizen Tenure MonthlyCharges TotalCharges Churn
## 0.7518094 0.4247899 0.3350438 0.4425843 0.6707922
## DataQuota
## 0.4805250
##
## KMO-Criterion: 0.4264395
Berdasarkan output di atas, diperoleh statistik KMO-Criterion sebesar \(0.42<0.5\), sehingga dapat disimpulkan analisis faktor tidak dapat diterapkan menggunakan matriks korelasi antar variabel data.
Cek nilai MSA untuk variabel dengan level < 0.5 jika ada, variabel tersebut tidak dapat dianalisis lebih lanjut.
Bartlett’s Test of Sphericity
bart_spher(data_num)
## Bartlett's Test of Sphericity
##
## Call: bart_spher(x = data_num)
##
## X2 = 18176.736
## df = 15
## p-value < 2.22e-16
Hipotesis:
\(𝐻_0:R=I\) (Tidak terdapat korelasi yang signifikan antar variabel)
\(𝐻_1:R\neq I\) (Terdapat korelasi yang signifikan antar variabel)
Keputusan: Berdasarkan output di atas, nilai \(p-value<alpha=0.05\), maka tolak \(𝐻_0\)
Kesimpulan: Dengan taraf nyata \(0.05\), dapat disimpulkan bahwa terdapat korelasi yang signifikan antar variabel. Sehingga layak dilanjutkan dengan analisis faktor
Screeplot
korelasi = cor(data_num)
eigen = eigen(korelasi)
screeplot = plot(eigen$values, main = "Scree Plot", xlab = "Faktor", ylab = "Eigen Values", pch = 16, type = "o", col = "green", lwd = 1) + axis(1, at = seq(1,9)) + abline(h=1, col = "red")
Berdasarkan scree plot di atas, terdapat 3 faktor yang memiliki nilai eigen lebih dari 1 sehingga banyak faktor bermakna yang akan diesktrak sebanyak 3 faktor
PCA = principal(r = korelasi, nfactors = 3, rotate = "varimax")
PCA$communality
## SeniorCitizen Tenure MonthlyCharges TotalCharges Churn
## 0.4016498 0.8392242 0.7523148 0.9542263 0.6929856
## DataQuota
## 0.9995954
Berdasarkan output di atas, dapat dilihat komunalitas setiap variabel memiliki nilai \(<1\).Hal tersebut mengindikasikan hilangnya informasi sehingga kurang reprensentatif. Oleh karena itu, PCA kurang tepat digunakan sebagai metode ekstraksi faktor pada kasus ini.
PFA = fa(r = data_num, nfactors = 3, rotate = "varimax", fm = "pa")
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected. Examine the results carefully
summary(PFA)
##
## Factor analysis with Call: fa(r = data_num, nfactors = 3, rotate = "varimax", fm = "pa")
##
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 0 and the objective function was 0
## The number of observations was 6898 with Chi Square = 10 with prob < NA
##
## The root mean square of the residuals (RMSA) is 0
## The df corrected root mean square of the residuals is NA
##
## Tucker Lewis Index of factoring reliability = -Inf
PFA
## Factor Analysis using method = pa
## Call: fa(r = data_num, nfactors = 3, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 PA3 h2 u2 com
## SeniorCitizen 0.04 0.35 -0.06 0.1312 0.869 1.1
## Tenure 0.92 -0.03 0.19 0.8857 0.114 1.1
## MonthlyCharges 0.19 0.70 0.52 0.8018 0.198 2.0
## TotalCharges 0.80 0.30 0.55 1.0283 -0.028 2.1
## Churn -0.34 0.45 -0.10 0.3296 0.670 2.0
## DataQuota -0.01 0.01 -0.04 0.0014 0.999 1.3
##
## PA1 PA2 PA3
## SS loadings 1.64 0.91 0.63
## Proportion Var 0.27 0.15 0.10
## Cumulative Var 0.27 0.42 0.53
## Proportion Explained 0.52 0.29 0.20
## Cumulative Proportion 0.52 0.80 1.00
##
## Mean item complexity = 1.6
## Test of the hypothesis that 3 factors are sufficient.
##
## df null model = 15 with the objective function = 2.64 with Chi Square = 18176.74
## df of the model are 0 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is NA
##
## The harmonic n.obs is 6898 with the empirical chi square 1.54 with prob < NA
## The total n.obs was 6898 with Likelihood Chi Square = 10 with prob < NA
##
## Tucker Lewis Index of factoring reliability = -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA1 PA2 PA3
## Correlation of (regression) scores with factors 0.94 0.79 0.77
## Multiple R square of scores with factors 0.88 0.63 0.60
## Minimum correlation of possible factor scores 0.76 0.26 0.19
Pada output bagian Proportion Variance, faktor \(PA_1\) dapat menjelaskan variansi sebesar \(27%\), faktor \(PA_2\) dapat menjelaskan variansi sebesar \(15%\), dan faktor \(PA_3\) dapat menjelaskan variansi sebesar \(10%\)
Model analisis faktor yang terbentuk sebagai berikut:
\[SeniorCitizen=0.04PA_1+0.35PA_2-0.06PA_3+u_1\] \[Tenure=0.92PA_1-0.03PA_2+0.19PA_3+u_1\] \[MonthlyCharges=0.19PA_1+0.70PA_2+0.52 PA_3+u_1\] \[TotalCharges=0.80PA_1+0.30PA_2+0.55PA_3+u_1\] \[Churn=-0.34PA_1+0.45PA_2-0.10PA_3+u_1\] \[DataQuota=-0.01PA_1+0.01PA_2-0.04PA_3+u_1\]
loads = PFA$loadings
fa.diagram(loads)
round(PFA$loadings[1:6,],4)
## PA1 PA2 PA3
## SeniorCitizen 0.0417 0.3547 -0.0602
## Tenure 0.9220 -0.0335 0.1858
## MonthlyCharges 0.1892 0.7005 0.5247
## TotalCharges 0.7959 0.3014 0.5513
## Churn -0.3438 0.4478 -0.1045
## DataQuota -0.0120 0.0079 -0.0352