TUGAS ANALISIS FAKTOR

SAYYID RAFIF

Input data

data <- read.csv("~/ANALISIS MULTIVARIAT/telco_churn_clean.csv")
data <- data %>% mutate(across(where(is.character), as.factor))
head(data, 5)
##   customerID Gender SeniorCitizen Partner Dependents Tenure       Contract
## 1 5575-GNVDE   Male             0      No         No     34       One year
## 2 3668-QPYBK   Male             0      No         No      2 Month-to-month
## 3 7795-CFOCW   Male             0      No         No     45       One year
## 4 9237-HQITU Female             0      No         No      2 Month-to-month
## 5 9305-CDSKC Female             0      No         No      8 Month-to-month
##   PaperlessBilling             PaymentMethod MonthlyCharges TotalCharges Churn
## 1               No              Mailed check          56.95      1889.50     0
## 2              Yes              Mailed check          53.85       108.15     1
## 3               No Bank transfer (automatic)          42.30      1840.75     0
## 4              Yes          Electronic check          70.70       151.65     1
## 5              Yes          Electronic check          99.65       820.50     1
##   DataQuota
## 1  2.536710
## 2  3.757840
## 3  5.821249
## 4  2.417958
## 5  2.417978

terdapat 8 variabel bertipe character yang di ubah menjadi faktor dari 13 variabel awal

Eksplorasi

summary(data)
##       customerID      Gender     SeniorCitizen    Partner    Dependents
##  0002-ORFBO:   1   Female:3410   Min.   :0.0000   No :3573   No :4845  
##  0003-MKNFE:   1   Male  :3488   1st Qu.:0.0000   Yes:3325   Yes:2053  
##  0004-TLHLJ:   1                 Median :0.0000                        
##  0011-IGKFF:   1                 Mean   :0.1619                        
##  0013-EXCHZ:   1                 3rd Qu.:0.0000                        
##  0013-MHZWF:   1                 Max.   :1.0000                        
##  (Other)   :6892                                                       
##      Tenure                Contract    PaperlessBilling
##  Min.   : 1.00   Month-to-month:3803   No :2810        
##  1st Qu.: 9.00   One year      :1442   Yes:4088        
##  Median :29.00   Two year      :1653                   
##  Mean   :32.43                                         
##  3rd Qu.:55.00                                         
##  Max.   :72.00                                         
##                                                        
##                    PaymentMethod  MonthlyCharges    TotalCharges   
##  Bank transfer (automatic):1514   Min.   : 18.25   Min.   :  18.8  
##  Credit card (automatic)  :1494   1st Qu.: 35.55   1st Qu.: 401.1  
##  Electronic check         :2309   Median : 70.40   Median :1400.7  
##  Mailed check             :1581   Mean   : 64.84   Mean   :2286.3  
##                                   3rd Qu.: 89.90   3rd Qu.:3807.8  
##                                   Max.   :118.75   Max.   :8684.8  
##                                                                    
##      Churn          DataQuota      
##  Min.   :0.0000   Min.   : 0.5376  
##  1st Qu.:0.0000   1st Qu.: 1.9376  
##  Median :0.0000   Median : 2.7091  
##  Mean   :0.2646   Mean   : 3.0741  
##  3rd Qu.:1.0000   3rd Qu.: 3.7976  
##  Max.   :1.0000   Max.   :19.3583  
## 

Correlation

data_num <- data %>% select(where(is.numeric))
head(data_num,5)
##   SeniorCitizen Tenure MonthlyCharges TotalCharges Churn DataQuota
## 1             0     34          56.95      1889.50     0  2.536710
## 2             0      2          53.85       108.15     1  3.757840
## 3             0     45          42.30      1840.75     0  5.821249
## 4             0      2          70.70       151.65     1  2.417958
## 5             0      8          99.65       820.50     1  2.417978
r = cor(data_num)
corrplot(r, method = "number",type = "lower")

library(PerformanceAnalytics)
chart.Correlation(data_num, histogram=TRUE, pch="+")

KMO

KMOS(data_num)
## 
## Kaiser-Meyer-Olkin Statistics
## 
## Call: KMOS(x = data_num)
## 
## Measures of Sampling Adequacy (MSA):
##  SeniorCitizen         Tenure MonthlyCharges   TotalCharges          Churn 
##      0.7518094      0.4247899      0.3350438      0.4425843      0.6707922 
##      DataQuota 
##      0.4805250 
## 
## KMO-Criterion: 0.4264395

Berdasarkan output di atas, diperoleh statistik KMO-Criterion sebesar \(0.42<0.5\), sehingga dapat disimpulkan analisis faktor tidak dapat diterapkan menggunakan matriks korelasi antar variabel data.

Cek nilai MSA untuk variabel dengan level < 0.5 jika ada, variabel tersebut tidak dapat dianalisis lebih lanjut.

Bartlett’s Test of Sphericity

bart_spher(data_num)
##  Bartlett's Test of Sphericity
## 
## Call: bart_spher(x = data_num)
## 
##      X2 = 18176.736
##      df = 15
## p-value < 2.22e-16

Hipotesis:

\(𝐻_0:R=I\) (Tidak terdapat korelasi yang signifikan antar variabel)

\(𝐻_1:R\neq I\) (Terdapat korelasi yang signifikan antar variabel)

Keputusan: Berdasarkan output di atas, nilai \(p-value<alpha=0.05\), maka tolak \(𝐻_0\)

Kesimpulan: Dengan taraf nyata \(0.05\), dapat disimpulkan bahwa terdapat korelasi yang signifikan antar variabel. Sehingga layak dilanjutkan dengan analisis faktor

Screeplot

korelasi = cor(data_num)
eigen = eigen(korelasi)
screeplot = plot(eigen$values, main = "Scree Plot", xlab = "Faktor", ylab = "Eigen Values", pch = 16, type = "o", col = "green", lwd = 1) + axis(1, at = seq(1,9)) + abline(h=1, col = "red")

Berdasarkan scree plot di atas, terdapat 3 faktor yang memiliki nilai eigen lebih dari 1 sehingga banyak faktor bermakna yang akan diesktrak sebanyak 3 faktor

PCA = principal(r = korelasi, nfactors = 3, rotate = "varimax")
PCA$communality
##  SeniorCitizen         Tenure MonthlyCharges   TotalCharges          Churn 
##      0.4016498      0.8392242      0.7523148      0.9542263      0.6929856 
##      DataQuota 
##      0.9995954

Berdasarkan output di atas, dapat dilihat komunalitas setiap variabel memiliki nilai \(<1\).Hal tersebut mengindikasikan hilangnya informasi sehingga kurang reprensentatif. Oleh karena itu, PCA kurang tepat digunakan sebagai metode ekstraksi faktor pada kasus ini.

PFA = fa(r = data_num, nfactors = 3, rotate = "varimax", fm = "pa")
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An
## ultra-Heywood case was detected.  Examine the results carefully
summary(PFA)
## 
## Factor analysis with Call: fa(r = data_num, nfactors = 3, rotate = "varimax", fm = "pa")
## 
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 0  and the objective function was  0 
## The number of observations was  6898  with Chi Square =  10  with prob <  NA 
## 
## The root mean square of the residuals (RMSA) is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## Tucker Lewis Index of factoring reliability =  -Inf
PFA
## Factor Analysis using method =  pa
## Call: fa(r = data_num, nfactors = 3, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                  PA1   PA2   PA3     h2     u2 com
## SeniorCitizen   0.04  0.35 -0.06 0.1312  0.869 1.1
## Tenure          0.92 -0.03  0.19 0.8857  0.114 1.1
## MonthlyCharges  0.19  0.70  0.52 0.8018  0.198 2.0
## TotalCharges    0.80  0.30  0.55 1.0283 -0.028 2.1
## Churn          -0.34  0.45 -0.10 0.3296  0.670 2.0
## DataQuota      -0.01  0.01 -0.04 0.0014  0.999 1.3
## 
##                        PA1  PA2  PA3
## SS loadings           1.64 0.91 0.63
## Proportion Var        0.27 0.15 0.10
## Cumulative Var        0.27 0.42 0.53
## Proportion Explained  0.52 0.29 0.20
## Cumulative Proportion 0.52 0.80 1.00
## 
## Mean item complexity =  1.6
## Test of the hypothesis that 3 factors are sufficient.
## 
## df null model =  15  with the objective function =  2.64 with Chi Square =  18176.74
## df of  the model are 0  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## The harmonic n.obs is  6898 with the empirical chi square  1.54  with prob <  NA 
## The total n.obs was  6898  with Likelihood Chi Square =  10  with prob <  NA 
## 
## Tucker Lewis Index of factoring reliability =  -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    PA1  PA2  PA3
## Correlation of (regression) scores with factors   0.94 0.79 0.77
## Multiple R square of scores with factors          0.88 0.63 0.60
## Minimum correlation of possible factor scores     0.76 0.26 0.19

Pada output bagian Proportion Variance, faktor \(PA_1\) dapat menjelaskan variansi sebesar \(27%\), faktor \(PA_2\) dapat menjelaskan variansi sebesar \(15%\), dan faktor \(PA_3\) dapat menjelaskan variansi sebesar \(10%\)

Model analisis faktor yang terbentuk sebagai berikut:

\[SeniorCitizen=0.04PA_1+0.35PA_2-0.06PA_3+u_1\] \[Tenure=0.92PA_1-0.03PA_2+0.19PA_3+u_1\] \[MonthlyCharges=0.19PA_1+0.70PA_2+0.52 PA_3+u_1\] \[TotalCharges=0.80PA_1+0.30PA_2+0.55PA_3+u_1\] \[Churn=-0.34PA_1+0.45PA_2-0.10PA_3+u_1\] \[DataQuota=-0.01PA_1+0.01PA_2-0.04PA_3+u_1\]

loads = PFA$loadings
fa.diagram(loads)

round(PFA$loadings[1:6,],4)
##                    PA1     PA2     PA3
## SeniorCitizen   0.0417  0.3547 -0.0602
## Tenure          0.9220 -0.0335  0.1858
## MonthlyCharges  0.1892  0.7005  0.5247
## TotalCharges    0.7959  0.3014  0.5513
## Churn          -0.3438  0.4478 -0.1045
## DataQuota      -0.0120  0.0079 -0.0352