Dimensionality Reduction FA Admission
Input data
data <- read.csv("C:/Users/HP/Downloads/admission.csv")
data$research_exp <- as.factor(data$research_exp)
head(data, 5)
## gre_score toefl_score univ_ranking motiv_letter_strength
## 1 337 118 4 4.5
## 2 324 107 4 4.0
## 3 316 104 3 3.0
## 4 322 110 3 3.5
## 5 314 103 2 2.0
## recommendation_strength gpa research_exp admission_score
## 1 4.5 9.65 1 92
## 2 4.5 8.87 1 76
## 3 3.5 8.00 1 72
## 4 2.5 8.67 1 80
## 5 3.0 8.21 0 65
Exploration
## gre_score toefl_score univ_ranking motiv_letter_strength
## Min. :290.0 Min. : 92.0 Min. :1.000 Min. :1.000
## 1st Qu.:308.0 1st Qu.:103.0 1st Qu.:2.000 1st Qu.:2.500
## Median :317.0 Median :107.0 Median :3.000 Median :3.500
## Mean :316.5 Mean :107.2 Mean :3.114 Mean :3.374
## 3rd Qu.:325.0 3rd Qu.:112.0 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :340.0 Max. :120.0 Max. :5.000 Max. :5.000
## recommendation_strength gpa research_exp admission_score
## Min. :1.000 Min. :6.800 0:220 Min. :34.00
## 1st Qu.:3.000 1st Qu.:8.127 1:280 1st Qu.:63.00
## Median :3.500 Median :8.560 Median :72.00
## Mean :3.484 Mean :8.576 Mean :72.14
## 3rd Qu.:4.000 3rd Qu.:9.040 3rd Qu.:82.00
## Max. :5.000 Max. :9.920 Max. :97.00
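The summary above has the shape of a `summary(data)` call, and the later outputs (KMO, Bartlett's test, `fa`) list only seven variables without `research_exp`. The factor column was therefore presumably dropped before the correlation step, although that code is not shown. A minimal sketch of such a step, assuming a simple numeric-column filter: the data frame is rebuilt from the five rows printed by `head()` purely for illustration, and `num_data` is a hypothetical name.

```r
# Rebuild the five rows shown by head(data, 5); illustration only, not the full data set
data <- data.frame(
  gre_score               = c(337, 324, 316, 322, 314),
  toefl_score             = c(118, 107, 104, 110, 103),
  univ_ranking            = c(4, 4, 3, 3, 2),
  motiv_letter_strength   = c(4.5, 4.0, 3.0, 3.5, 2.0),
  recommendation_strength = c(4.5, 4.5, 3.5, 2.5, 3.0),
  gpa                     = c(9.65, 8.87, 8.00, 8.67, 8.21),
  research_exp            = c(1, 1, 1, 1, 0),
  admission_score         = c(92, 76, 72, 80, 65)
)
data$research_exp <- as.factor(data$research_exp)

summary(data)

# ASSUMPTION: keep only the numeric columns before computing correlations
num_data <- data[, sapply(data, is.numeric)]
names(num_data)      # research_exp is no longer present
dim(cor(num_data))   # 7 x 7 correlation matrix
```

Functions such as `cor()` need numeric input, which is why a factor column has to be removed or recoded first.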
Correlation
suppressPackageStartupMessages(library(PerformanceAnalytics))
chart.Correlation(data, histogram = TRUE, pch = "+")
KMO
##
## Kaiser-Meyer-Olkin Statistics
##
## Call: KMOS(x = data)
##
## Measures of Sampling Adequacy (MSA):
## gre_score toefl_score univ_ranking
## 0.9109262 0.9270674 0.9398920
## motiv_letter_strength recommendation_strength gpa
## 0.9160585 0.9362176 0.9053861
## admission_score
## 0.9108130
##
## KMO-Criterion: 0.9193343
Based on the output above, the overall KMO criterion is \(0.92>0.5\), so factor analysis can be applied to the correlation matrix of the variables.
Also check the MSA of each variable: any variable with MSA < 0.5 should be excluded from further analysis. Here every MSA is above 0.90, so all seven variables are retained.
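The `Call:` line in the output suggests `KMOS()` from the REdaS package. To make clear what the statistic measures, the following base-R sketch computes the overall KMO and the per-variable MSA from a correlation matrix by comparing squared correlations with squared partial correlations. The raw data is not available here, so the correlation matrix is an assumption: the one implied by a one-factor model with the loadings reported at the end of this document. The numbers it produces will therefore not match the output above. `kmo_by_hand` is a hypothetical helper name.

```r
# Loadings as reported at the end of this document (rounded)
lambda <- c(gre_score = 0.8544, toefl_score = 0.8587, univ_ranking = 0.7902,
            motiv_letter_strength = 0.7949, recommendation_strength = 0.7014,
            gpa = 0.9321, admission_score = 0.9148)

# ASSUMPTION: correlation matrix implied by a one-factor model, not the real one
R <- outer(lambda, lambda)
diag(R) <- 1

kmo_by_hand <- function(R) {
  # Partial correlations come from the scaled inverse of the correlation matrix
  Q <- -cov2cor(solve(R))
  diag(Q) <- 0
  R0 <- R
  diag(R0) <- 0
  r2 <- R0^2   # squared correlations (off-diagonal)
  q2 <- Q^2    # squared partial correlations (off-diagonal)
  list(
    msa = rowSums(r2) / (rowSums(r2) + rowSums(q2)),  # per variable
    kmo = sum(r2) / (sum(r2) + sum(q2))               # overall
  )
}

res <- kmo_by_hand(R)
round(res$msa, 3)
round(res$kmo, 3)
```

Values close to 1 mean the partial correlations are small relative to the ordinary correlations, which is the situation in which a common-factor model is plausible.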
Bartlett’s Test of Sphericity
## Bartlett's Test of Sphericity
##
## Call: bart_spher(x = data)
##
## X2 = 3250.552
## df = 21
## p-value < 2.22e-16
Hypotheses: \(H_0:R=I\) (there is no significant correlation among the variables) versus \(H_1:R\neq I\) (there is significant correlation among the variables).
Decision: based on the output above, the p-value is below \(\alpha=0.05\), so \(H_0\) is rejected.
Conclusion: at the \(0.05\) significance level, the variables are significantly correlated, so it is appropriate to proceed with factor analysis.
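For reference, Bartlett's statistic is \(\chi^2=-\left(n-1-\frac{2p+5}{6}\right)\ln|R|\) with \(p(p-1)/2\) degrees of freedom. The sketch below checks the reported degrees of freedom and backs out the implied \(-\ln|R|\), taking \(n=500\) from the `fa` output further down and \(p=7\) from the variable list. `bartlett_by_hand` is a hypothetical helper, not the function used in this document.

```r
# Hypothetical helper: Bartlett's test of sphericity from a correlation matrix
bartlett_by_hand <- function(R, n) {
  p  <- ncol(R)
  x2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(X2 = x2, df = df, p.value = pchisq(x2, df, lower.tail = FALSE))
}

# Values reported in this document
n <- 500
p <- 7
x2_reported <- 3250.552

df <- p * (p - 1) / 2                     # degrees of freedom
multiplier <- n - 1 - (2 * p + 5) / 6     # factor in front of -log(det(R))
neg_log_det <- x2_reported / multiplier   # implied -log(det(R))

df
round(neg_log_det, 2)
```

The degrees of freedom come out as 21, as reported, and the implied \(-\ln|R|\) of about 6.56 agrees with the null-model objective function printed in the `fa` output.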
Screeplot
korelasi <- cor(data)
eig <- eigen(korelasi)  # renamed so the result does not mask the eigen() function
# Base graphics calls are issued one after another rather than chained with `+`
plot(eig$values, main = "Scree Plot", xlab = "Factor", ylab = "Eigenvalue", pch = 16, type = "o", col = "green", lwd = 1)
axis(1, at = seq_along(eig$values))
abline(h = 1, col = "red")  # reference line at eigenvalue = 1
Based on the scree plot above, only one factor has an eigenvalue greater than 1, so a single factor is extracted.
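The rule applied here (the Kaiser criterion: retain factors whose eigenvalue exceeds 1) can be sketched in base R. Since the raw data is not included, the correlation matrix below is an assumption, namely the one implied by a one-factor model with the loadings reported at the end of this document, so the eigenvalues are illustrative rather than the ones plotted.

```r
# Loadings as reported at the end of this document (rounded)
lambda <- c(0.8544, 0.8587, 0.7902, 0.7949, 0.7014, 0.9321, 0.9148)

# ASSUMPTION: model-implied correlation matrix, not the observed one
R <- outer(lambda, lambda)
diag(R) <- 1

ev <- eigen(R, symmetric = TRUE)$values

n_factors <- sum(ev > 1)    # Kaiser criterion: count eigenvalues above 1
prop_var  <- ev / sum(ev)   # share of total variance per component

n_factors
round(ev, 3)
round(prop_var, 3)
```

The eigenvalues of a correlation matrix always sum to the number of variables, which is why 1 (the variance of a single standardized variable) serves as the cut-off.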
## gre_score toefl_score univ_ranking
## 0.7652344 0.7719017 0.6880177
## motiv_letter_strength recommendation_strength gpa
## 0.6945164 0.5724152 0.8648241
## admission_score
## 0.8433593
The output above lists the communalities under a one-component PCA extraction; every variable has a communality below \(1\). This indicates that some information is lost, so the single component does not fully represent the variables. PCA is therefore less suitable as the factor extraction method in this case.
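To make the communality argument concrete: with \(k\) retained principal components, a variable's communality is the sum of its squared loadings on those components. It equals 1 only when all components are kept, and with one component the communalities sum to the first eigenvalue. The sketch below assumes the values printed above are one-component PCA communalities, as the text indicates. `pca_communalities` is a hypothetical helper and `R_toy` is a made-up 2 x 2 matrix used only to show the behaviour.

```r
# Hypothetical helper: communalities from the first k principal components
pca_communalities <- function(R, k) {
  e <- eigen(R, symmetric = TRUE)
  # Loadings = eigenvectors scaled by the square roots of their eigenvalues
  L <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
  rowSums(L^2)
}

# Toy example (made-up matrix): two variables correlated at 0.5
R_toy <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
pca_communalities(R_toy, k = 1)   # below 1: one component drops information
pca_communalities(R_toy, k = 2)   # exactly 1: nothing is dropped

# Communalities reported above
h2_pca <- c(gre_score = 0.7652344, toefl_score = 0.7719017, univ_ranking = 0.6880177,
            motiv_letter_strength = 0.6945164, recommendation_strength = 0.5724152,
            gpa = 0.8648241, admission_score = 0.8433593)

eig1  <- sum(h2_pca)             # implied first eigenvalue
share <- eig1 / length(h2_pca)   # share of total variance in component 1
round(c(eig1 = eig1, share = share), 3)
```

Under that assumption the first eigenvalue is about 5.20, so the first component carries roughly 74% of the total variance.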
##
## Factor analysis with Call: fa(r = data, nfactors = 1, rotate = "varimax", fm = "pa")
##
## Test of the hypothesis that 1 factor is sufficient.
## The degrees of freedom for the model is 14 and the objective function was 0.45
## The number of observations was 500 with Chi Square = 221.66 with prob < 2e-39
##
## The root mean square of the residuals (RMSA) is 0.05
## The df corrected root mean square of the residuals is 0.06
##
## Tucker Lewis Index of factoring reliability = 0.903
## RMSEA index = 0.172 and the 90 % confidence intervals are 0.153 0.193
## BIC = 134.66
## Factor Analysis using method = pa
## Call: fa(r = data, nfactors = 1, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 h2 u2 com
## gre_score 0.85 0.73 0.27 1
## toefl_score 0.86 0.74 0.26 1
## univ_ranking 0.79 0.62 0.38 1
## motiv_letter_strength 0.79 0.63 0.37 1
## recommendation_strength 0.70 0.49 0.51 1
## gpa 0.93 0.87 0.13 1
## admission_score 0.91 0.84 0.16 1
##
## PA1
## SS loadings 4.92
## Proportion Var 0.70
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## df null model = 21 with the objective function = 6.56 with Chi Square = 3250.55
## df of the model are 14 and the objective function was 0.45
##
## The root mean square of the residuals (RMSR) is 0.05
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic n.obs is 500 with the empirical chi square 58.02 with prob < 2.6e-07
## The total n.obs was 500 with Likelihood Chi Square = 221.66 with prob < 2e-39
##
## Tucker Lewis Index of factoring reliability = 0.903
## RMSEA index = 0.172 and the 90 % confidence intervals are 0.153 0.193
## BIC = 134.66
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## PA1
## Correlation of (regression) scores with factors 0.98
## Multiple R square of scores with factors 0.95
## Minimum correlation of possible factor scores 0.91
In the Proportion Var row of the output, factor \(PA_1\) explains \(0.70\) (70%) of the total variance.
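The two summary rows can be reproduced by hand: SS loadings is the sum of the squared loadings (equal to the sum of the communalities when there is one factor), and Proportion Var divides that by the number of variables. The sketch uses the unrounded communalities that appear in the model equations of this document.

```r
# Unrounded communalities (h2) quoted in this document's model equations
h2 <- c(gre_score = 0.7300663, toefl_score = 0.7373377, univ_ranking = 0.6244234,
        motiv_letter_strength = 0.6318265, recommendation_strength = 0.4919550,
        gpa = 0.8688122, admission_score = 0.8368171)

ss_loadings <- sum(h2)                   # "SS loadings" row
prop_var    <- ss_loadings / length(h2)  # "Proportion Var" row

round(ss_loadings, 2)
round(prop_var, 2)
```

These come out at 4.92 and 0.70, matching the `fa` output.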
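The resulting factor model for the standardized variables is given below, where \(\varepsilon_i\) is the specific error of variable \(i\), \(h^2\) is the communality (the squared loading) and \(u^2=1-h^2\) is the uniqueness, that is, the variance of \(\varepsilon_i\):
\[gre.score=0.85PA_1+\varepsilon_1,\quad h^2=0.7300663,\quad u^2=0.2699337\]
\[toefl.score=0.86PA_1+\varepsilon_2,\quad h^2=0.7373377,\quad u^2=0.2626623\]
\[univ.ranking=0.79PA_1+\varepsilon_3,\quad h^2=0.6244234,\quad u^2=0.3755766\]
\[motiv.letter.strength=0.79PA_1+\varepsilon_4,\quad h^2=0.6318265,\quad u^2=0.3681735\]
\[recommendation.strength=0.70PA_1+\varepsilon_5,\quad h^2=0.4919550,\quad u^2=0.5080450\]
\[gpa=0.93PA_1+\varepsilon_6,\quad h^2=0.8688122,\quad u^2=0.1311878\]
\[admission.score=0.91PA_1+\varepsilon_7,\quad h^2=0.8368171,\quad u^2=0.1631829\]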
## gre_score toefl_score univ_ranking
## 0.8544 0.8587 0.7902
## motiv_letter_strength recommendation_strength gpa
## 0.7949 0.7014 0.9321
## admission_score
## 0.9148
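As a consistency check, the communalities and uniquenesses in the `fa` table follow directly from the loadings printed above: \(h^2\) is the squared loading and \(u^2=1-h^2\). A short sketch, using the four-decimal loadings as printed (so results agree with the table only up to rounding):

```r
# Loadings as printed above
lambda <- c(gre_score = 0.8544, toefl_score = 0.8587, univ_ranking = 0.7902,
            motiv_letter_strength = 0.7949, recommendation_strength = 0.7014,
            gpa = 0.9321, admission_score = 0.9148)

h2 <- lambda^2   # communality: variance explained by the factor
u2 <- 1 - h2     # uniqueness: variance left to the specific error

round(cbind(loading = lambda, h2 = h2, u2 = u2), 2)
```

For `motiv_letter_strength`, for example, this gives \(h^2\approx 0.63\) and \(u^2\approx 0.37\), in line with the `fa` table.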