Dimensionality Reduction FA Admission

Input data

data <- read.csv("C:/Users/HP/Downloads/admission.csv")
data$research_exp <- as.factor(data$research_exp)
head(data, 5)
##   gre_score toefl_score univ_ranking motiv_letter_strength
## 1       337         118            4                   4.5
## 2       324         107            4                   4.0
## 3       316         104            3                   3.0
## 4       322         110            3                   3.5
## 5       314         103            2                   2.0
##   recommendation_strength  gpa research_exp admission_score
## 1                     4.5 9.65            1              92
## 2                     4.5 8.87            1              76
## 3                     3.5 8.00            1              72
## 4                     2.5 8.67            1              80
## 5                     3.0 8.21            0              65

Exploration

summary(data)
##    gre_score      toefl_score     univ_ranking   motiv_letter_strength
##  Min.   :290.0   Min.   : 92.0   Min.   :1.000   Min.   :1.000        
##  1st Qu.:308.0   1st Qu.:103.0   1st Qu.:2.000   1st Qu.:2.500        
##  Median :317.0   Median :107.0   Median :3.000   Median :3.500        
##  Mean   :316.5   Mean   :107.2   Mean   :3.114   Mean   :3.374        
##  3rd Qu.:325.0   3rd Qu.:112.0   3rd Qu.:4.000   3rd Qu.:4.000        
##  Max.   :340.0   Max.   :120.0   Max.   :5.000   Max.   :5.000        
##  recommendation_strength      gpa        research_exp admission_score
##  Min.   :1.000           Min.   :6.800   0:220        Min.   :34.00  
##  1st Qu.:3.000           1st Qu.:8.127   1:280        1st Qu.:63.00  
##  Median :3.500           Median :8.560                Median :72.00  
##  Mean   :3.484           Mean   :8.576                Mean   :72.14  
##  3rd Qu.:4.000           3rd Qu.:9.040                3rd Qu.:82.00  
##  Max.   :5.000           Max.   :9.920                Max.   :97.00

Correlation

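library(corrplot)   # provides corrplot()
data <- data[, -7]  # drop research_exp (column 7), a factor, so cor() sees numeric columns only
r <- cor(data)
corrplot(r, method = "number", type = "lower")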

suppressPackageStartupMessages(library(PerformanceAnalytics))
chart.Correlation(data, histogram=TRUE, pch="+")

KMO

library(REdaS)  # provides KMOS() and bart_spher()
KMOS(data)
## 
## Kaiser-Meyer-Olkin Statistics
## 
## Call: KMOS(x = data)
## 
## Measures of Sampling Adequacy (MSA):
##               gre_score             toefl_score            univ_ranking 
##               0.9109262               0.9270674               0.9398920 
##   motiv_letter_strength recommendation_strength                     gpa 
##               0.9160585               0.9362176               0.9053861 
##         admission_score 
##               0.9108130 
## 
## KMO-Criterion: 0.9193343

Based on the output above, the KMO criterion is \(0.92>0.5\), so factor analysis can be applied to the correlation matrix of the variables.

Check the MSA of each variable: any variable with an MSA \(<0.5\) should be excluded from further analysis. Here every MSA is above \(0.9\), so all seven variables are retained.
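The KMO criterion and the per-variable MSA can also be reproduced from a correlation matrix in base R. The sketch below is illustrative only: it uses simulated one-factor data rather than the admission data, and the names `X`, `kmo`, and `msa` are assumptions introduced here. It follows the usual definition, which compares squared correlations with squared partial correlations.

```r
# Simulated data with one common factor (illustrative, not the admission data)
set.seed(123)
n <- 300
f <- rnorm(n)
X <- sapply(1:5, function(i) 0.8 * f + rnorm(n, sd = 0.6))
colnames(X) <- paste0("v", 1:5)

R <- cor(X)
R_inv <- solve(R)

# Partial correlation of each pair, controlling for all other variables
P <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))

# Only off-diagonal elements enter the KMO formula
diag(R) <- 0
diag(P) <- 0

kmo <- sum(R^2) / (sum(R^2) + sum(P^2))              # overall KMO criterion
msa <- colSums(R^2) / (colSums(R^2) + colSums(P^2))  # per-variable MSA

round(kmo, 3)
round(msa, 3)
names(msa)[msa < 0.5]  # variables that would be dropped, if any
```

Values close to 1 mean the partial correlations are small relative to the raw correlations, which is the situation in which a common factor is plausible.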

Bartlett’s Test of Sphericity

bart_spher(data)
##  Bartlett's Test of Sphericity
## 
## Call: bart_spher(x = data)
## 
##      X2 = 3250.552
##      df = 21
## p-value < 2.22e-16

Hypotheses: \(H_0:R=I\) (the correlation matrix is the identity matrix, i.e. there are no significant correlations among the variables); \(H_1:R\neq I\) (there are significant correlations among the variables)

Decision: Based on the output above, the p-value is less than \(\alpha=0.05\), so \(H_0\) is rejected.

Conclusion: At the \(0.05\) significance level, there are significant correlations among the variables, so the data are suitable for factor analysis.
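For reference, the test statistic can be sketched in base R from the textbook formula \(\chi^2=-\left(n-1-\frac{2p+5}{6}\right)\ln|R|\) with \(p(p-1)/2\) degrees of freedom. The helper `bartlett_sphericity()` below is a hypothetical name introduced for illustration; `bart_spher()` may differ in implementation details. The example matrices are synthetic, not the admission data.

```r
# Bartlett's test of sphericity from a correlation matrix R and sample size n
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  statistic <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  p_value <- pchisq(statistic, df = df, lower.tail = FALSE)
  list(statistic = statistic, df = df, p.value = p_value)
}

# With 7 variables the degrees of freedom are 7 * 6 / 2 = 21
res_identity <- bartlett_sphericity(diag(7), n = 500)
res_identity$df
res_identity$statistic  # zero for an identity matrix: no evidence against H0

# Synthetic example with all off-diagonal correlations equal to 0.5
R_eq <- matrix(0.5, nrow = 7, ncol = 7)
diag(R_eq) <- 1
res_eq <- bartlett_sphericity(R_eq, n = 500)
res_eq$p.value < 0.05   # H0 is rejected for strongly correlated variables
```

The degrees of freedom for seven variables match the `df = 21` reported in the output above.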

Scree Plot

korelasi <- cor(data)
eig <- eigen(korelasi)  # named eig to avoid masking base::eigen()
plot(eig$values, main = "Scree Plot", xlab = "Factor", ylab = "Eigenvalue", pch = 16, type = "o", col = "green", lwd = 1, xaxt = "n")
axis(1, at = seq_along(eig$values))  # one tick per factor
abline(h = 1, col = "red")           # reference line at eigenvalue = 1

Based on the scree plot above, only 1 factor has an eigenvalue greater than 1, so 1 meaningful factor is extracted.
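The eigenvalue-greater-than-1 rule can also be applied numerically rather than read off the plot. A minimal sketch, assuming a small hypothetical correlation matrix in place of the admission data:

```r
# Illustrative 4 x 4 correlation matrix with all correlations equal to 0.6
R <- matrix(0.6, nrow = 4, ncol = 4)
diag(R) <- 1

ev <- eigen(R)$values     # eigenvalues in decreasing order
ev
n_factors <- sum(ev > 1)  # count eigenvalues above 1
n_factors
```

In the analysis itself, the same count could be taken from the eigenvalues of `cor(data)`.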

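library(psych)  # provides principal(), fa() and fa.diagram()
PCA <- principal(r = korelasi, nfactors = 1, rotate = "varimax")  # rotation has no effect with a single component
PCA$communality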
##               gre_score             toefl_score            univ_ranking 
##               0.7652344               0.7719017               0.6880177 
##   motiv_letter_strength recommendation_strength                     gpa 
##               0.6945164               0.5724152               0.8648241 
##         admission_score 
##               0.8433593

Based on the output above, the communality of every variable is \(<1\). This indicates a loss of information, so the single component is not fully representative of the variables. PCA is therefore less suitable as the factor extraction method in this case.
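A communality in PCA is the sum of a variable's squared loadings over the retained components. The base R sketch below uses a hypothetical correlation matrix (not the admission data) to show how the communalities depend on the number of components kept. The names `L`, `h2_one`, and `h2_all` are assumptions introduced here.

```r
# Illustrative 4 x 4 correlation matrix with all correlations equal to 0.6
R <- matrix(0.6, nrow = 4, ncol = 4)
diag(R) <- 1

e <- eigen(R)
# Component loadings: eigenvectors scaled by the square roots of the eigenvalues
L <- e$vectors %*% diag(sqrt(e$values))

h2_one <- L[, 1]^2      # communalities when only the first component is kept
h2_all <- rowSums(L^2)  # communalities when all four components are kept

round(h2_one, 3)
round(h2_all, 3)
```

With a single component the communalities fall below 1, and they reach 1 only when every component is retained.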

PFA = fa(r = data, nfactors = 1, rotate = "varimax", fm = "pa")
summary(PFA)
## 
## Factor analysis with Call: fa(r = data, nfactors = 1, rotate = "varimax", fm = "pa")
## 
## Test of the hypothesis that 1 factor is sufficient.
## The degrees of freedom for the model is 14  and the objective function was  0.45 
## The number of observations was  500  with Chi Square =  221.66  with prob <  2e-39 
## 
## The root mean square of the residuals (RMSA) is  0.05 
## The df corrected root mean square of the residuals is  0.06 
## 
## Tucker Lewis Index of factoring reliability =  0.903
## RMSEA index =  0.172  and the 90 % confidence intervals are  0.153 0.193
## BIC =  134.66
PFA
## Factor Analysis using method =  pa
## Call: fa(r = data, nfactors = 1, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                          PA1   h2   u2 com
## gre_score               0.85 0.73 0.27   1
## toefl_score             0.86 0.74 0.26   1
## univ_ranking            0.79 0.62 0.38   1
## motiv_letter_strength   0.79 0.63 0.37   1
## recommendation_strength 0.70 0.49 0.51   1
## gpa                     0.93 0.87 0.13   1
## admission_score         0.91 0.84 0.16   1
## 
##                 PA1
## SS loadings    4.92
## Proportion Var 0.70
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## df null model =  21  with the objective function =  6.56 with Chi Square =  3250.55
## df of  the model are 14  and the objective function was  0.45 
## 
## The root mean square of the residuals (RMSR) is  0.05 
## The df corrected root mean square of the residuals is  0.06 
## 
## The harmonic n.obs is  500 with the empirical chi square  58.02  with prob <  2.6e-07 
## The total n.obs was  500  with Likelihood Chi Square =  221.66  with prob <  2e-39 
## 
## Tucker Lewis Index of factoring reliability =  0.903
## RMSEA index =  0.172  and the 90 % confidence intervals are  0.153 0.193
## BIC =  134.66
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    PA1
## Correlation of (regression) scores with factors   0.98
## Multiple R square of scores with factors          0.95
## Minimum correlation of possible factor scores     0.91

In the Proportion Var row of the output, the factor \(PA_1\) explains \(0.70\) (70%) of the total variance.
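The SS loadings and Proportion Var figures can be checked by hand from the loadings. The sketch below hard-codes the loadings as printed (rounded to four decimals) at the end of this analysis; the vector name `loadings_pa1` is an assumption introduced here.

```r
# PA1 loadings as reported in the output, rounded to 4 decimals
loadings_pa1 <- c(gre_score = 0.8544, toefl_score = 0.8587, univ_ranking = 0.7902,
                  motiv_letter_strength = 0.7949, recommendation_strength = 0.7014,
                  gpa = 0.9321, admission_score = 0.9148)

ss_loadings <- sum(loadings_pa1^2)               # sum of squared loadings
prop_var <- ss_loadings / length(loadings_pa1)   # share of total variance (7 standardized variables)

round(ss_loadings, 2)  # 4.92
round(prop_var, 2)     # 0.7
```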

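The resulting factor model is as follows, where \(PA_1\) is the common factor, \(\varepsilon_i\) is the unique factor of variable \(i\), \(h^2\) is the communality and \(u^2=1-h^2\) is the uniqueness:
\[gre.score=0.85PA_1+\varepsilon_1,\quad h^2=0.7300663,\quad u^2=0.2699337\]
\[toefl.score=0.86PA_1+\varepsilon_2,\quad h^2=0.7373377,\quad u^2=0.2626623\]
\[univ.ranking=0.79PA_1+\varepsilon_3,\quad h^2=0.6244234,\quad u^2=0.3755766\]
\[motiv.letter.strength=0.79PA_1+\varepsilon_4,\quad h^2=0.6318265,\quad u^2=0.3681735\]
\[recommendation.strength=0.70PA_1+\varepsilon_5,\quad h^2=0.4919550,\quad u^2=0.5080450\]
\[gpa=0.93PA_1+\varepsilon_6,\quad h^2=0.8688122,\quad u^2=0.1311878\]
\[admission.score=0.91PA_1+\varepsilon_7,\quad h^2=0.8368171,\quad u^2=0.1631829\]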
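In a one-factor model, each standardized variable is its loading times the common factor plus a unique term, so the communality is the squared loading, the uniqueness is one minus the communality, and the model-implied correlation between two variables is the product of their loadings. A minimal sketch using the rounded loadings printed at the end of this analysis; the names `lambda` and `R_hat` are assumptions introduced here, and results differ slightly from the `fa()` output because of rounding.

```r
# PA1 loadings as reported in the output, rounded to 4 decimals
lambda <- c(gre_score = 0.8544, toefl_score = 0.8587, univ_ranking = 0.7902,
            motiv_letter_strength = 0.7949, recommendation_strength = 0.7014,
            gpa = 0.9321, admission_score = 0.9148)

h2 <- lambda^2  # communality of each variable
u2 <- 1 - h2    # uniqueness of each variable

# Model-implied correlation matrix: lambda lambda' plus the uniquenesses on the diagonal
R_hat <- tcrossprod(lambda) + diag(u2)
dimnames(R_hat) <- list(names(lambda), names(lambda))

round(cbind(loading = lambda, h2 = h2, u2 = u2), 2)
round(R_hat["gre_score", "gpa"], 2)  # implied correlation between gre_score and gpa
```

Comparing `R_hat` with `cor(data)` gives the residual correlations that the fit statistics in the output summarize.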

loads = PFA$loadings
fa.diagram(loads)

round(PFA$loadings[1:7,],4)
##               gre_score             toefl_score            univ_ranking 
##                  0.8544                  0.8587                  0.7902 
##   motiv_letter_strength recommendation_strength                     gpa 
##                  0.7949                  0.7014                  0.9321 
##         admission_score 
##                  0.9148