Dataset link: -> https://data.mendeley.com/datasets/8fh8wtn6p5/3
LIN, Siliang (2025), “Enterprise ESG and Multivariate Characteristics Dataset”, Mendeley Data, V3, doi: 10.17632/8fh8wtn6p5.3
Dataset yang digunakan dalam penelitian ini berasal dari Enterprise ESG and Multivariate Characteristics Dataset, sebuah kumpulan data perusahaan yang dipublikasikan di Mendeley Data (Version 3). Dataset ini berisi serangkaian variabel yang menggambarkan karakteristik perusahaan dari berbagai aspek, termasuk kinerja keuangan, ukuran perusahaan, struktur modal, karakteristik pengusaha, serta indikator Environmental, Social, and Governance (ESG).
Variabel dependen utama adalah skor kinerja ESG yang diukur menggunakan Huazheng ESG Index, dengan nilai dalam rentang 0–100 yang mencerminkan performa perusahaan dalam dimensi lingkungan (environment), sosial (social), dan tata kelola (governance). Variabel independen terdiri dari karakteristik perusahaan dan pengusaha seperti umur perusahaan, ukuran perusahaan (Size), profitabilitas (ROA), pertumbuhan (Growth), kas (Cash), leverage (lev), umur eksekutif (Age_e), serta beberapa variabel pengontrol lainnya. Selain itu, dataset ini juga mencakup variabel-variabel kategori seperti status politik perusahaan dan skala perusahaan, serta variabel mediasi seperti pengeluaran hiburan bisnis (BE) dan subsidi pemerintah (GS).
Variables yang akan digunakan (Setelah proses preprocessing):
X1: Umur perusahaan (Age_c)
X2: Ukuran perusahaan (Size) (dalam satuan log)
X3: Return of Assets (ROA)
X4: Pertumbuhan Perusahaan (Growth)
X5: Kas yang dimiliki (Cash)
X6: Tingkat Utang / Leverage (lev)
X7: Umur Eksekutif / Chairman (Age_e)
X8: Gaji (GS)
X9: Board experience (BE)
X10: Environmental score High (E_H)
X11: Social score High (S_H)
X12: Governace score High (G_H)
X13: Environmental score Weighted (E_W)
X14: Social score Weighted (S_W)
X15: Governance score Weighted (G_W)
X16: Perbedaan gaji (Pay_gap)
Pertama, download data dari link dataset yang sudah disediakan. Format data masih dalam bentuk excel, sebaiknya di convert menjadi csv untuk memudahkan analalisis. Selanjutnya, import dataset ke dalam variabel data.
data <- read.csv("C:/Users/WIN 10/Downloads/data2.csv")
head(data)
## code name time soe Scale industry Age_c Size ROA Growth
## 1 6 深振业A 2013 1 1 CI420000 3.091042 23.02266 0.070024 0.498858
## 2 10 美丽生态 2013 0 0 CI230000 2.944439 20.75068 0.003708 0.925853
## 3 12 南玻A 2013 0 1 CI240000 3.091042 23.43656 0.107516 0.105719
## 4 16 深康佳A 2013 1 1 CI330000 3.091042 23.47968 0.003801 0.091007
## 5 19 深粮控股 2013 1 1 CI370000 3.091042 21.01342 0.033827 0.410336
## 6 20 深华发A 2013 0 0 CI600000 3.091042 20.41040 -0.008911 -0.134095
## Cash lev Age_e Female Edu Cpc PLC PLC_level LM LM_level
## 1 -0.153821 0.579972 57 0 0 0 0 0 0 0
## 2 0.016543 0.358152 49 0 3 0 0 0 0 0
## 3 0.247188 0.442868 69 0 0 0 0 0 0 0
## 4 0.192038 0.728132 49 0 4 0 0 0 0 0
## 5 -0.043995 0.243113 52 0 4 0 0 0 0 0
## 6 -0.161447 0.625810 51 0 4 0 1 0 0 0
## GS BE E_H S_H G_H E_W S_W G_W
## 1 14.9151066958273 #NUM! 62.03 83.51 82.80 62.03 83.51 82.8
## 2 14.2913340563696 15.0772261829979 50.95 59.44 75.60 50.95 59.44 75.6
## 3 18.561643365224 16.688119034351 53.65 72.25 84.02 53.65 72.25 84.02
## 4 18.2504088740485 17.9808099110946 65.74 71.18 81.36 65.74 71.18 81.36
## 5 15.9647212274328 #NUM! 56.03 72.90 84.28 56.03 72.9 84.28
## 6 14.3178603632492 14.4531590506944 54.30 62.45 73.56 54.3 62.45 73.56
## Pay_gap PC_level
## 1 2.95718175727184 0
## 2 2.59423819696454 0
## 3 10.32163354395 0
## 4 8.54371296132965 0
## 5 8.76162747169398 2
## 6 6.36212887892146 4
Import library esensial untuk memudahkan proses analisis
library("tidyr")
library("vctrs")
library("psych")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:vctrs':
##
## data_frame
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("factoextra")
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
Tahap pertama yang dilakukan adalah memilah atau subset variabel yang bukan numerik. Di bawah ini merupakan variabel-variabel dengan tipe data antara lain; kategorik, teks, dan deskripsi.
data <- subset(data, select = -c(name, soe, Scale, industry, Female, Edu, Cpc, PLC, PLC_level, LM, LM_level, PC_level, code, time))
head(data)
## Age_c Size ROA Growth Cash lev Age_e
## 1 3.091042 23.02266 0.070024 0.498858 -0.153821 0.579972 57
## 2 2.944439 20.75068 0.003708 0.925853 0.016543 0.358152 49
## 3 3.091042 23.43656 0.107516 0.105719 0.247188 0.442868 69
## 4 3.091042 23.47968 0.003801 0.091007 0.192038 0.728132 49
## 5 3.091042 21.01342 0.033827 0.410336 -0.043995 0.243113 52
## 6 3.091042 20.41040 -0.008911 -0.134095 -0.161447 0.625810 51
## GS BE E_H S_H G_H E_W S_W G_W
## 1 14.9151066958273 #NUM! 62.03 83.51 82.80 62.03 83.51 82.8
## 2 14.2913340563696 15.0772261829979 50.95 59.44 75.60 50.95 59.44 75.6
## 3 18.561643365224 16.688119034351 53.65 72.25 84.02 53.65 72.25 84.02
## 4 18.2504088740485 17.9808099110946 65.74 71.18 81.36 65.74 71.18 81.36
## 5 15.9647212274328 #NUM! 56.03 72.90 84.28 56.03 72.9 84.28
## 6 14.3178603632492 14.4531590506944 54.30 62.45 73.56 54.3 62.45 73.56
## Pay_gap
## 1 2.95718175727184
## 2 2.59423819696454
## 3 10.32163354395
## 4 8.54371296132965
## 5 8.76162747169398
## 6 6.36212887892146
Memeriksa apakah data memiliki nilai kosong.
sum(is.na(data))
## [1] 13
Menghapus baris yang terdeteksi memiliki nilai kosong lalu muat ke dalam variabel data_clean.
data_clean <- drop_na(data)
head(data_clean)
## Age_c Size ROA Growth Cash lev Age_e
## 1 3.091042 23.02266 0.070024 0.498858 -0.153821 0.579972 57
## 2 2.944439 20.75068 0.003708 0.925853 0.016543 0.358152 49
## 3 3.091042 23.43656 0.107516 0.105719 0.247188 0.442868 69
## 4 3.091042 23.47968 0.003801 0.091007 0.192038 0.728132 49
## 5 3.091042 21.01342 0.033827 0.410336 -0.043995 0.243113 52
## 6 3.091042 20.41040 -0.008911 -0.134095 -0.161447 0.625810 51
## GS BE E_H S_H G_H E_W S_W G_W
## 1 14.9151066958273 #NUM! 62.03 83.51 82.80 62.03 83.51 82.8
## 2 14.2913340563696 15.0772261829979 50.95 59.44 75.60 50.95 59.44 75.6
## 3 18.561643365224 16.688119034351 53.65 72.25 84.02 53.65 72.25 84.02
## 4 18.2504088740485 17.9808099110946 65.74 71.18 81.36 65.74 71.18 81.36
## 5 15.9647212274328 #NUM! 56.03 72.90 84.28 56.03 72.9 84.28
## 6 14.3178603632492 14.4531590506944 54.30 62.45 73.56 54.3 62.45 73.56
## Pay_gap
## 1 2.95718175727184
## 2 2.59423819696454
## 3 10.32163354395
## 4 8.54371296132965
## 5 8.76162747169398
## 6 6.36212887892146
Memastikan tidak ada data kosong pada variabel data_clean.
sum(is.na(data_clean))
## [1] 0
Melihat tipe data keseluruhan variabel yang akan dipakai.
str(data_clean)
## 'data.frame': 34085 obs. of 16 variables:
## $ Age_c : num 3.09 2.94 3.09 3.09 3.09 ...
## $ Size : num 23 20.8 23.4 23.5 21 ...
## $ ROA : num 0.07002 0.00371 0.10752 0.0038 0.03383 ...
## $ Growth : num 0.499 0.926 0.106 0.091 0.41 ...
## $ Cash : num -0.1538 0.0165 0.2472 0.192 -0.044 ...
## $ lev : num 0.58 0.358 0.443 0.728 0.243 ...
## $ Age_e : int 57 49 69 49 52 51 66 49 51 56 ...
## $ GS : chr "14.9151066958273" "14.2913340563696" "18.561643365224" "18.2504088740485" ...
## $ BE : chr "#NUM!" "15.0772261829979" "16.688119034351" "17.9808099110946" ...
## $ E_H : num 62 51 53.6 65.7 56 ...
## $ S_H : num 83.5 59.4 72.2 71.2 72.9 ...
## $ G_H : num 82.8 75.6 84 81.4 84.3 ...
## $ E_W : chr "62.03" "50.95" "53.65" "65.74" ...
## $ S_W : chr "83.51" "59.44" "72.25" "71.18" ...
## $ G_W : chr "82.8" "75.6" "84.02" "81.36" ...
## $ Pay_gap: chr "2.95718175727184" "2.59423819696454" "10.32163354395" "8.54371296132965" ...
Setelah ditelusuri, masih ada tipe data yang bukan numerik. Hal tersebut menyebabkan operasi matematika tidak dapat dijalankan pada tipe data yang bukan numerik. Solusinya adalah mengganti tipe data menjadi numeric dengan menggunakan sintax as.numeric terhadap masing-masing variabel.
class(data_clean)
## [1] "data.frame"
data_clean$Age_e <- as.numeric(data_clean$Age_e)
data_clean$BE <- as.numeric(data_clean$BE)
## Warning: NAs introduced by coercion
data_clean$GS <- as.numeric(data_clean$GS)
## Warning: NAs introduced by coercion
data_clean$E_W <- as.numeric(data_clean$E_W)
## Warning: NAs introduced by coercion
data_clean$S_W <- as.numeric(data_clean$S_W)
## Warning: NAs introduced by coercion
data_clean$G_W <- as.numeric(data_clean$G_W)
## Warning: NAs introduced by coercion
data_clean$Pay_gap <- as.numeric(data_clean$Pay_gap)
## Warning: NAs introduced by coercion
str(data_clean)
## 'data.frame': 34085 obs. of 16 variables:
## $ Age_c : num 3.09 2.94 3.09 3.09 3.09 ...
## $ Size : num 23 20.8 23.4 23.5 21 ...
## $ ROA : num 0.07002 0.00371 0.10752 0.0038 0.03383 ...
## $ Growth : num 0.499 0.926 0.106 0.091 0.41 ...
## $ Cash : num -0.1538 0.0165 0.2472 0.192 -0.044 ...
## $ lev : num 0.58 0.358 0.443 0.728 0.243 ...
## $ Age_e : num 57 49 69 49 52 51 66 49 51 56 ...
## $ GS : num 14.9 14.3 18.6 18.3 16 ...
## $ BE : num NA 15.1 16.7 18 NA ...
## $ E_H : num 62 51 53.6 65.7 56 ...
## $ S_H : num 83.5 59.4 72.2 71.2 72.9 ...
## $ G_H : num 82.8 75.6 84 81.4 84.3 ...
## $ E_W : num 62 51 53.6 65.7 56 ...
## $ S_W : num 83.5 59.4 72.2 71.2 72.9 ...
## $ G_W : num 82.8 75.6 84 81.4 84.3 ...
## $ Pay_gap: num 2.96 2.59 10.32 8.54 8.76 ...
Melihat apakah data sudah bersih dan tidak mengandung nilai kosong.
sum(is.na(data_clean))
## [1] 9041
Drop baris data yang memiliki nilai kosong.
data_final <- drop_na(data_clean)
head(data_final)
## Age_c Size ROA Growth Cash lev Age_e GS
## 1 2.944439 20.75068 0.003708 0.925853 0.016543 0.358152 49 14.29133
## 2 3.091042 23.43656 0.107516 0.105719 0.247188 0.442868 69 18.56164
## 3 3.091042 23.47968 0.003801 0.091007 0.192038 0.728132 49 18.25041
## 4 3.091042 20.41040 -0.008911 -0.134095 -0.161447 0.625810 51 14.31786
## 5 2.995732 23.34260 0.011574 -0.082928 0.060694 0.638573 66 15.55302
## 6 3.044522 21.99266 0.036565 0.026301 0.040563 0.567590 49 15.09938
## BE E_H S_H G_H E_W S_W G_W Pay_gap
## 1 15.07723 50.95 59.44 75.60 50.95 59.44 75.60 2.594238
## 2 16.68812 53.65 72.25 84.02 53.65 72.25 84.02 10.321634
## 3 17.98081 65.74 71.18 81.36 65.74 71.18 81.36 8.543713
## 4 14.45316 54.30 62.45 73.56 54.30 62.45 73.56 6.362129
## 5 12.82637 67.13 71.01 87.60 67.13 71.01 87.60 31.215642
## 6 15.07158 47.67 100.00 81.77 47.67 100.00 81.77 21.533502
Memastikan data sudah bersih dan siap untuk dianalisis hubungan antar variabel untuk eksplorasi lebih mendalam.
library(ggplot2)
data_long <- data_final %>%
pivot_longer(cols = everything(),
names_to = "Variable",
values_to = "Value")
ggplot(data_long, aes(x = Variable, y = Value)) +
geom_boxplot(fill = "skyblue", color = "black") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Boxplot Masing-Masing Variabel",
x = "Variabel",
y = "Nilai")
sum(is.na(data_final))
## [1] 0
Berdasarkan matriks korelasi tersebut, sebagian besar variabel menunjukkan hubungan yang lemah hingga sedang, dengan beberapa korelasi yang menonjol. Size memiliki korelasi positif cukup kuat dengan GS (0,58) dan BE (0,46) serta hubungan sedang dengan Age_c (0,41) dan lev (0,33), yang mengindikasikan perusahaan lebih besar cenderung memiliki struktur tata kelola dan leverage yang lebih tinggi. ROA berkorelasi negatif kuat dengan lev (-0,67), menunjukkan semakin tinggi profitabilitas maka leverage cenderung lebih rendah. Cash berkorelasi negatif sedang dengan lev (-0,28) dan positif lemah dengan G_H/G_W (~0,17).
cor(data_final)
## Age_c Size ROA Growth Cash
## Age_c 1.000000000 0.40838865 -0.068909317 0.006435435 -0.161688386
## Size 0.408388648 1.00000000 0.023885233 0.028611129 -0.114984694
## ROA -0.068909317 0.02388523 1.000000000 0.006842594 0.132265029
## Growth 0.006435435 0.02861113 0.006842594 1.000000000 -0.004957106
## Cash -0.161688386 -0.11498469 0.132265029 -0.004957106 1.000000000
## lev 0.264920654 0.32934321 -0.671519766 0.010860171 -0.284062506
## Age_e -0.019264160 0.05201209 0.039468589 0.006919254 0.056729079
## GS 0.138305671 0.58136090 0.045430626 0.019638441 -0.032679944
## BE 0.090059615 0.46336339 0.018015299 0.019677485 -0.086729451
## E_H 0.039462578 0.23342217 0.006749630 -0.017857572 -0.025681232
## S_H -0.056414679 0.15521577 0.040020280 -0.009792459 0.003876494
## G_H -0.213188191 -0.01680149 0.110714814 -0.007397036 0.173059933
## E_W 0.039459707 0.23342267 0.006748600 -0.017858784 -0.025683089
## S_W -0.056431522 0.15520028 0.040021881 -0.009791452 0.003881078
## G_W -0.213191945 -0.01680207 0.110715767 -0.007395829 0.173059893
## Pay_gap 0.020112100 0.24980293 0.046282029 0.017435396 0.057440189
## lev Age_e GS BE E_H
## Age_c 0.26492065 -0.019264160 0.13830567 0.09005962 0.03946258
## Size 0.32934321 0.052012090 0.58136090 0.46336339 0.23342217
## ROA -0.67151977 0.039468589 0.04543063 0.01801530 0.00674963
## Growth 0.01086017 0.006919254 0.01963844 0.01967748 -0.01785757
## Cash -0.28406251 0.056729079 -0.03267994 -0.08672945 -0.02568123
## lev 1.00000000 -0.062260409 0.14551526 0.15077441 0.06644032
## Age_e -0.06226041 1.000000000 0.06834644 0.02408725 0.04202461
## GS 0.14551526 0.068346444 1.00000000 0.41863543 0.20987914
## BE 0.15077441 0.024087248 0.41863543 1.00000000 0.16874441
## E_H 0.06644032 0.042024614 0.20987914 0.16874441 1.00000000
## S_H 0.01707006 0.044419547 0.13455239 0.20972178 0.28321614
## G_H -0.25621152 0.102644092 0.03676786 -0.02012171 0.09358949
## E_W 0.06644251 0.042025021 0.20988121 0.16874142 0.99999997
## S_W 0.01706382 0.044401217 0.13454875 0.20971545 0.28321588
## G_W -0.25621033 0.102644157 0.03676752 -0.02011834 0.09359115
## Pay_gap 0.03373660 0.056181243 0.26157453 0.23051569 0.10123854
## S_H G_H E_W S_W G_W
## Age_c -0.056414679 -0.213188191 0.03945971 -0.056431522 -0.213191945
## Size 0.155215773 -0.016801493 0.23342267 0.155200279 -0.016802069
## ROA 0.040020280 0.110714814 0.00674860 0.040021881 0.110715767
## Growth -0.009792459 -0.007397036 -0.01785878 -0.009791452 -0.007395829
## Cash 0.003876494 0.173059933 -0.02568309 0.003881078 0.173059893
## lev 0.017070059 -0.256211516 0.06644251 0.017063819 -0.256210325
## Age_e 0.044419547 0.102644092 0.04202502 0.044401217 0.102644157
## GS 0.134552390 0.036767856 0.20988121 0.134548747 0.036767523
## BE 0.209721778 -0.020121710 0.16874142 0.209715445 -0.020118339
## E_H 0.283216138 0.093589491 0.99999997 0.283215879 0.093591155
## S_H 1.000000000 0.061237673 0.28321547 0.999999104 0.061238919
## G_H 0.061237673 1.000000000 0.09358930 0.061232729 0.999999962
## E_W 0.283215468 0.093589297 1.00000000 0.283215085 0.093590953
## S_W 0.999999104 0.061232729 0.28321508 1.000000000 0.061234085
## G_W 0.061238919 0.999999962 0.09359095 0.061234085 1.000000000
## Pay_gap 0.086218179 0.032582412 0.10123682 0.086210940 0.032582748
## Pay_gap
## Age_c 0.02011210
## Size 0.24980293
## ROA 0.04628203
## Growth 0.01743540
## Cash 0.05744019
## lev 0.03373660
## Age_e 0.05618124
## GS 0.26157453
## BE 0.23051569
## E_H 0.10123854
## S_H 0.08621818
## G_H 0.03258241
## E_W 0.10123682
## S_W 0.08621094
## G_W 0.03258275
## Pay_gap 1.00000000
corrplot::corrplot(cor(data_final), tl.col = "black", tl.srt = 45, tl.cex = 0.5)
r <- cor(data_final)
KMO(r)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.55
## MSA for each item =
## Age_c Size ROA Growth Cash lev Age_e GS BE E_H
## 0.70 0.65 0.46 0.80 0.78 0.57 0.87 0.76 0.84 0.52
## S_H G_H E_W S_W G_W Pay_gap
## 0.48 0.51 0.52 0.48 0.51 0.85
data_final = data_final[-3]
r <- cor(data_final)
KMO(r)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.56
## MSA for each item =
## Age_c Size Growth Cash lev Age_e GS BE E_H S_H
## 0.68 0.69 0.80 0.76 0.81 0.87 0.75 0.83 0.52 0.48
## G_H E_W S_W G_W Pay_gap
## 0.50 0.52 0.48 0.50 0.84
data_final = data_final[-10]
r <- cor(data_final)
KMO(r)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.62
## MSA for each item =
## Age_c Size Growth Cash lev Age_e GS BE E_H G_H
## 0.67 0.68 0.78 0.76 0.81 0.87 0.75 0.80 0.55 0.54
## E_W S_W G_W Pay_gap
## 0.55 0.90 0.54 0.84
terdapat korelasi yang signifikan antar variabel dalam dataset dan variabel-variabel tersebut tidak saling independen. Dengan demikian, data memenuhi syarat untuk dilakukan analisis faktor karena terdapat hubungan yang cukup antar variabel untuk membentuk struktur faktor.
bartlett.test(data_final)
##
## Bartlett test of homogeneity of variances
##
## data: data_final
## Bartlett's K-squared = 568480, df = 13, p-value < 2.2e-16
scale_data = scale(data_final)
r = cov(scale_data)
pc <- eigen(r)
print("eigen values:")
## [1] "eigen values:"
pc$values
## [1] 2.879130e+00 2.429830e+00 1.551724e+00 1.138592e+00 9.964939e-01
## [6] 9.754586e-01 9.236111e-01 8.156676e-01 7.375812e-01 6.719756e-01
## [11] 5.606230e-01 3.193134e-01 4.007981e-08 3.191019e-08
pc$vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.19770487 0.275408810 -0.12238091 0.34463483 0.009626236 0.31205766
## [2,] -0.44452618 0.093932338 -0.31363985 0.10228335 -0.015947629 0.05865615
## [3,] -0.01013878 0.017457786 -0.07639563 -0.05144541 0.987368083 -0.11938648
## [4,] 0.10602723 -0.238081069 -0.05298034 -0.48027298 -0.029832018 0.12703797
## [5,] -0.21793223 0.308324728 -0.03393120 0.33779450 0.008345121 -0.03332022
## [6,] -0.04660174 -0.122361879 -0.10681829 -0.17842495 0.096633462 0.85399941
## [7,] -0.38743554 -0.006832145 -0.32197984 -0.11394850 -0.038390240 -0.02476825
## [8,] -0.35359594 0.030311315 -0.26405375 -0.18352551 -0.058028192 -0.23703837
## [9,] -0.40442004 -0.210250139 0.49475151 0.04168416 0.051599907 0.07146023
## [10,] 0.01719168 -0.565362598 -0.24004662 0.32887609 0.001512841 -0.05016031
## [11,] -0.40441988 -0.210249890 0.49475203 0.04168570 0.051598989 0.07146055
## [12,] -0.23140374 -0.132274453 0.15179974 -0.23490952 -0.051110864 -0.24892487
## [13,] 0.01719096 -0.565363210 -0.24004567 0.32887441 0.001513932 -0.05016262
## [14,] -0.21427165 -0.046478145 -0.25339057 -0.41215788 -0.040947186 -0.06597493
## [,7] [,8] [,9] [,10] [,11]
## [1,] -0.342441302 0.32366701 0.371036288 -0.390547496 -0.1652412359
## [2,] -0.103747199 0.17116794 -0.009452066 0.017943000 0.1078232314
## [3,] -0.012463590 0.03503305 0.020158123 -0.006203705 0.0005314062
## [4,] -0.497824434 0.44104166 0.098546403 0.455738248 -0.1443918950
## [5,] 0.138512935 -0.14995123 0.244294350 0.777374360 -0.0901763478
## [6,] 0.432474966 -0.08003637 -0.035549105 0.053389794 -0.0570620427
## [7,] -0.067410196 0.06164887 -0.368785438 0.047713698 0.6248593047
## [8,] 0.209166732 0.08714736 -0.355972006 -0.077883156 -0.7071630937
## [9,] -0.146062472 -0.09257095 -0.077574402 0.001359642 -0.0439698469
## [10,] 0.009377776 -0.01961270 0.080511654 0.027987649 -0.0446392239
## [11,] -0.146060850 -0.09257270 -0.077576751 0.001363529 -0.0439628700
## [12,] 0.535882754 0.43739810 0.519446767 -0.077987514 0.1721532518
## [13,] 0.009380424 -0.01961409 0.080509557 0.027990531 -0.0446428994
## [14,] -0.198636353 -0.64860169 0.483361099 -0.127509130 -0.0285494056
## [,12] [,13] [,14]
## [1,] -0.338102869 -6.757467e-07 -4.486963e-06
## [2,] 0.793062036 -1.422041e-06 9.758348e-07
## [3,] -0.011526995 1.192603e-06 -2.848740e-07
## [4,] -0.031197885 8.153811e-07 -1.043072e-06
## [5,] -0.164958998 3.090438e-07 2.229839e-06
## [6,] -0.012104250 -1.155852e-07 3.565571e-07
## [7,] -0.440489842 -2.370443e-06 1.661480e-06
## [8,] -0.163962797 4.477329e-06 -1.445091e-06
## [9,] -0.009848756 -3.752622e-01 5.993148e-01
## [10,] -0.045154072 5.993145e-01 3.752625e-01
## [11,] -0.009846385 3.752632e-01 -5.993142e-01
## [12,] -0.046076663 3.615398e-07 -3.677206e-07
## [13,] -0.045153787 -5.993146e-01 -3.752629e-01
## [14,] -0.031064947 6.050581e-07 -1.198455e-06
sumvar <- sum(pc$values)
propvar <- sapply(pc$values, function(x) x/sumvar)*100
cumvar <- data.frame(
PC = paste0("PC", 1:length(pc$values)),
eigen_value = pc$values,
propvar = propvar
) %>%
mutate(cum = cumsum(propvar))
print(cumvar)
## PC eigen_value propvar cum
## 1 PC1 2.879130e+00 2.056521e+01 20.56521
## 2 PC2 2.429830e+00 1.735593e+01 37.92114
## 3 PC3 1.551724e+00 1.108374e+01 49.00488
## 4 PC4 1.138592e+00 8.132803e+00 57.13768
## 5 PC5 9.964939e-01 7.117814e+00 64.25550
## 6 PC6 9.754586e-01 6.967561e+00 71.22306
## 7 PC7 9.236111e-01 6.597222e+00 77.82028
## 8 PC8 8.156676e-01 5.826197e+00 83.64648
## 9 PC9 7.375812e-01 5.268437e+00 88.91491
## 10 PC10 6.719756e-01 4.799826e+00 93.71474
## 11 PC11 5.606230e-01 4.004450e+00 97.71919
## 12 PC12 3.193134e-01 2.280810e+00 100.00000
## 13 PC13 4.007981e-08 2.862844e-07 100.00000
## 14 PC14 3.191019e-08 2.279299e-07 100.00000
scores <- as.matrix(scale_data) %*% pc$vectors
scores_PC <- scores[,1:3]
head(scores_PC)
## [,1] [,2] [,3]
## [1,] 2.6246202 1.7005287 -0.4500386
## [2,] -0.9395970 -0.1955997 -3.0455795
## [3,] -2.6198130 0.2611240 -1.0898242
## [4,] 2.0182651 2.1307278 0.2097418
## [5,] -1.4298265 -1.4612969 -0.8075467
## [6,] 0.4518249 0.4483597 -1.9425800
pca_result <- prcomp(data_final, scale. = TRUE)
# Scree Plot
fviz_eig(pca_result,
addlabels = TRUE,
ncp = ncol(data_final),
barfill = "skyblue",
barcolor = "darkblue",
linecolor = "red")
## Warning in geom_bar(stat = "identity", fill = barfill, color = barcolor, :
## Ignoring empty aesthetic: `width`.
fviz_pca_biplot(pca_result,
label = "var",
col.var = "red",
col.ind = "gray",
alpha.ind = 0.5) +
ggtitle("PCA Biplot")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the ggpubr package.
## Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
fviz_contrib(pca_result,
choice = "var",
axes = 1,
top = 5) +
ggtitle("Kontribusi Variabel ke PC1")
fviz_contrib(pca_result,
choice = "var",
axes = 2,
top = 5) +
ggtitle("Kontribusi Variabel ke PC2")
fviz_contrib(pca_result,
choice = "var",
axes = 3,
top = 5) +
ggtitle("Kontribusi Variabel ke PC3")
fviz_contrib(pca_result,
choice = "var",
axes = 4,
top = 5) +
ggtitle("Kontribusi Variabel ke PC4")
covvar = cov(scale_data)
PC = eigen(covvar)
cat("eigen vals:")
## eigen vals:
PC$values
## [1] 2.879130e+00 2.429830e+00 1.551724e+00 1.138592e+00 9.964939e-01
## [6] 9.754586e-01 9.236111e-01 8.156676e-01 7.375812e-01 6.719756e-01
## [11] 5.606230e-01 3.193134e-01 4.007981e-08 3.191019e-08
PC$vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.19770487 0.275408810 -0.12238091 0.34463483 0.009626236 0.31205766
## [2,] -0.44452618 0.093932338 -0.31363985 0.10228335 -0.015947629 0.05865615
## [3,] -0.01013878 0.017457786 -0.07639563 -0.05144541 0.987368083 -0.11938648
## [4,] 0.10602723 -0.238081069 -0.05298034 -0.48027298 -0.029832018 0.12703797
## [5,] -0.21793223 0.308324728 -0.03393120 0.33779450 0.008345121 -0.03332022
## [6,] -0.04660174 -0.122361879 -0.10681829 -0.17842495 0.096633462 0.85399941
## [7,] -0.38743554 -0.006832145 -0.32197984 -0.11394850 -0.038390240 -0.02476825
## [8,] -0.35359594 0.030311315 -0.26405375 -0.18352551 -0.058028192 -0.23703837
## [9,] -0.40442004 -0.210250139 0.49475151 0.04168416 0.051599907 0.07146023
## [10,] 0.01719168 -0.565362598 -0.24004662 0.32887609 0.001512841 -0.05016031
## [11,] -0.40441988 -0.210249890 0.49475203 0.04168570 0.051598989 0.07146055
## [12,] -0.23140374 -0.132274453 0.15179974 -0.23490952 -0.051110864 -0.24892487
## [13,] 0.01719096 -0.565363210 -0.24004567 0.32887441 0.001513932 -0.05016262
## [14,] -0.21427165 -0.046478145 -0.25339057 -0.41215788 -0.040947186 -0.06597493
## [,7] [,8] [,9] [,10] [,11]
## [1,] -0.342441302 0.32366701 0.371036288 -0.390547496 -0.1652412359
## [2,] -0.103747199 0.17116794 -0.009452066 0.017943000 0.1078232314
## [3,] -0.012463590 0.03503305 0.020158123 -0.006203705 0.0005314062
## [4,] -0.497824434 0.44104166 0.098546403 0.455738248 -0.1443918950
## [5,] 0.138512935 -0.14995123 0.244294350 0.777374360 -0.0901763478
## [6,] 0.432474966 -0.08003637 -0.035549105 0.053389794 -0.0570620427
## [7,] -0.067410196 0.06164887 -0.368785438 0.047713698 0.6248593047
## [8,] 0.209166732 0.08714736 -0.355972006 -0.077883156 -0.7071630937
## [9,] -0.146062472 -0.09257095 -0.077574402 0.001359642 -0.0439698469
## [10,] 0.009377776 -0.01961270 0.080511654 0.027987649 -0.0446392239
## [11,] -0.146060850 -0.09257270 -0.077576751 0.001363529 -0.0439628700
## [12,] 0.535882754 0.43739810 0.519446767 -0.077987514 0.1721532518
## [13,] 0.009380424 -0.01961409 0.080509557 0.027990531 -0.0446428994
## [14,] -0.198636353 -0.64860169 0.483361099 -0.127509130 -0.0285494056
## [,12] [,13] [,14]
## [1,] -0.338102869 -6.757467e-07 -4.486963e-06
## [2,] 0.793062036 -1.422041e-06 9.758348e-07
## [3,] -0.011526995 1.192603e-06 -2.848740e-07
## [4,] -0.031197885 8.153811e-07 -1.043072e-06
## [5,] -0.164958998 3.090438e-07 2.229839e-06
## [6,] -0.012104250 -1.155852e-07 3.565571e-07
## [7,] -0.440489842 -2.370443e-06 1.661480e-06
## [8,] -0.163962797 4.477329e-06 -1.445091e-06
## [9,] -0.009848756 -3.752622e-01 5.993148e-01
## [10,] -0.045154072 5.993145e-01 3.752625e-01
## [11,] -0.009846385 3.752632e-01 -5.993142e-01
## [12,] -0.046076663 3.615398e-07 -3.677206e-07
## [13,] -0.045153787 -5.993146e-01 -3.752629e-01
## [14,] -0.031064947 6.050581e-07 -1.198455e-06
# Jumlah komponen yang dipilih (Kaiser = 4)
k <- 4
sp <- sum(pc$values[1:k])
Laten1 <- sqrt(pc$values[1]) * pc$vectors[,1]
Laten2 <- sqrt(pc$values[2]) * pc$vectors[,2]
Laten3 <- sqrt(pc$values[3]) * pc$vectors[,3]
Laten4 <- sqrt(pc$values[4]) * pc$vectors[,4]
Laten_final <- cbind(Laten1, Laten2, Laten3, Laten4)
cat("Factor loading:\n")
## Factor loading:
print(Laten_final)
## Laten1 Laten2 Laten3 Laten4
## [1,] -0.33546559 0.42930479 -0.15244771 0.36774206
## [2,] -0.75427195 0.14642089 -0.39069556 0.10914129
## [3,] -0.01720347 0.02721304 -0.09516467 -0.05489474
## [4,] 0.17990699 -0.37111864 -0.06599666 -0.51247454
## [5,] -0.36978737 0.48061383 -0.04226749 0.36044310
## [6,] -0.07907383 -0.19073660 -0.13306164 -0.19038806
## [7,] -0.65740056 -0.01064989 -0.40108453 -0.12158858
## [8,] -0.59998154 0.04724901 -0.32892703 -0.19583061
## [9,] -0.68621986 -0.32773604 0.61630311 0.04447902
## [10,] 0.02917084 -0.88128216 -0.29902179 0.35092672
## [11,] -0.68621958 -0.32773565 0.61630376 0.04448067
## [12,] -0.39264583 -0.20618823 0.18909423 -0.25065984
## [13,] 0.02916962 -0.88128312 -0.29902060 0.35092494
## [14,] -0.36357609 -0.07244972 -0.31564411 -0.43979243
R <- cor(data_final, use = "complete.obs")
eig <- eigen(R)
eigen_values <- eig$values
plot(eigen_values,
type = "b",
pch = 19,
xlab = "Factor",
ylab = "Eigenvalue",
main = "Scree Plot (Kaiser Rule)")
abline(h = 1, col = "red", lty = 2, lwd = 2)
fa.parallel(data_final,
fa = "fa",
n.iter = 100,
show.legend = TRUE,
main = "Parallel Analysis")
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## In smc, smcs < 0 were set to .0
## Parallel analysis suggests that the number of factors = 5 and the number of components = NA
loadings_df <- as.data.frame(Laten_final)
rownames(loadings_df) <- colnames(data_final)
ggplot(loadings_df, aes(x = reorder(rownames(loadings_df), Laten1), y = Laten1)) +
geom_bar(stat = "identity") +
coord_flip() +
theme_minimal() +
labs(title = "Factor Loading - Faktor 1",
x = "Variabel",
y = "Loading")
loadings_df <- as.data.frame(Laten_final)
rownames(loadings_df) <- colnames(data_final)
ggplot(loadings_df, aes(x = reorder(rownames(loadings_df), Laten2), y = Laten2)) +
geom_bar(stat = "identity") +
coord_flip() +
theme_minimal() +
labs(title = "Factor Loading - Faktor 2",
x = "Variabel",
y = "Loading")
loadings_df <- as.data.frame(Laten_final)
rownames(loadings_df) <- colnames(data_final)
ggplot(loadings_df, aes(x = reorder(rownames(loadings_df), Laten3), y = Laten3)) +
geom_bar(stat = "identity") +
coord_flip() +
theme_minimal() +
labs(title = "Factor Loading - Faktor 3",
x = "Variabel",
y = "Loading")
loadings_df <- as.data.frame(Laten_final)
rownames(loadings_df) <- colnames(data_final)
ggplot(loadings_df, aes(x = reorder(rownames(loadings_df), Laten4), y = Laten4)) +
geom_bar(stat = "identity") +
coord_flip() +
theme_minimal() +
labs(title = "Factor Loading - Faktor 4",
x = "Variabel",
y = "Loading")