Implementasi Principal Component Analysis (PCA) dan Factor Analysis (FA) pada Analisis Perilaku Pelanggan Retail

Dataset yang digunakan berasal dari Kaggle dengan judul Customer Personality Analysis Dataset. Dataset ini berisi data perilaku pelanggan perusahaan ritel yang digunakan untuk menganalisis karakteristik konsumen berdasarkan pendapatan, pengeluaran produk, dan aktivitas pembelian. Data terdiri dari 2.240 pelanggan, dengan setiap baris dalam data mewakili satu individu.

Import Data

library(psych)
library(knitr)

data <- read.delim("marketing_campaign.csv",
                   sep = "\t",
                   stringsAsFactors = FALSE)

str(data)

## 'data.frame':    2240 obs. of  29 variables:
##  $ ID                 : int  5524 2174 4141 6182 5324 7446 965 6177 4855 5899 ...
##  $ Year_Birth         : int  1957 1954 1965 1984 1981 1967 1971 1985 1974 1950 ...
##  $ Education          : chr  "Graduation" "Graduation" "Graduation" "Graduation" ...
##  $ Marital_Status     : chr  "Single" "Single" "Together" "Together" ...
##  $ Income             : int  58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
##  $ Kidhome            : int  0 1 0 1 1 0 0 1 1 1 ...
##  $ Teenhome           : int  0 1 0 0 0 1 1 0 0 1 ...
##  $ Dt_Customer        : chr  "04-09-2012" "08-03-2014" "21-08-2013" "10-02-2014" ...
##  $ Recency            : int  58 38 26 26 94 16 34 32 19 68 ...
##  $ MntWines           : int  635 11 426 11 173 520 235 76 14 28 ...
##  $ MntFruits          : int  88 1 49 4 43 42 65 10 0 0 ...
##  $ MntMeatProducts    : int  546 6 127 20 118 98 164 56 24 6 ...
##  $ MntFishProducts    : int  172 2 111 10 46 0 50 3 3 1 ...
##  $ MntSweetProducts   : int  88 1 21 3 27 42 49 1 3 1 ...
##  $ MntGoldProds       : int  88 6 42 5 15 14 27 23 2 13 ...
##  $ NumDealsPurchases  : int  3 2 1 2 5 2 4 2 1 1 ...
##  $ NumWebPurchases    : int  8 1 8 2 5 6 7 4 3 1 ...
##  $ NumCatalogPurchases: int  10 1 2 0 3 4 3 0 0 0 ...
##  $ NumStorePurchases  : int  4 2 10 4 6 10 7 4 2 0 ...
##  $ NumWebVisitsMonth  : int  7 5 4 6 5 6 6 8 9 20 ...
##  $ AcceptedCmp3       : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ AcceptedCmp4       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp5       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp2       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Complain           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Z_CostContact      : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ Z_Revenue          : int  11 11 11 11 11 11 11 11 11 11 ...
##  $ Response           : int  1 0 0 0 0 0 0 0 1 0 ...

head(data)

##     ID Year_Birth  Education Marital_Status Income Kidhome Teenhome Dt_Customer
## 1 5524       1957 Graduation         Single  58138       0        0  04-09-2012
## 2 2174       1954 Graduation         Single  46344       1        1  08-03-2014
## 3 4141       1965 Graduation       Together  71613       0        0  21-08-2013
## 4 6182       1984 Graduation       Together  26646       1        0  10-02-2014
## 5 5324       1981        PhD        Married  58293       1        0  19-01-2014
## 6 7446       1967     Master       Together  62513       0        1  09-09-2013
##   Recency MntWines MntFruits MntMeatProducts MntFishProducts MntSweetProducts
## 1      58      635        88             546             172               88
## 2      38       11         1               6               2                1
## 3      26      426        49             127             111               21
## 4      26       11         4              20              10                3
## 5      94      173        43             118              46               27
## 6      16      520        42              98               0               42
##   MntGoldProds NumDealsPurchases NumWebPurchases NumCatalogPurchases
## 1           88                 3               8                  10
## 2            6                 2               1                   1
## 3           42                 1               8                   2
## 4            5                 2               2                   0
## 5           15                 5               5                   3
## 6           14                 2               6                   4
##   NumStorePurchases NumWebVisitsMonth AcceptedCmp3 AcceptedCmp4 AcceptedCmp5
## 1                 4                 7            0            0            0
## 2                 2                 5            0            0            0
## 3                10                 4            0            0            0
## 4                 4                 6            0            0            0
## 5                 6                 5            0            0            0
## 6                10                 6            0            0            0
##   AcceptedCmp1 AcceptedCmp2 Complain Z_CostContact Z_Revenue Response
## 1            0            0        0             3        11        1
## 2            0            0        0             3        11        0
## 3            0            0        0             3        11        0
## 4            0            0        0             3        11        0
## 5            0            0        0             3        11        0
## 6            0            0        0             3        11        0

Preprocessing Data

Seleksi Variabel Numerik

Memilih variabel kuantitatif yang akan dianalisis. Variabel kategorikal dan identifier tidak disertakan karena tidak sesuai dengan asumsi PCA dan Factor Analysis.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data_numerik <- data %>%
  select(
    Income, Recency,
    MntWines, MntFruits, MntMeatProducts,
    MntFishProducts, MntSweetProducts,
    MntGoldProds,
    NumDealsPurchases,
    NumWebPurchases,
    NumCatalogPurchases,
    NumStorePurchases,
    NumWebVisitsMonth
  )

head(data_numerik)

##   Income Recency MntWines MntFruits MntMeatProducts MntFishProducts
## 1  58138      58      635        88             546             172
## 2  46344      38       11         1               6               2
## 3  71613      26      426        49             127             111
## 4  26646      26       11         4              20              10
## 5  58293      94      173        43             118              46
## 6  62513      16      520        42              98               0
##   MntSweetProducts MntGoldProds NumDealsPurchases NumWebPurchases
## 1               88           88                 3               8
## 2                1            6                 2               1
## 3               21           42                 1               8
## 4                3            5                 2               2
## 5               27           15                 5               5
## 6               42           14                 2               6
##   NumCatalogPurchases NumStorePurchases NumWebVisitsMonth
## 1                  10                 4                 7
## 2                   1                 2                 5
## 3                   2                10                 4
## 4                   0                 4                 6
## 5                   3                 6                 5
## 6                   4                10                 6

Cek tipe Data

Tujuannya untuk memastikan seluruh variabel bertipe numerik.

str(data_numerik)

## 'data.frame':    2240 obs. of  13 variables:
##  $ Income             : int  58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
##  $ Recency            : int  58 38 26 26 94 16 34 32 19 68 ...
##  $ MntWines           : int  635 11 426 11 173 520 235 76 14 28 ...
##  $ MntFruits          : int  88 1 49 4 43 42 65 10 0 0 ...
##  $ MntMeatProducts    : int  546 6 127 20 118 98 164 56 24 6 ...
##  $ MntFishProducts    : int  172 2 111 10 46 0 50 3 3 1 ...
##  $ MntSweetProducts   : int  88 1 21 3 27 42 49 1 3 1 ...
##  $ MntGoldProds       : int  88 6 42 5 15 14 27 23 2 13 ...
##  $ NumDealsPurchases  : int  3 2 1 2 5 2 4 2 1 1 ...
##  $ NumWebPurchases    : int  8 1 8 2 5 6 7 4 3 1 ...
##  $ NumCatalogPurchases: int  10 1 2 0 3 4 3 0 0 0 ...
##  $ NumStorePurchases  : int  4 2 10 4 6 10 7 4 2 0 ...
##  $ NumWebVisitsMonth  : int  7 5 4 6 5 6 6 8 9 20 ...

Cek Missing Value

Melihat apakah ada data kosong.

colSums(is.na(data_numerik))

##              Income             Recency            MntWines           MntFruits 
##                  24                   0                   0                   0 
##     MntMeatProducts     MntFishProducts    MntSweetProducts        MntGoldProds 
##                   0                   0                   0                   0 
##   NumDealsPurchases     NumWebPurchases NumCatalogPurchases   NumStorePurchases 
##                   0                   0                   0                   0 
##   NumWebVisitsMonth 
##                   0

Hasil menunjukkan terdapat misiing value pada variabel Income yang berjumlah 24.

Penanganan Missing Value

Missing value dihapus menggunakan metode listwise deletion agar tidak mengganggu perhitungan matriks korelasi pada PCA dan FA.

data_numerik <- na.omit(data_numerik)

dim(data_numerik)

## [1] 2216   13

Cek Missing Value Apakah Sudah Tertangani

Verifikasi bahwa semua missing sudah ditangani dengan mengecek ulang

colSums(is.na(data_numerik))

##              Income             Recency            MntWines           MntFruits 
##                   0                   0                   0                   0 
##     MntMeatProducts     MntFishProducts    MntSweetProducts        MntGoldProds 
##                   0                   0                   0                   0 
##   NumDealsPurchases     NumWebPurchases NumCatalogPurchases   NumStorePurchases 
##                   0                   0                   0                   0 
##   NumWebVisitsMonth 
##                   0

sum(is.na(data_numerik))

## [1] 0

Deteksi Outlier

Boxplot digunakan untuk mendeteksi nilai ekstrem yang berpotensi memengaruhi varians data.

boxplot(data_numerik,
        main = "Boxplot Deteksi Outlier",
        las = 2)

Menghitung Jumlah Outlier Tiap Variabel (Metode IQR)

hitung_outlier <- function(x){
  Q1 <- quantile(x, 0.25)
  Q3 <- quantile(x, 0.75)
  IQR_value <- IQR(x)
  
  lower_bound <- Q1 - 1.5 * IQR_value
  upper_bound <- Q3 + 1.5 * IQR_value
  
  sum(x < lower_bound | x > upper_bound)
}

outlier_count <- sapply(data_numerik, hitung_outlier)

outlier_count

##              Income             Recency            MntWines           MntFruits 
##                   8                   0                  35                 246 
##     MntMeatProducts     MntFishProducts    MntSweetProducts        MntGoldProds 
##                 174                 222                 246                 205 
##   NumDealsPurchases     NumWebPurchases NumCatalogPurchases   NumStorePurchases 
##                  84                   3                  23                   0 
##   NumWebVisitsMonth 
##                   8

Persentase Outlier

Untuk melihat proporsi outlier relatif terhadap jumlah data dan menghitung persentasenya

n_data <- nrow(data_numerik)

outlier_percentage <- (outlier_count / n_data) * 100

outlier_summary <- data.frame(
  Variabel = names(outlier_count),
  Jumlah_Outlier = as.numeric(outlier_count),
  Persentase_Outlier = round(outlier_percentage, 2)
)

outlier_summary

##                                Variabel Jumlah_Outlier Persentase_Outlier
## Income                           Income              8               0.36
## Recency                         Recency              0               0.00
## MntWines                       MntWines             35               1.58
## MntFruits                     MntFruits            246              11.10
## MntMeatProducts         MntMeatProducts            174               7.85
## MntFishProducts         MntFishProducts            222              10.02
## MntSweetProducts       MntSweetProducts            246              11.10
## MntGoldProds               MntGoldProds            205               9.25
## NumDealsPurchases     NumDealsPurchases             84               3.79
## NumWebPurchases         NumWebPurchases              3               0.14
## NumCatalogPurchases NumCatalogPurchases             23               1.04
## NumStorePurchases     NumStorePurchases              0               0.00
## NumWebVisitsMonth     NumWebVisitsMonth              8               0.36

kable(outlier_summary,
      caption = "Tabel Persentase Outlier Tiap Variabel")

Tabel Persentase Outlier Tiap Variabel
	Variabel	Jumlah_Outlier	Persentase_Outlier
Income	Income	8	0.36
Recency	Recency	0	0.00
MntWines	MntWines	35	1.58
MntFruits	MntFruits	246	11.10
MntMeatProducts	MntMeatProducts	174	7.85
MntFishProducts	MntFishProducts	222	10.02
MntSweetProducts	MntSweetProducts	246	11.10
MntGoldProds	MntGoldProds	205	9.25
NumDealsPurchases	NumDealsPurchases	84	3.79
NumWebPurchases	NumWebPurchases	3	0.14
NumCatalogPurchases	NumCatalogPurchases	23	1.04
NumStorePurchases	NumStorePurchases	0	0.00
NumWebVisitsMonth	NumWebVisitsMonth	8	0.36

max_outlier <- max(outlier_summary$Persentase_Outlier)

cat("Persentase outlier tertinggi sebesar",
    round(max_outlier,2),
    "% dari total data.\n")

## Persentase outlier tertinggi sebesar 11.1 % dari total data.

if(max_outlier < 15){
  cat("Proporsi outlier tergolong kecil sehingga data tidak dihapus dan hanya dilakukan standardisasi sebelum analisis PCA dan FA.")
} else {
  cat("Proporsi outlier cukup besar sehingga diperlukan penanganan lebih lanjut.")
}

## Proporsi outlier tergolong kecil sehingga data tidak dihapus dan hanya dilakukan standardisasi sebelum analisis PCA dan FA.

Cek Skewness Tiap Variabel

Distribusi data yang sangat skewed (tidak simetris) dapat mempengaruhi analisis. Jadi dicek nilai skwenessnya

skewness_value <- apply(data_numerik, 2, psych::skew)
skewness_table <- data.frame(
  Variabel = names(skewness_value),
  Skewness = round(skewness_value, 3)
)

skewness_table

##                                Variabel Skewness
## Income                           Income    6.754
## Recency                         Recency    0.002
## MntWines                       MntWines    1.169
## MntFruits                     MntFruits    2.099
## MntMeatProducts         MntMeatProducts    2.023
## MntFishProducts         MntFishProducts    1.914
## MntSweetProducts       MntSweetProducts    2.100
## MntGoldProds               MntGoldProds    1.837
## NumDealsPurchases     NumDealsPurchases    2.412
## NumWebPurchases         NumWebPurchases    1.195
## NumCatalogPurchases NumCatalogPurchases    1.879
## NumStorePurchases     NumStorePurchases    0.701
## NumWebVisitsMonth     NumWebVisitsMonth    0.218

max_skew <- max(abs(skewness_table$Skewness))

cat("Nilai skewness maksimum =", round(max_skew,3), "\n")

## Nilai skewness maksimum = 6.754

if(max_skew <= 2){
  cat("Distribusi data masih dapat diterima sehingga transformasi tidak diperlukan.")
} else {
  cat("Beberapa variabel sangat skewed sehingga diperlukan transformasi data.")
}

## Beberapa variabel sangat skewed sehingga diperlukan transformasi data.

Transformasi Log

Untuk mengurangi skewness (menjadikan distribusi lebih simetris), kita lakukan transformasi log

data_transform <- log1p(data_numerik)

summary(data_transform)

##      Income          Recency         MntWines       MntFruits    
##  Min.   : 7.456   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:10.472   1st Qu.:3.219   1st Qu.:3.219   1st Qu.:1.099  
##  Median :10.847   Median :3.912   Median :5.168   Median :2.197  
##  Mean   :10.754   Mean   :3.613   Mean   :4.678   Mean   :2.243  
##  3rd Qu.:11.135   3rd Qu.:4.317   3rd Qu.:6.227   3rd Qu.:3.526  
##  Max.   :13.410   Max.   :4.605   Max.   :7.309   Max.   :5.298  
##  MntMeatProducts MntFishProducts MntSweetProducts  MntGoldProds  
##  Min.   :0.000   Min.   :0.000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:2.833   1st Qu.:1.386   1st Qu.:0.6931   1st Qu.:2.303  
##  Median :4.234   Median :2.565   Median :2.1972   Median :3.238  
##  Mean   :4.130   Mean   :2.538   Mean   :2.2416   Mean   :3.122  
##  3rd Qu.:5.452   3rd Qu.:3.932   3rd Qu.:3.5264   3rd Qu.:4.043  
##  Max.   :7.454   Max.   :5.561   Max.   :5.5722   Max.   :5.775  
##  NumDealsPurchases NumWebPurchases NumCatalogPurchases NumStorePurchases
##  Min.   :0.0000    Min.   :0.000   Min.   :0.000       Min.   :0.000    
##  1st Qu.:0.6931    1st Qu.:1.099   1st Qu.:0.000       1st Qu.:1.386    
##  Median :1.0986    Median :1.609   Median :1.099       Median :1.792    
##  Mean   :1.0788    Mean   :1.479   Mean   :1.009       Mean   :1.803    
##  3rd Qu.:1.3863    3rd Qu.:1.946   3rd Qu.:1.609       3rd Qu.:2.197    
##  Max.   :2.7726    Max.   :3.332   Max.   :3.367       Max.   :2.639    
##  NumWebVisitsMonth
##  Min.   :0.000    
##  1st Qu.:1.386    
##  Median :1.946    
##  Mean   :1.751    
##  3rd Qu.:2.079    
##  Max.   :3.045

skew_after <- apply(data_transform, 2, psych::skew)

skew_after_table <- data.frame(
  Variabel = names(skew_after),
  Skewness_Setelah = round(skew_after,3)
)

skew_after_table

##                                Variabel Skewness_Setelah
## Income                           Income           -1.166
## Recency                         Recency           -1.552
## MntWines                       MntWines           -0.551
## MntFruits                     MntFruits            0.079
## MntMeatProducts         MntMeatProducts           -0.089
## MntFishProducts         MntFishProducts           -0.054
## MntSweetProducts       MntSweetProducts            0.082
## MntGoldProds               MntGoldProds           -0.348
## NumDealsPurchases     NumDealsPurchases            0.668
## NumWebPurchases         NumWebPurchases           -0.278
## NumCatalogPurchases NumCatalogPurchases            0.128
## NumStorePurchases     NumStorePurchases           -0.128
## NumWebVisitsMonth     NumWebVisitsMonth           -1.008

Standarisasi

Karena PCA dan FA sensitif terhadap skala variabel, semua variabel distandarisasi (mean = 0, simpangan baku = 1) agar kontribusi setiap variabel seimbang.

data_scaled <- scale(data_transform)

data_scaled <- as.data.frame(data_scaled)

summary(data_scaled)

##      Income           Recency           MntWines         MntFruits       
##  Min.   :-6.5186   Min.   :-3.7799   Min.   :-2.5905   Min.   :-1.42899  
##  1st Qu.:-0.5576   1st Qu.:-0.4125   1st Qu.:-0.8078   1st Qu.:-0.72915  
##  Median : 0.1843   Median : 0.3127   Median : 0.2714   Median :-0.02931  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000  
##  3rd Qu.: 0.7534   3rd Qu.: 0.7368   3rd Qu.: 0.8578   3rd Qu.: 0.81738  
##  Max.   : 5.2511   Max.   : 1.0378   Max.   : 1.4574   Max.   : 1.94616  
##  MntMeatProducts    MntFishProducts    MntSweetProducts    MntGoldProds     
##  Min.   :-2.64496   Min.   :-1.53178   Min.   :-1.40747   Min.   :-2.42521  
##  1st Qu.:-0.83039   1st Qu.:-0.69525   1st Qu.:-0.97226   1st Qu.:-0.63669  
##  Median : 0.06684   Median : 0.01598   Median :-0.02787   Median : 0.09027  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.84692   3rd Qu.: 0.84078   3rd Qu.: 0.80666   3rd Qu.: 0.71521  
##  Max.   : 2.12878   Max.   : 1.82367   Max.   : 2.09118   Max.   : 2.06015  
##  NumDealsPurchases  NumWebPurchases   NumCatalogPurchases NumStorePurchases 
##  Min.   :-2.29641   Min.   :-2.6395   Min.   :-1.3082     Min.   :-3.71622  
##  1st Qu.:-0.82097   1st Qu.:-0.6787   1st Qu.:-1.3082     1st Qu.:-0.85850  
##  Median : 0.04211   Median : 0.2331   Median : 0.1167     Median :-0.02267  
##  Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.0000     Mean   : 0.00000  
##  3rd Qu.: 0.65447   3rd Qu.: 0.8336   3rd Qu.: 0.7792     3rd Qu.: 0.81315  
##  Max.   : 3.60535   Max.   : 3.3079   Max.   : 3.0590     Max.   : 1.72395  
##  NumWebVisitsMonth
##  Min.   :-3.7769  
##  1st Qu.:-0.7874  
##  Median : 0.4194  
##  Mean   : 0.0000  
##  3rd Qu.: 0.7074  
##  Max.   : 2.7886

Karakteristik Data

Statistika Deskriptif

Ringkasan statistik deskriptif dari data yang sudah ditransformasi

library(psych)
library(knitr)

deskriptif <- psych::describe(data_transform)

kable(deskriptif,
      caption = "Tabel 1. Statistik Deskriptif Variabel Penelitian")

Tabel 1. Statistik Deskriptif Variabel Penelitian
	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
Income	1	2216	10.753825	0.5058438	10.847053	10.8050288	0.4756859	7.456455	13.410046	5.953591	-1.1663715	3.1644479	0.0107456
Recency	2	2216	3.613161	0.9558817	3.912023	3.7727792	0.7152450	0.000000	4.605170	4.605170	-1.5523459	2.3503812	0.0203058
MntWines	3	2216	4.677562	1.8056575	5.167635	4.8150219	1.9552057	0.000000	7.309212	7.309212	-0.5511811	-0.8387515	0.0383575
MntFruits	4	2216	2.243230	1.5698003	2.197225	2.2009627	1.9705770	0.000000	5.298317	5.298317	0.0788398	-1.1282818	0.0333472
MntMeatProducts	5	2216	4.129750	1.5613672	4.234107	4.1484416	1.9667803	0.000000	7.453562	7.453562	-0.0894357	-1.0566047	0.0331681
MntFishProducts	6	2216	2.538474	1.6572089	2.564949	2.5293593	1.9821951	0.000000	5.560682	5.560682	-0.0541993	-1.0887733	0.0352040
MntSweetProducts	7	2216	2.241620	1.5926590	2.197225	2.1948475	2.0553200	0.000000	5.572154	5.572154	0.0822682	-1.1493010	0.0338328
MntGoldProds	8	2216	3.122271	1.2874237	3.238486	3.1742975	1.2811488	0.000000	5.774552	5.774552	-0.3478324	-0.4178388	0.0273487
NumDealsPurchases	9	2216	1.078831	0.4697895	1.098612	1.0279417	0.6011426	0.000000	2.772589	2.772589	0.6676560	0.3370855	0.0099797
NumWebPurchases	10	2216	1.478846	0.5602754	1.609438	1.4877431	0.6968274	0.000000	3.332205	3.332205	-0.2784752	-0.3928257	0.0119019
NumCatalogPurchases	11	2216	1.008661	0.7710555	1.098612	0.9764349	0.7573501	0.000000	3.367296	3.367296	0.1282940	-1.0994588	0.0163795
NumStorePurchases	12	2216	1.802759	0.4851054	1.791759	1.8036359	0.6011426	0.000000	2.639057	2.639057	-0.1280872	-0.1829542	0.0103051
NumWebVisitsMonth	13	2216	1.751406	0.4637137	1.945910	1.8089092	0.3725988	0.000000	3.044522	3.044522	-1.0084595	0.5726253	0.0098507

Nilai mean dan median sebagian besar variabel cukup berdekatan, yang menunjukkan distribusi data relatif stabil setelah transformasi logaritma. Variabel pengeluaran seperti MntWines dan MntMeatProducts memiliki rata-rata lebih tinggi dibanding kategori produk lainnya, sehingga pelanggan cenderung lebih banyak mengeluarkan biaya pada produk tersebut. Nilai standar deviasi menunjukkan bahwa variasi pengeluaran antar pelanggan cukup beragam, sedangkan jumlah transaksi relatif lebih konsisten. Rentang minimum dan maksimum juga memperlihatkan adanya perbedaan perilaku pembelian antar pelanggan. Dilihat dari skewness, seluruh variabel berada pada rentang −2 hingga 2 sehingga distribusi data masih tergolong normal secara moderat.

Visualisasi Distribusi Data

Untuk memeriksa bentuk distribusi tiap variabel

par(mfrow=c(3,5))

for(i in names(data_transform)){
  hist(data_transform[[i]],
       main=i,
       xlab="Nilai",
       col="lightblue")
}

par(mfrow=c(1,1))

Gambar 1. Distribusi Data Setiap Variabel

Boxplot Karakteristik Data

Boxplot memberikan gambaran sebaran (kuartil dan pencilan) untuk tiap variabel

boxplot(data_transform,
        las=2,
        col="lightgreen")

Gambar 2. Boxplot Variabel Penelitian

Sebagian besar variabel memiliki distribusi yang cukup seimbang dengan median berada di sekitar pusat box. Beberapa outlier masih muncul terutama pada variabel pendapatan dan pengeluaran produk, namun jumlahnya kecil dan masih dianggap sebagai variasi alami data pelanggan.

Uji Kelayakan

Sebelum melakukan PCA dan FA, perlu diuji apakah data memenuhi asumsi kelayakan analisis faktor

library(REdaS)

## Loading required package: grid

data_num <- data_numerik[, sapply(data_numerik, is.numeric)]

cor_matrix <- cor(data_num)

Uji KMO

Mengecek apakah antar variabel memiliki korelasi yang cukup kuat untuk dibentuk menjadi faktor.

KMO(cor_matrix)

## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = cor_matrix)
## Overall MSA =  0.89
## MSA for each item = 
##              Income             Recency            MntWines           MntFruits 
##                0.93                0.39                0.87                0.93 
##     MntMeatProducts     MntFishProducts    MntSweetProducts        MntGoldProds 
##                0.91                0.93                0.94                0.93 
##   NumDealsPurchases     NumWebPurchases NumCatalogPurchases   NumStorePurchases 
##                0.52                0.85                0.89                0.89 
##   NumWebVisitsMonth 
##                0.82

Hasil menunjukkan bahwa Variabel recency memiliki nilai MSA yang kurang dari 0,5.

Drop Variabel MSA < 0.5

Recency kurang berkorelasi dengan variabel lain karena MSA = 0,39 jadi tidak ikut membentuk faktor laten.

library(dplyr)

data_final <- data_num %>%
  select(-Recency)

str(data_final)

## 'data.frame':    2216 obs. of  12 variables:
##  $ Income             : int  58138 46344 71613 26646 58293 62513 55635 33454 30351 5648 ...
##  $ MntWines           : int  635 11 426 11 173 520 235 76 14 28 ...
##  $ MntFruits          : int  88 1 49 4 43 42 65 10 0 0 ...
##  $ MntMeatProducts    : int  546 6 127 20 118 98 164 56 24 6 ...
##  $ MntFishProducts    : int  172 2 111 10 46 0 50 3 3 1 ...
##  $ MntSweetProducts   : int  88 1 21 3 27 42 49 1 3 1 ...
##  $ MntGoldProds       : int  88 6 42 5 15 14 27 23 2 13 ...
##  $ NumDealsPurchases  : int  3 2 1 2 5 2 4 2 1 1 ...
##  $ NumWebPurchases    : int  8 1 8 2 5 6 7 4 3 1 ...
##  $ NumCatalogPurchases: int  10 1 2 0 3 4 3 0 0 0 ...
##  $ NumStorePurchases  : int  4 2 10 4 6 10 7 4 2 0 ...
##  $ NumWebVisitsMonth  : int  7 5 4 6 5 6 6 8 9 20 ...

Uji Barlett’s

Mengecek apakah variabel saling berkorelasi secara signifikan.

bart_spher(data_final)

##  Bartlett's Test of Sphericity
## 
## Call: bart_spher(x = data_final)
## 
##      X2 = 13324.493
##      df = 66
## p-value < 2.22e-16

Menampilkan Matriks Korelasi

cor_matrix <- cor(data_final)
round(cor_matrix, 2)

##                     Income MntWines MntFruits MntMeatProducts MntFishProducts
## Income                1.00     0.58      0.43            0.58            0.44
## MntWines              0.58     1.00      0.39            0.57            0.40
## MntFruits             0.43     0.39      1.00            0.55            0.59
## MntMeatProducts       0.58     0.57      0.55            1.00            0.57
## MntFishProducts       0.44     0.40      0.59            0.57            1.00
## MntSweetProducts      0.44     0.39      0.57            0.54            0.58
## MntGoldProds          0.33     0.39      0.40            0.36            0.43
## NumDealsPurchases    -0.08     0.01     -0.13           -0.12           -0.14
## NumWebPurchases       0.39     0.55      0.30            0.31            0.30
## NumCatalogPurchases   0.59     0.63      0.49            0.73            0.53
## NumStorePurchases     0.53     0.64      0.46            0.49            0.46
## NumWebVisitsMonth    -0.55    -0.32     -0.42           -0.54           -0.45
##                     MntSweetProducts MntGoldProds NumDealsPurchases
## Income                          0.44         0.33             -0.08
## MntWines                        0.39         0.39              0.01
## MntFruits                       0.57         0.40             -0.13
## MntMeatProducts                 0.54         0.36             -0.12
## MntFishProducts                 0.58         0.43             -0.14
## MntSweetProducts                1.00         0.36             -0.12
## MntGoldProds                    0.36         1.00              0.05
## NumDealsPurchases              -0.12         0.05              1.00
## NumWebPurchases                 0.33         0.41              0.24
## NumCatalogPurchases             0.50         0.44             -0.01
## NumStorePurchases               0.46         0.39              0.07
## NumWebVisitsMonth              -0.42        -0.25              0.35
##                     NumWebPurchases NumCatalogPurchases NumStorePurchases
## Income                         0.39                0.59              0.53
## MntWines                       0.55                0.63              0.64
## MntFruits                      0.30                0.49              0.46
## MntMeatProducts                0.31                0.73              0.49
## MntFishProducts                0.30                0.53              0.46
## MntSweetProducts               0.33                0.50              0.46
## MntGoldProds                   0.41                0.44              0.39
## NumDealsPurchases              0.24               -0.01              0.07
## NumWebPurchases                1.00                0.39              0.52
## NumCatalogPurchases            0.39                1.00              0.52
## NumStorePurchases              0.52                0.52              1.00
## NumWebVisitsMonth             -0.05               -0.52             -0.43
##                     NumWebVisitsMonth
## Income                          -0.55
## MntWines                        -0.32
## MntFruits                       -0.42
## MntMeatProducts                 -0.54
## MntFishProducts                 -0.45
## MntSweetProducts                -0.42
## MntGoldProds                    -0.25
## NumDealsPurchases                0.35
## NumWebPurchases                 -0.05
## NumCatalogPurchases             -0.52
## NumStorePurchases               -0.43
## NumWebVisitsMonth                1.00

Principal Component Analysis (PCA)

PCA dilakukan untuk menentukan berapa banyak komponen utama yang dapat menggambarkan variansi data.

library(psych)

pca_result <- principal(data_final,
                        nfactors = ncol(data_final),
                        rotate = "none")

pca_result

## Principal Components Analysis
## Call: principal(r = data_final, nfactors = ncol(data_final), rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                       PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8   PC9  PC10
## Income               0.75 -0.03 -0.36  0.02  0.00 -0.14  0.31  0.31  0.17  0.23
## MntWines             0.75  0.28 -0.33 -0.17  0.07  0.23 -0.15 -0.07 -0.07  0.22
## MntFruits            0.72 -0.16  0.39  0.09  0.17 -0.02 -0.24  0.43 -0.16 -0.02
## MntMeatProducts      0.81 -0.15 -0.11  0.25 -0.12  0.29 -0.05  0.00 -0.04 -0.08
## MntFishProducts      0.74 -0.17  0.37  0.09  0.03  0.04 -0.10 -0.15  0.49  0.06
## MntSweetProducts     0.71 -0.14  0.34  0.13  0.29 -0.01  0.36 -0.24 -0.25  0.11
## MntGoldProds         0.58  0.24  0.37 -0.38 -0.54 -0.11  0.06  0.00 -0.09  0.08
## NumDealsPurchases   -0.10  0.79  0.07  0.54 -0.14 -0.20  0.00  0.01  0.00  0.02
## NumWebPurchases      0.56  0.62  0.02 -0.26  0.24  0.08  0.18  0.08  0.11 -0.35
## NumCatalogPurchases  0.82  0.02 -0.16  0.19 -0.26  0.23 -0.02 -0.08 -0.08 -0.12
## NumStorePurchases    0.75  0.24 -0.16 -0.10  0.22 -0.35 -0.30 -0.19 -0.06  0.05
## NumWebVisitsMonth   -0.64  0.51  0.24 -0.04  0.14  0.37 -0.03  0.06  0.02  0.25
##                      PC11  PC12 h2       u2 com
## Income               0.05  0.09  1 -6.7e-16 2.8
## MntWines            -0.17 -0.23  1  2.0e-15 2.9
## MntFruits           -0.08 -0.01  1  3.3e-16 3.1
## MntMeatProducts      0.36 -0.10  1 -2.2e-16 2.2
## MntFishProducts     -0.06 -0.02  1  0.0e+00 2.7
## MntSweetProducts    -0.02 -0.01  1  4.4e-16 3.3
## MntGoldProds         0.06 -0.02  1 -2.2e-16 4.1
## NumDealsPurchases   -0.04 -0.10  1 -8.9e-16 2.1
## NumWebPurchases      0.01 -0.02  1 -6.7e-16 3.7
## NumCatalogPurchases -0.21  0.27  1  4.4e-16 2.1
## NumStorePurchases    0.15  0.17  1  6.7e-16 2.9
## NumWebVisitsMonth    0.11  0.19  1  2.2e-16 3.8
## 
##                        PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10 PC11
## SS loadings           5.65 1.55 0.90 0.67 0.64 0.53 0.44 0.42 0.39 0.33 0.26
## Proportion Var        0.47 0.13 0.08 0.06 0.05 0.04 0.04 0.03 0.03 0.03 0.02
## Cumulative Var        0.47 0.60 0.68 0.73 0.78 0.83 0.87 0.90 0.93 0.96 0.98
## Proportion Explained  0.47 0.13 0.08 0.06 0.05 0.04 0.04 0.03 0.03 0.03 0.02
## Cumulative Proportion 0.47 0.60 0.68 0.73 0.78 0.83 0.87 0.90 0.93 0.96 0.98
##                       PC12
## SS loadings           0.22
## Proportion Var        0.02
## Cumulative Var        1.00
## Proportion Explained  0.02
## Cumulative Proportion 1.00
## 
## Mean item complexity =  3
## Test of the hypothesis that 12 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0 
##  with the empirical chi square  0  with prob <  NA 
## 
## Fit based upon off diagonal values = 1

Eigenvalue

pca_result$values

##  [1] 5.6491075 1.5539271 0.9030300 0.6661405 0.6372180 0.5326732 0.4428439
##  [8] 0.4182870 0.3902811 0.3311427 0.2565698 0.2187792

Aturan Kaiser: Eigenvalue > 1 → komponen dipertahankan

Scree Plot

Scree plot membantu melihat titik elbow pada grafik eigenvalue. Titik elbow adalah tempat penurunan tajam eigenvalue saat berhenti dan stabil.

scree(data_final)

Grafik menunjukkan penurunan tajam dari komponen pertama ke kedua, kemudian kurva mulai melandai setelah komponen kedua. Pola ini menandakan bahwa dua komponen sudah optimal dalam mewakili struktur data, sedangkan komponen berikutnya hanya memberikan tambahan informasi yang kecil.

Jumlah Komponen

Berdasarkan aturan eigenvalue > 1 dan scree plot, kita hitung berapa komponen yang diambil

sum(pca_result$values > 1)

## [1] 2

pca_final <- principal(data_final,
                       nfactors = 2,
                       rotate = "varimax")

print(pca_final, digits = 3, cutoff = 0.5)

## Principal Components Analysis
## Call: principal(r = data_final, nfactors = 2, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                        RC1   RC2    h2    u2  com
## Income               0.713 0.244 0.568 0.432 1.23
## MntWines             0.594 0.538 0.642 0.358 1.98
## MntFruits            0.723 0.116 0.536 0.464 1.05
## MntMeatProducts      0.808 0.158 0.678 0.322 1.08
## MntFishProducts      0.749 0.110 0.574 0.426 1.04
## MntSweetProducts     0.715 0.134 0.529 0.471 1.07
## MntGoldProds         0.458 0.433 0.397 0.603 1.99
## NumDealsPurchases   -0.382 0.700 0.635 0.365 1.55
## NumWebPurchases      0.289 0.780 0.693 0.307 1.27
## NumCatalogPurchases  0.755 0.315 0.670 0.330 1.34
## NumStorePurchases    0.607 0.495 0.614 0.386 1.92
## NumWebVisitsMonth   -0.782 0.235 0.667 0.333 1.18
## 
##                         RC1   RC2
## SS loadings           5.100 2.103
## Proportion Var        0.425 0.175
## Cumulative Var        0.425 0.600
## Proportion Explained  0.708 0.292
## Cumulative Proportion 0.708 1.000
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 2 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.051 
##  with the empirical chi square  774.25  with prob <  8.01e-135 
##  0.5
## Fit based upon off diagonal values = 0.986

Berdasarkan interpretasi komponen, interpretasinya dapat dijelaskan sebagai berikut. • Komponen 1 (RC1) - Pola Konsumsi dan Intensitas Pembelian Komponen pertama memiliki loading tinggi pada variabel pengeluaran produk (MntWines, MntMeatProducts, MntFruits, MntFishProducts, dan MntSweetProducts), pembelian katalog dan toko, serta Income. Variabel NumWebVisitsMonth memiliki loading negatif. Komponen ini menggambarkan tingkat konsumsi dan intensitas pembelian pelanggan, di mana pelanggan dengan skor tinggi cenderung memiliki pendapatan lebih besar dan aktivitas pembelian yang lebih tinggi. • Komponen 2 (RC2) - Aktivitas Pembelian Digital dan Respons Promosi Komponen kedua didominasi oleh NumWebPurchases dan NumDealsPurchases yang berkaitan dengan transaksi online dan penggunaan promo. Komponen ini merepresentasikan perilaku pelanggan yang aktif berbelanja secara digital dan responsif terhadap promosi.

FA

Setelah tahu jumlah faktor dari PCA, lanjut ke FA.

library(psych)

fa_result <- fa(data_final,
                nfactors = 2,
                rotate = "varimax",
                fm = "ml")

print(fa_result, digits = 3, cutoff = 0.5)

## Factor Analysis using method =  ml
## Call: fa(r = data_final, nfactors = 2, rotate = "varimax", fm = "ml")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                        ML1    ML2    h2    u2  com
## Income               0.725  0.084 0.533 0.467 1.03
## MntWines             0.688  0.386 0.622 0.378 1.57
## MntFruits            0.671 -0.017 0.451 0.549 1.00
## MntMeatProducts      0.819 -0.042 0.672 0.328 1.01
## MntFishProducts      0.701 -0.035 0.492 0.508 1.01
## MntSweetProducts     0.668  0.006 0.447 0.553 1.00
## MntGoldProds         0.497  0.242 0.306 0.694 1.45
## NumDealsPurchases   -0.164  0.481 0.258 0.742 1.23
## NumWebPurchases      0.436  0.682 0.654 0.346 1.70
## NumCatalogPurchases  0.806  0.092 0.658 0.342 1.03
## NumStorePurchases    0.673  0.318 0.555 0.445 1.43
## NumWebVisitsMonth   -0.682  0.344 0.584 0.416 1.48
## 
##                         ML1   ML2
## SS loadings           5.090 1.143
## Proportion Var        0.424 0.095
## Cumulative Var        0.424 0.519
## Proportion Explained  0.817 0.183
## Cumulative Proportion 0.817 1.000
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 2 factors are sufficient.
## 
## df null model =  66  with the objective function =  6.029 0.5 with Chi Square =  13324.49
## df of  the model are 43  and the objective function was  0.632 
##  0.5
## The root mean square of the residuals (RMSR) is  0.049 
## The df corrected root mean square of the residuals is  0.06 
##  0.5
## The harmonic n.obs is  2216 with the empirical chi square  345.107  with prob <  8.47e-49 
##  0.5The total n.obs was  2216  with Likelihood Chi Square =  1395.979  with prob <  1.36e-264 
##  0.5
## Tucker Lewis Index of factoring reliability =  0.8433
## RMSEA index =  0.1192  and the 90 % confidence intervals are  0.1139 0.1246 0.5
## BIC =  1064.73
## Fit based upon off diagonal values = 0.988
## Measures of factor score adequacy             
##                                                     ML1   ML2
## Correlation of (regression) scores with factors   0.958 0.844
## Multiple R square of scores with factors          0.918 0.712
## Minimum correlation of possible factor scores     0.836 0.423

Faktor Loading

loadings(fa_result)

## 
## Loadings:
##                     ML1    ML2   
## Income               0.725       
## MntWines             0.688  0.386
## MntFruits            0.671       
## MntMeatProducts      0.819       
## MntFishProducts      0.701       
## MntSweetProducts     0.668       
## MntGoldProds         0.497  0.242
## NumDealsPurchases   -0.164  0.481
## NumWebPurchases      0.436  0.682
## NumCatalogPurchases  0.806       
## NumStorePurchases    0.673  0.318
## NumWebVisitsMonth   -0.682  0.344
## 
##                  ML1   ML2
## SS loadings    5.090 1.143
## Proportion Var 0.424 0.095
## Cumulative Var 0.424 0.519

Baris menunjukkan variabel, kolom menunjukkan faktor. Nilai besar positif atau negatif menandakan korelasi kuat variabel dengan faktor tersebut.

Communality

Menunjukkan seberapa baik variabel dijelaskan oleh faktor.

fa_result$communality

##              Income            MntWines           MntFruits     MntMeatProducts 
##           0.5327114           0.6218775           0.4507759           0.6724945 
##     MntFishProducts    MntSweetProducts        MntGoldProds   NumDealsPurchases 
##           0.4919842           0.4465907           0.3059317           0.2584868 
##     NumWebPurchases NumCatalogPurchases   NumStorePurchases   NumWebVisitsMonth 
##           0.6544017           0.6584382           0.5546771           0.5841479

Communality > 0.5 → baik

Variance Explained

fa_result$Vaccounted

##                             ML1        ML2
## SS loadings           5.0897472 1.14277047
## Proportion Var        0.4241456 0.09523087
## Cumulative Var        0.4241456 0.51937647
## Proportion Explained  0.8166438 0.18335615
## Cumulative Proportion 0.8166438 1.00000000

Visualisasi Struktur Faktor

fa.diagram(fa_result)

Visualisasi Heatmap

library(pheatmap)
loading_matrix <- as.data.frame(unclass(fa_result$loadings))
loading_matrix

##                            ML1          ML2
## Income               0.7250563  0.083694370
## MntWines             0.6875563  0.386191495
## MntFruits            0.6711857 -0.016899718
## MntMeatProducts      0.8189785 -0.042057106
## MntFishProducts      0.7005317 -0.035207198
## MntSweetProducts     0.6682460  0.006166301
## MntGoldProds         0.4972339  0.242260454
## NumDealsPurchases   -0.1638996  0.481273042
## NumWebPurchases      0.4357205  0.681578568
## NumCatalogPurchases  0.8062505  0.091642786
## NumStorePurchases    0.6732396  0.318473735
## NumWebVisitsMonth   -0.6823087  0.344387492

pheatmap(loading_matrix,
         cluster_rows = FALSE,
         cluster_cols = FALSE,
         display_numbers = TRUE,
         main = "Heatmap Factor Loadings")

Warna merah menunjukkan loading positif kuat, sedangkan warna biru menunjukkan hubungan negatif. Visualisasi memperlihatkan bahwa variabel pengeluaran produk terkonsentrasi pada Faktor 1, sementara aktivitas pembelian digital dominan pada Faktor 2.

Berdasarkan nilai factor loading, interpretasi faktor dapat dijelaskan sebagai berikut. • Faktor 1 - Pola Konsumsi dan Intensitas Pembelian Produk Faktor ini memiliki loading tinggi pada variabel pengeluaran produk, pembelian katalog dan toko, serta Income. Faktor pertama merepresentasikan tingkat konsumsi dan intensitas pembelian pelanggan secara keseluruhan. Pelanggan dengan skor faktor tinggi cenderung memiliki pengeluaran dan aktivitas pembelian yang lebih besar. • Faktor 2 - Aktivitas Pembelian Online dan Respons Promosi Faktor kedua didominasi oleh NumWebPurchases dan NumDealsPurchases serta didukung kontribusi kunjungan website. Faktor ini menggambarkan pelanggan yang aktif bertransaksi melalui kanal digital dan lebih responsif terhadap promosi digital.

Kesimpulan

Berdasarkan hasil analisis Principal Component Analysis (PCA) dan Factor Analysis (FA) yang telah dilakukan pada dataset Marketing Campaign, dapat diitarik kesimpulan sebagai berikut. 1. Metode PCA berhasil menyederhanakan 11 variabel perilaku pelanggan menjadi 2 komponen utama yang lebih ringkas. Hal ini dibuktikan dengan nilai Eigenvalue > 1 dan diperkuat oleh visualisasi Scree Plot yang menunjukkan titik landai (siku) setelah komponen kedua. 2. Melalui Analisis Faktor dengan rotasi Variamax, terbentuka dua faktor laten yang menggambarkan profil pelanggan secara spesifik: - Faktor 1 (Profil Pelanggan Premium): dicirikan oleh pendataan tinggi serta pengeluaran besar pada produk mewah (anggur dan daging) yang dilakukan melalui saluran toko fisik dan katalog. - Faktor 2 (Profil Pelanggan Digital/Promo): dicirikan oleh intensitas kunjungan ke situs web yang tinggi dan kecenderungan melakukan pembelian saat terdapat penawaran diskon (deals). 3. Pengujian asumsi mellaui KMO dan Bartlett’s menunjukkan bahwa data sangat layak untuk dianalisis (setelah variabel Regency dikeluarkan), dengan total variasn yang mampu dijelaskan oleh model cukup signifikan untuk menggambarkan perilaku pasar.