Central Tendency
Exercises ~ Week 6
1 Introduction
OBJECTIVES
The purpose of this practicum is to study the concept of central tendency and to apply it to variables of suitable data types. The practicum also aims to choose the most appropriate measure of central tendency for each variable and to present the results as informative, relevant data visualizations using the R programming language. Its other objectives are as follows:
Determine the central value of a dataset (mean, median, mode)
Summarize a dataset so that it is easy to understand
Compare several groups of data based on their central values
Assess the shape of the distribution, i.e. whether it is symmetric or skewed to one side
Provide a basis for further statistical analysis such as regression and hypothesis testing
Support data-driven decision making
DATASET DESCRIPTION
The dataset used in this practicum contains information about customer activity at a retail store or company. It covers customer age, gender, store location, category of the purchased product, total purchase, number of visits, and customer satisfaction rating. The dataset is used to analyze consumer behavior and to apply the concepts of central tendency (mean, median, and mode) to the appropriate variables.
The main variables in this dataset are:
CustomerID : unique identifier of each customer
Age : customer age
Gender : customer gender (M = male, F = female)
StoreLocation : location of the store where the customer made the transaction (e.g. West, South, East, etc.)
ProductCategory : category of the purchased product (e.g. Electronics, Books, Clothing, etc.)
TotalPurchase : total value of the customer's purchases
NumberOfVisits : number of customer visits to the store
FeedbackScore : rating or satisfaction score given by the customer
CASE CONTEXT
Understanding customer behavior is important for supporting marketing strategy and service improvement. The dataset contains information such as CustomerID, Age, Gender, StoreLocation, ProductCategory, TotalPurchase, NumberOfVisits, and FeedbackScore, which describe the characteristics and activity of customers when they make a transaction. Analyzing these data reveals purchasing patterns, product preferences, visit frequency, and customer satisfaction. Applying central tendency (mean, median, and mode) to the relevant variables helps describe the data accurately, while data visualization presents the results in an informative and easy-to-understand way.
2 Data Preparation
OBJECTIVES
This section prepares the customer dataset for the central tendency analysis and data visualization. It covers importing the data, checking the structure of the dataset, checking that the variables Age, Gender, StoreLocation, ProductCategory, TotalPurchase, NumberOfVisits, and FeedbackScore have suitable types, and making sure the data are tidy and ready to be processed in a format that R recognizes.
DATASET DESCRIPTION
This dataset contains information about customer activity and characteristics, with the following main variables:
CustomerID : unique code of each customer
Age : customer age
Gender : customer gender
StoreLocation : location of the store where the customer made the transaction
ProductCategory : category of the purchased product
TotalPurchase : total value of the customer's purchases
NumberOfVisits : number of customer visits to the store
FeedbackScore : customer satisfaction score or rating
3 Analysis Method & Execution per Variable
3.1 Age
Age is numeric data. It is recorded here in whole years, so the stored values are discrete, even though age itself is a continuous quantity.
Based on the data, the mean is a suitable measure of central tendency for age. The values range from 18 to 70 years with no strongly extreme values, so the mean is not distorted by outliers and gives a clear summary of the typical age of the customers.
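The claim that Age has no extreme values can be checked directly. The sketch below assumes the standard 1.5 x IQR rule that `boxplot()` uses (via `boxplot.stats()`); it copies the 200 ages listed in this section, prints any flagged outliers, and puts the mean next to the median, since a large gap between the two would signal skew.

```r
# Sketch: check Age for outliers with the 1.5 x IQR rule used by boxplot()
age <- c(32,37,63,41,42,66,47,21,30,33,58,45,46,42,32,67,47,18,51,33,24,37,25,29,31,18,53,42,23,59,
         46,36,53,53,52,50,48,39,35,34,30,37,21,70,58,23,34,33,52,39,44,40,39,61,37,63,18,49,42,43,
         46,32,35,25,24,45,47,41,54,70,33,18,55,29,30,55,36,22,43,38,40,46,34,50,37,45,56,47,35,57,
         55,48,44,31,60,31,70,63,36,25,29,44,36,35,26,39,28,18,34,54,31,49,18,39,48,45,42,30,27,25,
         42,26,33,36,68,30,44,41,26,39,62,47,41,34,18,57,18,51,69,18,51,36,18,18,18,32,18,50,70,21,
         52,52,45,25,38,36,48,34,55,34,56,24,21,70,34,44,50,33,48,46,37,41,39,70,29,24,41,45,47,33,
         24,59,35,27,36,37,57,41,51,33,43,35,41,27,20,70,49,21,31,22)

# Points lying more than 1.5 * IQR beyond the box hinges (empty if none)
age_outliers <- boxplot.stats(age)$out
print(range(age))      # youngest and oldest customer
print(age_outliers)    # flagged outliers

# Mean next to median: similar values suggest a roughly symmetric distribution
print(c(mean = mean(age), median = median(age)))
```

An empty outlier vector supports using the mean here; if many points were flagged, the median would be the safer choice.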
Mean Formula
Mean for Ungrouped Data
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
Where:
- \(\bar{X}\): mean (average) value
- \(X_i\): the i-th data value
- \(n\): total number of data values
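As a small illustration of the formula, the sketch below applies \(\bar{X} = \sum X_i / n\) by hand to five hypothetical ages (made-up values, not taken from the dataset) and compares the result with R's built-in `mean()`.

```r
# Five hypothetical ages, for illustration only
x <- c(25, 30, 35, 40, 50)

n <- length(x)         # n: number of data values
x_bar <- sum(x) / n    # sum of all X_i divided by n
print(x_bar)

# The built-in mean() should give the same value
print(all.equal(x_bar, mean(x)))
```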
Mean for Grouped Data
\[ \bar{X} = \frac{\sum (f_i \cdot X_i)}{\sum f_i} \]
Where:
\(\bar{X}\): mean (average)
\(f_i\): frequency of the i-th class
\(X_i\): midpoint of the i-th class
\(\sum f_i\): total frequency of all classes
- Computing the mean
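For grouped data the individual values are unknown, so each class is represented by its midpoint. The sketch below uses a hypothetical frequency table (three 10-year age classes with made-up frequencies) to show how \(\sum f_i X_i / \sum f_i\) is evaluated; with real data, the grouped mean is only an approximation of the exact mean.

```r
# Hypothetical frequency table (illustration only)
lower <- c(20, 30, 40)   # lower limit of each class
upper <- c(30, 40, 50)   # upper limit of each class
f     <- c(3, 5, 2)      # f_i: frequency of each class

X_mid <- (lower + upper) / 2              # X_i: class midpoints
mean_grouped <- sum(f * X_mid) / sum(f)   # sum(f_i * X_i) / sum(f_i)
print(X_mid)
print(mean_grouped)
```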
data <- c(32,37,63,41,42,66,47,21,30,33,58,45,46,42,32,67,47,18,51,33,24,37,25,29,31,18,53,42,23,59,
46,36,53,53,52,50,48,39,35,34,30,37,21,70,58,23,34,33,52,39,44,40,39,61,37,63,18,49,42,43,
46,32,35,25,24,45,47,41,54,70,33,18,55,29,30,55,36,22,43,38,40,46,34,50,37,45,56,47,35,57,
55,48,44,31,60,31,70,63,36,25,29,44,36,35,26,39,28,18,34,54,31,49,18,39,48,45,42,30,27,25,
42,26,33,36,68,30,44,41,26,39,62,47,41,34,18,57,18,51,69,18,51,36,18,18,18,32,18,50,70,21,
52,52,45,25,38,36,48,34,55,34,56,24,21,70,34,44,50,33,48,46,37,41,39,70,29,24,41,45,47,33,
24,59,35,27,36,37,57,41,51,33,43,35,41,27,20,70,49,21,31,22)
# Frequency of each distinct age (f_i) and the age values themselves (X_i)
fi <- table(data)
Xi <- as.numeric(names(fi))
# Frequency-weighted mean sum(f_i * X_i) / sum(f_i); equals mean(data)
mean_manual <- sum(fi * Xi) / sum(fi)
# Print the result
cat("Mean (manual formula) =", round(mean_manual, 2), "\n")
## Mean (manual formula) = 39.99
- Visualizing the Mean of Age
## === Distribution of Customer Age === ##
# --- Age data ---
data <- c(
32,37,63,41,42,66,47,21,30,33,58,45,46,42,32,67,47,18,51,33,24,37,25,29,31,18,53,42,23,59,
46,36,53,53,52,50,48,39,35,34,30,37,21,70,58,23,34,33,52,39,44,40,39,61,37,63,18,49,42,43,
46,32,35,25,24,45,47,41,54,70,33,18,55,29,30,55,36,22,43,38,40,46,34,50,37,45,56,47,35,57,
55,48,44,31,60,31,70,63,36,25,29,44,36,35,26,39,28,18,34,54,31,49,18,39,48,45,42,30,27,25,
42,26,33,36,68,30,44,41,26,39,62,47,41,34,18,57,18,51,69,18,51,36,18,18,18,32,18,50,70,21,
52,52,45,25,38,36,48,34,55,34,56,24,21,70,34,44,50,33,48,46,37,41,39,70,29,24,41,45,47,33,
24,59,35,27,36,37,57,41,51,33,43,35,41,27,20,70,49,21,31,22
)
# --- Compute the mean ---
mean_value <- mean(data)
# === 1. Histogram with percentage labels and a distribution curve === #
# Compute the bins first (plot = FALSE) so the y-axis can leave room for the labels
hist_data <- hist(data, breaks = 15, plot = FALSE)
plot(
hist_data,
col = "#FFB6C1", # light pink bars
border = "#FF69B4", # dark pink bar borders
main = "Customer Age Distribution with Distribution Curve & Mean",
xlab = "Age (years)",
ylab = "Frequency",
ylim = c(0, max(hist_data$counts) * 1.15)
)
# --- Percentage of customers in each bar ---
percentages <- (hist_data$counts / sum(hist_data$counts)) * 100
# --- Print the percentages above the bars ---
text(
x = hist_data$mids,
y = hist_data$counts,
labels = paste0(round(percentages, 1), "%"),
pos = 3, cex = 0.8, col = "#C71585"
)
# --- Add the kernel density estimate (KDE) ---
# density() integrates to 1; multiply by n * bin width so the curve
# is on the same frequency scale as the bars
kde <- density(data)
bin_width <- diff(hist_data$breaks)[1]
lines(kde$x, kde$y * length(data) * bin_width, lwd = 2, col = "#C71585")
# --- Add the mean line ---
abline(v = mean_value, col = "#FF1493", lwd = 2, lty = 2)
# --- Add the legend ---
legend(
"topright",
legend = c(paste("Mean =", round(mean_value, 2)), "Distribution (KDE)"),
col = c("#FF1493", "#C71585"),
lwd = 2,
lty = c(2, 1),
box.lty = 0,
cex = 0.9
)
- Interpretation
The chart shows the distribution of customer age, with a mean age of about 39.99 years. The largest concentration of customers is in the 30–45 age range, and the tallest bar is the 30–35 age class (about 15.5% of all customers). The distribution looks relatively balanced, with most customers in the young-adult to middle-aged range. The share of customers then declines gradually after age 45, so there are fewer customers in the older age groups. Overall, most customers are of working age.
3.2 Gender
This variable is nominal categorical data.
The appropriate measure of central tendency for gender is the mode. Gender is nominal categorical data, so it cannot be summarized with a mean or a median: the variable only records categories such as male and female, with no numeric value and no natural order.
Mode Formula
Mode Formula for Ungrouped Data
The mode is the value that occurs most often in a dataset. For ungrouped data, the value with the highest frequency is the mode; if several values share the highest frequency, each of them is a mode.
\[ \text{Mode} = X_i \text{ with the highest frequency } (f_i) \]
Where:
\(X_i\) : the i-th data value
\(f_i\) : frequency, i.e. the number of occurrences of \(X_i\)
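The definition above allows more than one mode when several values share the highest frequency. A minimal sketch of a tie-aware helper follows; the name `all_modes` is hypothetical and not part of base R, and it returns the modes as character strings because it reads them from the names of a `table()`.

```r
# Hypothetical helper: return every value that has the highest frequency
all_modes <- function(x) {
  freq <- table(x)                  # f_i for each distinct value X_i
  names(freq)[freq == max(freq)]    # all X_i whose f_i equals the maximum
}

print(all_modes(c("M", "F", "M", "F", "M")))   # a single mode
print(all_modes(c("a", "b", "b", "c", "c")))   # two modes (bimodal)
```

Unlike `which.max()`, which silently keeps only the first maximum, this version makes ties visible.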
Mode Formula for Grouped Data
This formula applies to numeric data grouped into class intervals; it is shown for completeness and cannot be applied to a nominal variable such as gender.
\[ \text{Mode} = L + \left( \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right) \times w \]
Where:
- \(L\): lower boundary of the modal class
- \(f_m\): frequency of the modal class
- \(f_{m-1}\): frequency of the class before the modal class
- \(f_{m+1}\): frequency of the class after the modal class
- \(w\): class width
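The grouped-data formula can be evaluated in a few lines of R. The frequency table below is hypothetical (made-up values) and assumes equal class widths; when the modal class is the first or last class, the missing neighbouring frequency is taken as 0, which is a common convention but an assumption in this sketch.

```r
# Hypothetical frequency table with equal class width w (illustration only)
lower <- c(20, 30, 40, 50)   # lower boundary of each class
freq  <- c(4, 10, 6, 2)      # frequency of each class
w     <- 10                  # class width

m      <- which.max(freq)                            # index of the modal class
f_m    <- freq[m]                                    # f_m
f_prev <- if (m > 1) freq[m - 1] else 0              # f_{m-1} (0 if none, assumed)
f_next <- if (m < length(freq)) freq[m + 1] else 0   # f_{m+1} (0 if none, assumed)

mode_grouped <- lower[m] + (f_m - f_prev) / ((f_m - f_prev) + (f_m - f_next)) * w
print(mode_grouped)
```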
- Mode Calculation Result
# Gender data
gender <- c("M","F","M","M","F","F","M","F","F","M","F","M","F","F","F","F",
"M","F","F","F","M","M","M","F","M","M","F","F","F","M","M","M",
"F","M","F","M","F","F","F","F","F","M","M","F","M","M","M","F",
"F","M","M","F","F","F","F","M","M","M","F","F","M","F","F","F",
"M","F","F","M","F","M","F","M","F","M","M","F","M","F","F","M",
"F","M","F","M","M","F","F","M","F","M","M","F","F","M","F","M",
"F","F","M","M","F","F","F","F","F","F","M","M","M","F","F","F",
"M","F","M","F","F","M","F","M","F","F","F","F","M","M","M","F",
"M","F","M","M","F","M","F","M","F","M","M","F","M","F","M","F",
"M","M","F","M","M","F","F","F","M","M","M","M","M","M","M","F",
"F","M","M","F","M","M","F","M","M","M","M","M","M","F","M","M",
"F","F","M","F","M","M","M","F","M","F","F","M","M","M","M","F",
"M","F","F","M","M","F","M","F")
# Function that returns the most frequent value (the first one, if there is a tie)
modus <- function(x) {
unique_values <- unique(x)
freq <- tabulate(match(x, unique_values)) # count of each unique value
unique_values[which.max(freq)]
}
# Print the mode
modus(gender)
## [1] "M"
- Visualizing the Mode of Gender
# Gender data
gender <- c("M","F","M","M","F","F","M","F","F","M","F","M","F","F","F","F","M","F","F","F",
"M","M","M","F","M","M","F","F","F","M","M","M","F","M","F","M","F","F","F","F",
"F","M","M","F","M","M","M","F","F","M","M","F","F","F","F","M","M","M","F","F",
"M","F","F","F","M","F","F","M","F","M","F","M","F","M","M","F","M","F","F","M",
"F","M","F","M","M","F","F","M","F","M","M","F","F","M","F","M","F","F","M","M",
"F","F","F","F","F","F","M","M","M","F","F","F","M","F","M","F","F","M","F","M",
"F","F","F","F","M","M","M","F","M","F","M","M","F","M","F","M","F","M","M","F",
"M","F","M","F","M","M","F","M","M","F","F","F","M","M","M","M","M","M","M","F",
"F","M","M","F","M","M","F","M","M","M","M","M","M","F","M","M","F","F","M","F",
"M","M","M","F","M","F","F","M","M","M","M","F","M","F","F","M","M","F","M","F")
# Frequency of each category
freq_gender <- table(gender)
# Mode: every category whose frequency equals the maximum
modus_gender <- names(freq_gender[freq_gender == max(freq_gender)])
print(paste("Mode of the gender data:", modus_gender))
## [1] "Mode of the gender data: M"
# Pink-themed bar chart; ylim leaves room for the labels above the bars
bar_positions <- barplot(freq_gender,
main = "Gender Frequency with Mode Line",
ylab = "Frequency",
col = c("pink", "lightpink"),
border = "red",
ylim = c(0, max(freq_gender) * 1.15))
# Horizontal dashed line at the frequency of the mode
abline(h = max(freq_gender), col = "red", lty = 2, lwd = 2)
# Frequency labels just above each bar
text(x = bar_positions, y = freq_gender, labels = freq_gender, pos = 3)
- Interpretation
Based on the gender frequency chart, the bar for M (male) is taller than the bar for F (female), which means the dataset contains more men than women. The red dashed horizontal line marks the highest frequency and meets the top of the M bar. Male is therefore the mode of the gender variable, i.e. the category that occurs most often.
3.3 Store Location
The Store Location variable is nominal categorical data, because its values have no inherent order.
The mode is used as the measure of central tendency because the data are labels (categories) with no mathematical order.
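Why a mean is meaningless for a nominal variable can be shown by coding the labels as numbers. In the sketch below, which uses the first five Store Location values listed in this section, the same data give different "mean codes" depending only on the arbitrary order of the factor levels, while the mode stays the same.

```r
# First five Store Location values from this section
loc <- c("West", "South", "West", "North", "East")

# Same labels, two arbitrary codings
f_alpha  <- factor(loc)                                               # East=1, North=2, South=3, West=4
f_custom <- factor(loc, levels = c("West", "South", "North", "East")) # West=1, South=2, North=3, East=4

# The "mean code" changes with the coding, so it carries no meaning
print(mean(as.integer(f_alpha)))
print(mean(as.integer(f_custom)))

# The mode (most frequent label) does not depend on the coding
print(names(which.max(table(f_alpha))))
print(names(which.max(table(f_custom))))
```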
Mode Formula
Mode Formula for Ungrouped Data
The mode is the value that occurs most often in a dataset. For ungrouped data, the value with the highest frequency is the mode.
\[ \text{Mode} = X_i \text{ with the highest frequency } (f_i) \]
Where:
\(X_i\) : the i-th data value
\(f_i\) : frequency, i.e. the number of occurrences of \(X_i\)
Mode Formula for Grouped Data
\[ \text{Mode} = L + \left( \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right) \times w \]
- Computing the Mode
library(ggplot2)
# --- Store Location data (200 observations) ---
store_data <- c("West", "South", "West", "North", "East", "East", "East", "South", "North", "South",
"East", "North", "South", "North", "South", "South", "West", "East", "South", "South",
"South", "South", "South", "West", "West", "East", "North", "North", "East", "East",
"East", "West", "East", "South", "North", "North", "East", "South", "West", "West",
"South", "West", "East", "West", "West", "North", "East", "West", "East", "West",
"North", "North", "East", "East", "West", "South", "South", "West", "East", "South",
"East", "East", "South", "East", "East", "North", "East", "North", "South", "West",
"West", "North", "East", "East", "South", "West", "West", "South", "North", "South",
"East", "East", "East", "South", "North", "South", "West", "East", "South", "East",
"West", "East", "North", "South", "South", "South", "West", "West", "South", "West",
"South", "North", "North", "East", "West", "North", "East", "East", "South", "West",
"East", "South", "North", "West", "North", "West", "West", "South", "North", "East",
"South", "West", "West", "West", "South", "East", "North", "West", "North", "South",
"East", "North", "North", "East", "East", "South", "West", "North", "West", "East",
"East", "East", "West", "North", "East", "East", "North", "South", "North", "South",
"West", "South", "North", "East", "North", "South", "South", "North", "South", "North",
"West", "North", "West", "South", "South", "West", "South", "West", "South", "West",
"East", "North", "West", "North", "North", "West", "North", "East", "West", "South",
"North", "North", "East", "East", "North", "North", "West", "South", "West", "West",
"East", "West", "North", "West", "North", "East", "North", "West", "South", "South"
)
# --- Frequency table ---
freq_table <- as.data.frame(table(store_data))
colnames(freq_table) <- c("StoreLocation", "Frekuensi")
freq_table
##   StoreLocation Frekuensi
## 1          East        51
## 2         North        47
## 3         South        51
## 4          West        51
# --- Mode: every location whose frequency equals the maximum ---
# as.character() prints the labels; cat() on a factor would print its integer codes
modus <- as.character(freq_table$StoreLocation[freq_table$Frekuensi == max(freq_table$Frekuensi)])
cat("Mode(s) of Store Location:", modus, "\n")
## Mode(s) of Store Location: East South West
The 200 observations are spread almost evenly: East, South, and West each occur 51 times and North 47 times. Store Location is therefore multimodal, with three modes (East, South, and West).
- Bar chart of Store Location
library(ggplot2)
# Plot the frequency table computed above instead of hand-typed counts
ggplot(freq_table, aes(x = StoreLocation, y = Frekuensi, fill = StoreLocation)) +
geom_bar(stat = "identity", width = 0.7) +
geom_text(aes(label = Frekuensi),
vjust = -0.3, size = 5, fontface = "bold") +
labs(
title = "Store Location Frequency",
x = "Store Location",
y = "Frequency"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "none"
) +
scale_fill_brewer(palette = "Set2")
- Interpretation
The bar chart shows that customers are spread almost evenly across the four store locations. East, South, and West share the highest frequency (51 customers each), so the variable has three modes, while North has slightly fewer customers (47). No single region dominates store activity.
3.4 Product Category
The Product Category variable is nominal categorical data, because its values have no inherent order.
The mode is used as the measure of central tendency because the data are labels (categories) with no mathematical order.
Mode Formula
Mode Formula for Ungrouped Data
The mode is the value that occurs most often in a dataset. For ungrouped data, the value with the highest frequency is the mode.
\[ \text{Mode} = X_i \text{ with the highest frequency } (f_i) \]
Where:
\(X_i\) : the i-th data value
\(f_i\) : frequency, i.e. the number of occurrences of \(X_i\)
Mode Formula for Grouped Data
\[ \text{Mode} = L + \left( \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \right) \times w \]
- Computing the Mode
# --- Product Category data ---
ProductCategory <- c(
"Electronics","Books","Electronics","Sports","Electronics","Sports","Sports","Clothing","Sports","Books",
"Clothing","Electronics","Home","Sports","Books","Home","Home","Books","Electronics","Electronics",
"Books","Sports","Home","Clothing","Electronics","Books","Books","Books","Sports","Clothing",
"Clothing","Sports","Home","Books","Sports","Sports","Electronics","Home","Home","Electronics",
"Electronics","Electronics","Home","Home","Books","Clothing","Clothing","Books","Clothing",
"Electronics","Sports","Home","Sports","Electronics","Clothing","Books","Books","Electronics",
"Home","Electronics","Books","Sports","Books","Home","Home","Books","Sports","Sports","Clothing",
"Home","Home","Electronics","Sports","Clothing","Sports","Books","Clothing","Home","Sports",
"Sports","Clothing","Clothing","Books","Clothing","Electronics","Books","Electronics","Home",
"Clothing","Home","Clothing","Clothing","Sports","Clothing","Sports","Sports","Electronics",
"Electronics","Home","Electronics","Home","Electronics","Clothing","Electronics","Sports",
"Sports","Books","Clothing","Books","Sports","Electronics","Sports","Books","Clothing","Home",
"Clothing","Electronics","Clothing","Sports","Sports","Clothing","Clothing","Books","Home",
"Clothing","Sports","Electronics","Home","Sports","Clothing","Electronics","Books","Books",
"Sports","Electronics","Clothing","Sports","Electronics","Electronics","Clothing","Clothing",
"Books","Sports","Clothing","Clothing","Home","Electronics","Sports","Clothing","Clothing",
"Books","Books","Books","Home","Home","Books","Books","Clothing","Home","Clothing","Clothing",
"Books","Home","Sports","Sports","Clothing","Clothing","Electronics","Sports","Home","Sports",
"Home","Electronics","Clothing","Books","Sports","Home","Home","Home","Books","Sports","Sports",
"Clothing","Clothing","Clothing","Clothing","Clothing","Sports","Electronics","Electronics",
"Electronics","Electronics","Sports","Electronics","Clothing","Sports","Books","Books","Sports","Books"
)
# --- Mode: the most frequent category ---
mode_value <- names(which.max(table(ProductCategory)))
# --- Print the result ---
cat("Mode of Product Category:", mode_value, "\n")
## Mode of Product Category: Clothing
- Visualization
library(ggplot2)
ProductCategory <- c(
"Electronics","Books","Electronics","Sports","Electronics","Sports","Sports","Clothing","Sports","Books",
"Clothing","Electronics","Home","Sports","Books","Home","Home","Books","Electronics","Electronics",
"Books","Sports","Home","Clothing","Electronics","Books","Books","Books","Sports","Clothing",
"Clothing","Sports","Home","Books","Sports","Sports","Electronics","Home","Home","Electronics",
"Electronics","Electronics","Home","Home","Books","Clothing","Clothing","Books","Clothing",
"Electronics","Sports","Home","Sports","Electronics","Clothing","Books","Books","Electronics","Home",
"Electronics","Books","Sports","Books","Home","Home","Books","Sports","Sports","Clothing","Home",
"Home","Electronics","Sports","Clothing","Sports","Books","Clothing","Home","Sports","Sports",
"Clothing","Clothing","Books","Clothing","Electronics","Books","Electronics","Home","Clothing",
"Home","Clothing","Clothing","Sports","Clothing","Sports","Sports","Electronics","Electronics",
"Home","Electronics","Home","Electronics","Clothing","Electronics","Sports","Sports","Books",
"Clothing","Books","Sports","Electronics","Sports","Books","Clothing","Home","Clothing","Electronics",
"Clothing","Sports","Sports","Clothing","Clothing","Books","Home","Clothing","Sports","Electronics",
"Home","Sports","Clothing","Electronics","Books","Books","Sports","Electronics","Clothing","Sports",
"Electronics","Electronics","Clothing","Clothing","Books","Sports","Clothing","Clothing","Home",
"Electronics","Sports","Clothing","Clothing","Books","Books","Books","Home","Home","Books","Books",
"Clothing","Home","Clothing","Clothing","Books","Home","Sports","Sports","Clothing","Clothing",
"Electronics","Sports","Home","Sports","Home","Electronics","Clothing","Books","Sports","Home",
"Home","Home","Books","Sports","Sports","Clothing","Clothing","Clothing","Clothing","Clothing",
"Sports","Electronics","Electronics","Electronics","Electronics","Sports","Electronics","Clothing",
"Sports","Books","Books","Sports","Books"
)
df <- data.frame(ProductCategory)
category_count <- as.data.frame(table(df$ProductCategory))
colnames(category_count) <- c("Category", "Count")
ggplot(category_count, aes(x = reorder(Category, -Count), y = Count, fill = Category)) +
geom_bar(stat = "identity", color = "black", alpha = 0.8) +
geom_text(aes(label = Count), vjust = -0.5, size = 4) +
labs(title = "Number of Products per Category",
x = "Product Category",
y = "Count",
fill = "Category") +
theme_minimal(base_size = 14) +
theme(legend.position = "none")
- Interpretation
Based on the bar chart, Clothing has the highest count of all product categories. Clothing is therefore the mode of Product Category, i.e. the category that appears most often in the data.
3.5 Total Purchase
Total Purchase is numeric data. The values in this dataset are recorded as whole numbers, so they are treated as discrete numeric data.
Given the distribution of Total Purchase, the median is a more relevant measure of central tendency than the mean. The values span a very wide range and are skewed to the right, with a few very large purchases that pull the mean upward, whereas the median is not affected by such extreme values.
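The claim about extreme values can be checked with the 1.5 x IQR rule that `boxplot()` applies (via `boxplot.stats()`). The sketch below copies the 200 TotalPurchase values listed in this section, prints the flagged points, and compares the mean with the median; a mean well above the median is a sign of a long right tail.

```r
# The 200 TotalPurchase values listed in this section
TotalPurchase <- c(
  528,72,327,391,514,381,510,102,559,27,40,217,118,532,25,87,77,80,209,232,
  23,444,127,90,165,77,52,91,390,127,81,514,101,68,471,621,327,107,132,1128,
  247,382,117,81,97,66,158,27,80,104,554,33,532,374,77,33,82,144,37,270,61,
  553,95,101,82,86,451,417,83,74,76,384,417,51,574,82,65,139,355,548,34,151,
  68,39,98,37,654,80,89,41,59,105,601,44,471,540,186,256,27,92,71,173,91,400,
  427,400,25,32,85,468,87,491,66,33,107,476,226,62,542,339,37,19,34,110,107,
  529,156,75,458,78,517,30,78,448,373,52,609,250,282,66,116,30,525,105,78,43,
  136,567,33,160,30,65,30,113,141,89,33,79,29,119,135,32,164,426,514,58,81,
  576,424,60,480,97,225,11,32,597,127,38,129,76,546,338,83,300,66,53,98,472,
  367,385,245,136,368,182,126,304,29,32,409,83
)

# Points lying more than 1.5 * IQR beyond the box hinges
tp_outliers <- boxplot.stats(TotalPurchase)$out
print(tp_outliers)

# Mean versus median: the mean is pulled toward the long right tail
print(c(mean = mean(TotalPurchase), median = median(TotalPurchase)))
```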
Median formula:
For ungrouped data, the median is:
\[ \tilde{X} = \begin{cases} X_{\frac{n+1}{2}}, & \text{if $n$ is odd} \\ \dfrac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2}, & \text{if $n$ is even} \end{cases} \]
Where:
- \(\tilde{X}\) : median
- \(n\) : number of data values
- \(X_i\) : the i-th data value after sorting in ascending order
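The two cases of the formula translate directly into code. The function below is a hypothetical helper named `median_manual` (not part of base R): it sorts the data, applies the odd/even rule, and can be compared with the built-in `median()`.

```r
# Hypothetical helper implementing the odd/even median rule
median_manual <- function(x) {
  x <- sort(x)                       # the formula requires sorted data
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # odd n: the middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even n: average of the two middle values
  }
}

print(median_manual(c(7, 1, 3)))       # odd n
print(median_manual(c(7, 1, 3, 10)))   # even n
print(median(c(7, 1, 3, 10)))          # built-in median for comparison
```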
For grouped data, the median is computed as:
\[ Me = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) \times c \]
Where:
- \(Me\) : median
- \(L\) : lower class boundary of the median class
- \(N\) : total frequency
- \(F\) : cumulative frequency of the classes before the median class
- \(f_m\) : frequency of the median class
- \(c\) : class width
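The grouped-data formula can be sketched in R as follows. The class boundaries and frequencies are hypothetical, and the classes are assumed to be contiguous with equal width \(c\); the median class is taken as the first class whose cumulative frequency reaches \(N/2\).

```r
# Hypothetical frequency table (illustration only)
lower   <- c(0, 100, 200, 300)   # lower class boundaries
freq    <- c(8, 6, 4, 2)         # class frequencies
c_width <- 100                   # class width c

N   <- sum(freq)                 # total frequency
cum <- cumsum(freq)              # cumulative frequencies
m   <- which(cum >= N / 2)[1]    # median class: first class reaching N/2
F_before <- if (m > 1) cum[m - 1] else 0   # F: cumulative frequency before the median class

Me <- lower[m] + ((N / 2 - F_before) / freq[m]) * c_width
print(Me)
```

The names `c_width` and `F_before` are used instead of `c` and `F` so that they do not mask R's `c()` function and the `F` shorthand for `FALSE`.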
- Median of TotalPurchase
# Create the TotalPurchase vector
TotalPurchase <- c(
528,72,327,391,514,381,510,102,559,27,40,217,118,532,25,87,77,80,209,232,
23,444,127,90,165,77,52,91,390,127,81,514,101,68,471,621,327,107,132,1128,
247,382,117,81,97,66,158,27,80,104,554,33,532,374,77,33,82,144,37,270,61,
553,95,101,82,86,451,417,83,74,76,384,417,51,574,82,65,139,355,548,34,151,
68,39,98,37,654,80,89,41,59,105,601,44,471,540,186,256,27,92,71,173,91,400,
427,400,25,32,85,468,87,491,66,33,107,476,226,62,542,339,37,19,34,110,107,
529,156,75,458,78,517,30,78,448,373,52,609,250,282,66,116,30,525,105,78,43,
136,567,33,160,30,65,30,113,141,89,33,79,29,119,135,32,164,426,514,58,81,
576,424,60,480,97,225,11,32,597,127,38,129,76,546,338,83,300,66,53,98,472,
367,385,245,136,368,182,126,304,29,32,409,83
)
# Compute the median
median_TotalPurchase <- median(TotalPurchase)
# Print the median
median_TotalPurchase
## [1] 108.5
- Visualizing the Median of Total Purchase
# --- Load the library ---
library(ggplot2)
# --- Enter the data ---
data <- c(
528,72,327,391,514,381,510,102,559,27,40,217,118,532,25,87,77,80,209,232,23,444,127,90,165,
77,52,91,390,127,81,514,101,68,471,621,327,107,132,1128,247,382,117,81,97,66,158,27,80,104,
554,33,532,374,77,33,82,144,37,270,61,553,95,101,82,86,451,417,83,74,76,384,417,51,574,82,
65,139,355,548,34,151,68,39,98,37,654,80,89,41,59,105,601,44,471,540,186,256,27,92,71,173,
91,400,427,400,25,32,85,468,87,491,66,33,107,476,226,62,542,339,37,19,34,110,107,529,156,75,
458,78,517,30,78,448,373,52,609,250,282,66,116,30,525,105,78,43,136,567,33,160,30,65,30,113,
141,89,33,79,29,119,135,32,164,426,514,58,81,576,424,60,480,97,225,11,32,597,127,38,129,76,
546,338,83,300,66,53,98,472,367,385,245,136,368,182,126,304,29,32,409,83
)
# --- Pink boxplot (the thick line inside the box is the median) ---
boxplot(data,
main = "Boxplot of Total Purchase",
ylab = "Total Purchase",
col = "pink", # fill colour
border = "deeppink") # outline colour
# --- Add a reference grid ---
grid()
## Minimum : 11
## First quartile (Q1): 68
## Median : 108.5
## Third quartile (Q3): 381.25
## Maximum : 1128
- Interpretation
The boxplot shows that Total Purchase is widely spread and skewed to the right. The median is 108.5, much closer to Q1 (68) than to Q3 (381.25), so half of all purchases are at most 108.5 while the top quarter ranges from about 381 up to the maximum. Only one value, 1128, lies beyond the upper whisker and is flagged as an outlier; it is also the only purchase above 1000. This long right tail pulls the mean upward, which is why the median is the more representative measure of a typical purchase.
3.6 Number of Visits
The Number of Visits variable is discrete numeric data, because its values are whole-number counts.
Measure of central tendency: because the values are spread fairly evenly around the centre (from 1 to 11 visits) with no strongly extreme values, the mean is used.
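A quick way to support the choice of the mean is to compare it with the median and look for flagged outliers. The sketch below copies the 200 Number of Visits values listed in this section and assumes the same 1.5 x IQR rule that `boxplot()` uses.

```r
# The 200 Number of Visits values listed in this section
visits <- c(
  4,4,4,7,7,6,5,4,2,5,3,6,4,6,3,4,7,7,2,7,3,5,5,5,6,5,10,10,4,5,4,5,3,1,2,2,6,5,7,5,
  3,5,8,7,5,4,6,7,8,3,4,3,7,3,7,6,8,5,8,7,7,4,3,4,6,4,4,2,5,3,7,2,6,7,2,4,6,6,4,8,
  4,4,4,5,5,2,6,3,6,5,6,5,4,3,5,11,4,4,2,5,5,6,9,5,2,4,7,7,7,8,6,5,3,6,2,5,8,7,6,3,
  5,10,5,4,2,2,7,4,8,5,5,8,3,7,4,5,4,2,6,3,6,5,3,11,5,2,8,7,5,4,7,9,7,3,9,5,1,3,1,7,
  6,8,4,9,7,7,5,9,5,8,4,5,8,6,1,3,5,4,11,7,4,6,8,6,6,3,5,4,5,3,4,6,5,3,4,6,5,3,2,9
)

print(range(visits))                                      # smallest and largest count
print(c(mean = mean(visits), median = median(visits)))    # close values: little skew
print(boxplot.stats(visits)$out)                          # values flagged by 1.5 x IQR
```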
Mean Formula
\[ \bar{x} = \frac{\sum x_i}{n} \]
Where:
\(x_i\) = the i-th data value
\(n\) = total number of data values
- Computing the Mean
# Data
data <- c(
4,4,4,7,7,6,5,4,2,5,3,6,4,6,3,4,7,7,2,7,3,5,5,5,6,5,10,10,4,5,4,5,3,1,2,2,6,5,7,5,
3,5,8,7,5,4,6,7,8,3,4,3,7,3,7,6,8,5,8,7,7,4,3,4,6,4,4,2,5,3,7,2,6,7,2,4,6,6,4,8,
4,4,4,5,5,2,6,3,6,5,6,5,4,3,5,11,4,4,2,5,5,6,9,5,2,4,7,7,7,8,6,5,3,6,2,5,8,7,6,3,
5,10,5,4,2,2,7,4,8,5,5,8,3,7,4,5,4,2,6,3,6,5,3,11,5,2,8,7,5,4,7,9,7,3,9,5,1,3,1,7,
6,8,4,9,7,7,5,9,5,8,4,5,8,6,1,3,5,4,11,7,4,6,8,6,6,3,5,4,5,3,4,6,5,3,4,6,5,3,2,9
)
# Compute the mean
mean_data <- mean(data)
# Print the result
mean_data
## [1] 5.165
- Bar Chart of Number of Visits with Distribution Curve & Mean
# --- Load the library ---
library(ggplot2)
# --- Number of visits data ---
NumberOfVisits <- c(
4,4,4,7,7,6,5,4,2,5,3,6,4,6,3,4,7,7,2,7,3,5,5,5,6,5,10,10,4,5,4,5,3,1,2,2,6,5,7,5,
3,5,8,7,5,4,6,7,8,3,4,3,7,3,7,6,8,5,8,7,7,4,3,4,6,4,4,2,5,3,7,2,6,7,2,4,6,6,4,8,4,4,
4,5,5,2,6,3,6,5,6,5,4,3,5,11,4,4,2,5,5,6,9,5,2,4,7,7,7,8,6,5,3,6,2,5,8,7,6,3,5,10,5,
4,2,2,7,4,8,5,5,8,3,7,4,5,4,2,6,3,6,5,3,11,5,2,8,7,5,4,7,9,7,3,9,5,1,3,1,7,6,8,4,9,7,
7,5,9,5,8,4,5,8,6,1,3,5,4,11,7,4,6,8,6,6,3,5,4,5,3,4,6,5,3,4,6,5,3,2,9
)
# --- Build a data frame ---
df <- data.frame(NumberOfVisits)
# --- Compute the mean ---
mean_value <- mean(df$NumberOfVisits)
# --- Frequency of each value ---
freq_data <- as.data.frame(table(df$NumberOfVisits))
colnames(freq_data) <- c("Jumlah_Kunjungan", "Frekuensi")
# --- Convert the visit values from factor back to numbers ---
freq_data$Jumlah_Kunjungan <- as.numeric(as.character(freq_data$Jumlah_Kunjungan))
# --- Kernel density rescaled to counts: density * n * bin width (1 visit per bar) ---
kde <- density(df$NumberOfVisits)
kde_df <- data.frame(x = kde$x, y = kde$y * nrow(df) * 1)
# --- Bar chart + mean line + distribution curve ---
# (linewidth replaces the deprecated size argument for lines since ggplot2 3.4.0)
ggplot(freq_data, aes(x = Jumlah_Kunjungan, y = Frekuensi)) +
geom_bar(stat = "identity", fill = "pink", color = "black", alpha = 0.8) +
geom_text(aes(label = Frekuensi), vjust = -0.5, size = 5, color = "darkred") +
geom_vline(xintercept = mean_value, color = "blue", linetype = "dashed", linewidth = 1) +
# --- Distribution curve ---
geom_line(data = kde_df, aes(x = x, y = y), color = "darkblue", linewidth = 1.2) +
# --- Mean label ---
annotate("text",
x = mean_value + 0.3,
y = max(freq_data$Frekuensi) * 0.9,
label = paste("Mean =", round(mean_value, 2)),
color = "blue",
hjust = 0) +
labs(
title = "Number of Visits: Bar Chart with Distribution Curve & Mean",
x = "Number of Visits",
y = "Frequency"
) +
theme_minimal(base_size = 14)
- Interpretation
The chart shows the distribution of the number of visits (NumberOfVisits). The blue dashed line marks the mean of all the data, 5.165, i.e. about 5 visits. In other words, a customer in this dataset typically visited the store about 5 times.
3.7 Feedback Score
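Feedback Score is recorded as whole numbers on a fixed 1–5 scale. Strictly speaking it is ordinal data, since the scores are ordered but the distance between adjacent scores is not guaranteed to be equal; here it is treated as discrete numeric data.
The mean summarizes all review scores in a single value that is easy to understand, so the general level of customer satisfaction can be assessed without inspecting every response. Because the scores lie on a fixed 1–5 scale, the mean can be read directly against that scale, for example relative to the midpoint of 3. Using the mean assumes that the scores can be treated as equally spaced.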
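Because the scores form an ordered 1–5 scale, it helps to read the mean alongside the median, the mode, and the full frequency table. The sketch below copies the 200 Feedback Score values listed in this section; the mode is read from the most frequent entry of `table()`.

```r
# The 200 Feedback Score values listed in this section
feedback <- c(
  1,5,2,1,5,3,1,2,2,2,5,5,4,3,3,1,3,3,1,4,
  4,3,3,3,1,5,4,3,5,1,5,1,1,3,5,5,3,5,4,4,
  4,3,4,5,4,2,4,3,1,3,5,5,2,3,3,1,2,4,3,4,
  2,4,4,2,1,1,5,1,5,4,1,1,3,5,5,1,1,3,1,1,
  3,3,2,1,5,4,5,2,5,1,3,1,1,5,2,1,2,3,1,5,
  2,2,1,3,1,4,1,4,2,3,3,5,1,3,1,4,3,1,3,1,
  3,4,1,1,2,5,2,5,3,2,4,5,1,5,1,1,2,2,1,3,
  1,2,3,1,1,2,4,5,1,5,2,1,4,4,3,2,4,3,3,2,
  1,4,5,5,5,5,2,2,5,3,1,2,2,5,5,1,1,2,2,2,
  3,3,5,1,3,2,5,1,3,2,4,1,1,3,2,1,2,3,2,4
)

freq <- table(feedback)    # number of responses per score
print(freq)
print(c(mean   = mean(feedback),
        median = median(feedback),
        mode   = as.numeric(names(which.max(freq)))))
```

If the three measures disagree noticeably, that is a hint that the mean alone hides part of the picture for this ordinal scale.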
Mean Formula:
\[ \bar{x} = \frac{\sum x_i}{n} \]
Where:
\(x_i\) = the i-th data value
\(n\) = total number of data values
- Computing the Mean:
# Customer review scores (1–5 scale)
f <- c(
1,5,2,1,5,3,1,2,2,2,5,5,4,3,3,1,3,3,1,4,4,3,3,3,1,5,4,3,5,1,5,1,1,3,5,5,3,5,4,4,
4,3,4,5,4,2,4,3,1,3,5,5,2,3,3,1,2,4,3,4,2,4,4,2,1,1,5,1,5,4,1,1,3,5,5,1,1,3,1,1,
3,3,2,1,5,4,5,2,5,1,3,1,1,5,2,1,2,3,1,5,2,2,1,3,1,4,1,4,2,3,3,5,1,3,1,4,3,1,3,1,
3,4,1,1,2,5,2,5,3,2,4,5,1,5,1,1,2,2,1,3,1,2,3,1,1,2,4,5,1,5,2,1,4,4,3,2,4,3,3,2,
1,4,5,5,5,5,2,2,5,3,1,2,2,5,5,1,1,2,2,2,3,3,5,1,3,2,5,1,3,2,4,1,1,3,2,1,2,3,2,4
)
# Compute the mean
mean_f <- mean(f)
# Print the result
mean_f
## [1] 2.8
- Bar Chart of Feedback Score with Mean Line
# Load the library
library(ggplot2)
# Enter the feedback score data
feedback_score <- c(
1,5,2,1,5,3,1,2,2,2,5,5,4,3,3,1,3,3,1,4,
4,3,3,3,1,5,4,3,5,1,5,1,1,3,5,5,3,5,4,4,
4,3,4,5,4,2,4,3,1,3,5,5,2,3,3,1,2,4,3,4,
2,4,4,2,1,1,5,1,5,4,1,1,3,5,5,1,1,3,1,1,
3,3,2,1,5,4,5,2,5,1,3,1,1,5,2,1,2,3,1,5,
2,2,1,3,1,4,1,4,2,3,3,5,1,3,1,4,3,1,3,1,
3,4,1,1,2,5,2,5,3,2,4,5,1,5,1,1,2,2,1,3,
1,2,3,1,1,2,4,5,1,5,2,1,4,4,3,2,4,3,3,2,
1,4,5,5,5,5,2,2,5,3,1,2,2,5,5,1,1,2,2,2,
3,3,5,1,3,2,5,1,3,2,4,1,1,3,2,1,2,3,2,4
)
# Convert to a data frame
df <- data.frame(feedback_score)
# Compute the mean
mean_value <- mean(df$feedback_score)
# Frequency of each score
freq_data <- as.data.frame(table(df$feedback_score))
colnames(freq_data) <- c("Feedback", "Frekuensi")
# Bar chart with frequency labels and a mean line
# (Feedback is a factor, so scores 1-5 sit at x positions 1-5 and the numeric mean lines up with them)
ggplot(freq_data, aes(x = Feedback, y = Frekuensi)) +
geom_bar(stat = "identity", fill = "pink", color = "black", alpha = 0.8) +
geom_text(aes(label = Frekuensi), vjust = -0.5, size = 5, color = "darkred") +
geom_vline(xintercept = mean_value, color = "blue", linetype = "dashed", linewidth = 1) +
annotate("text",
x = mean_value + 0.1,
y = max(freq_data$Frekuensi) * 0.9,
label = paste("Mean =", round(mean_value, 2)),
color = "blue",
hjust = 0) +
labs(title = "Customer Review Scores with Mean",
x = "Review Score (1–5)",
y = "Frequency") +
theme_minimal(base_size = 14)
- Interpretation
Based on the bar chart above, the mean feedback score is 2.8, slightly below the midpoint (3) of the 1–5 scale, so overall satisfaction is moderate to low. The most frequent score is 1 (53 of 200 responses). Scores of 2 and 3 together account for 81 responses, and high scores of 4 and 5 account for 66 responses, about one third of all customers.