Cluster Analysis

Background and Objectives

This project analyzes game players' behavior logs (Game_Log) and uses K-Means to segment the players into groups.

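Objectives:

Data consolidation: transform the raw logs into a per-player feature table (User Profile).

Cluster analysis: group players by spending amount and play time.

Business insight: identify customer segments of differing value (such as VIPs, core players, and non-paying players) to support differentiated marketing strategies.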

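Data Preparation and Feature Engineering

We first read the raw data and then engineer features: the data is aggregated from the log level to the user level, and key metrics are computed for each player.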

library(tidyverse) # provides read_csv(), the dplyr verbs, purrr::map(), and ggplot2
game_log <- read_csv("Game_Log.csv")

## Rows: 4600 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): User_Id
## dbl (6): Min_Aft, Min_Eve, Min_Mid, Buy_Coin, Buy_Dia, Buy_Car
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

user_table <- read_csv("User_Table_Cluster.csv")

## Rows: 920 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): User_Id, Identity, Telecom
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

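# 1. Aggregate the logs per user and derive features
user_behavior <- game_log %>%
  group_by(User_Id) %>%
  summarise(
    user_fre = n(), # number of log records per user
    money_coin = sum(Buy_Coin), # total of each Buy_* column
    money_dia = sum(Buy_Dia),
    money_car = sum(Buy_Car),
    total_money = money_coin + money_dia + money_car,
    Total_Time = sum(Min_Aft + Min_Eve + Min_Mid) # total minutes across the three Min_* columns
  )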

# 2. Merge in the user profile data
final_data <- user_behavior %>%
  left_join(user_table, by = "User_Id")

# Preview the first few rows
head(final_data) %>% knitr::kable()

User_Id	user_fre	money_coin	money_dia	money_car	total_money	Total_Time	Identity	Telecom
021ViplNqr	5	884	140	0	1024	524	Normal	other
03DaAFruM2	5	1324	100	40	1464	503	Novice	other
0AqrYZIKVB	5	1144	100	20	1264	493	Novice	ABC
0Cw2x0rNm9	5	34	0	65	99	418	Veteran	ABC
0JhnZ3cyxP	5	913	100	16	1029	511	Normal	other
0QtymWgMEn	5	0	0	0	0	401	Novice	ABC

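Standardization

total_money (spending amount) and Total_Time (minutes played) differ greatly in unit and numeric range. To keep the larger-valued variable from dominating the distance calculation, both are standardized with z-scores. During exploratory data analysis (EDA) we found that user_fre (login frequency) has zero variance in this dataset, since every player has the same number of logins, so it is excluded from the clustering model. Only spending and time are used as clustering features.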
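The zero-variance check described here can be sketched in base R as follows. The small data frame reuses four rows from the preview table above; the names `toy`, `feature_var`, and `usable` are illustrative and not part of the original analysis.

```r
# Four rows copied from the preview of final_data shown earlier.
toy <- data.frame(
  user_fre    = c(5, 5, 5, 5),
  total_money = c(1024, 1464, 99, 0),
  Total_Time  = c(524, 503, 418, 401)
)

# Variance of each candidate feature; 0 means the column is constant.
feature_var <- sapply(toy, var)

# Keep only the columns that actually vary across players.
usable <- names(feature_var)[feature_var > 0]

print(feature_var)
print(usable)
```

A constant column carries no information for distance-based clustering, and passing it to `scale()` would divide by a standard deviation of zero and yield `NaN`, so dropping it before standardization avoids both problems.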

# Select the key variables and standardize them (z-score)
scaled_data <- final_data %>%
  select(total_money, Total_Time) %>% # exclude user_fre
  scale()

head(scaled_data)

##       total_money Total_Time
## [1,] -0.244719854 -0.1362642
## [2,]  0.193422168 -0.3662481
## [3,] -0.005733296 -0.4757642
## [4,] -1.165813877 -1.2971351
## [5,] -0.239740967 -0.2786352
## [6,] -1.264395832 -1.4833125
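To make the z-score concrete, the sketch below standardizes four total_money values by hand and compares the result with `scale()`. The values come from the preview table. Because the mean and standard deviation here are computed from these four values only, the numbers will not match the output above, which uses all 920 players.

```r
# Four total_money values copied from the preview table.
x <- c(1024, 1464, 1264, 99)

# Z-score by hand: subtract the mean, divide by the sample standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() drops its attributes.
z_scale <- as.numeric(scale(x))

# TRUE when both approaches agree up to floating-point tolerance.
print(all.equal(z_manual, z_scale))
```

After standardization each column has mean 0 and standard deviation 1, so spending and play time contribute on a comparable scale to the Euclidean distances that K-Means uses.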

Finding the Optimal Number of Clusters (Elbow Method)

We use the elbow method to choose K. We fit models for K = 1 through K = 10 and examine how the total within-cluster sum of squares falls as K increases.

set.seed(123)

# Compute the total within-cluster sum of squares for K = 1 to 10
k_results <- tibble(k = 1:10) %>%
  mutate(
    model = map(k, ~ kmeans(scaled_data, centers = .x, nstart = 25)),
    tot_withinss = map_dbl(model, ~ .x$tot.withinss)
  )

# Draw the elbow plot
ggplot(k_results, aes(x = k, y = tot_withinss)) +
  geom_line() +
  geom_point(size = 2) +
  scale_x_continuous(breaks = 1:10) +
  labs(
    title = "Elbow Method",
    x = "Number of clusters (k)",
    y = "Total within-cluster sum of squares"
  ) +
  theme_minimal()

The curve shows a clear bend (the elbow) at K = 3, after which the error declines much more slowly. We therefore segment the players into 3 clusters.
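Reading an elbow by eye can be backed up with a simple number: the fraction of the remaining error removed by each additional cluster. The sketch below illustrates the idea on synthetic data with three well-separated blobs; the data, and the names `toy`, `wss`, and `rel_drop`, are assumptions made for illustration and are not the game data.

```r
set.seed(123)

# Synthetic 2-D data: three blobs of 50 points each (illustration only).
centers_true <- rbind(c(-3, -3), c(0, 3), c(3, -3))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(
    rnorm(50, mean = centers_true[i, 1], sd = 0.5),
    rnorm(50, mean = centers_true[i, 2], sd = 0.5)
  )
}))

# Total within-cluster sum of squares for k = 1..6.
ks <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# Fraction of the previous error removed when moving from k - 1 to k.
rel_drop <- -diff(wss) / head(wss, -1)

print(data.frame(k = ks[-1], wss = wss[-1], rel_drop = round(rel_drop, 3)))
```

On data like this, the relative drop is large up to the true number of clusters and small afterwards. Applying the same calculation to `k_results$tot_withinss` would give a numeric counterpart to the visual elbow.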

Building the K-Means Model

We now run the K-Means algorithm and attach the resulting cluster labels to the original data table.

set.seed(123)

# Fit the model (K = 3)
final_kmeans <- kmeans(scaled_data, centers = 3, nstart = 25)

# Convert the cluster assignments (1, 2, 3) to a factor and add them to the table
final_data <- final_data %>%
  mutate(cluster = as.factor(final_kmeans$cluster))

# Check how many players fall into each cluster
table(final_data$cluster)

## 
##   1   2   3 
## 162 202 556
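Because the model is fitted on standardized data, the centroids stored in the fitted object are in z-score units. The following sketch shows one way to convert them back to the original units, using the mean and standard deviation that `scale()` keeps as attributes. The data is synthetic, and `toy`, `scaled`, `fit`, and `centers_original` are hypothetical names introduced for the example.

```r
set.seed(123)

# Synthetic stand-in for the two features (values invented for illustration).
toy <- data.frame(
  total_money = c(rnorm(30, 3000, 300), rnorm(30, 50, 20), rnorm(30, 1200, 200)),
  Total_Time  = c(rnorm(30, 680, 30), rnorm(30, 430, 30), rnorm(30, 530, 30))
)

scaled <- scale(toy)
fit <- kmeans(scaled, centers = 3, nstart = 25)

# scale() stores the column means and standard deviations as attributes.
mu    <- attr(scaled, "scaled:center")
sigma <- attr(scaled, "scaled:scale")

# Undo the z-score column by column: original = z * sd + mean.
centers_original <- sweep(sweep(fit$centers, 2, sigma, `*`), 2, mu, `+`)
print(centers_original)
```

Since standardization is a linear transformation, the back-transformed centroids equal the per-cluster means of the original columns.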

Business Insights and Personas (Profiling)

To understand what characterizes each cluster, we compute its average spending and average play time.

cluster_profile <- final_data %>%
  group_by(cluster) %>%
  summarise(
    Count = n(),
    Avg_Money = mean(total_money),
    Avg_Time = mean(Total_Time)
  )

cluster_profile %>% knitr::kable()

cluster	Count	Avg_Money	Avg_Time
1	162	3114.31481	678.4630
2	202	51.52475	429.3614
3	556	1174.91007	533.9658

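Interpreting the clusters:

Cluster 1 (loyal VIPs)
Characteristics: the highest average spending (about 3,114) and the longest average play time (about 678 minutes).
Strategy: a small group (162 players) with a very large contribution. Consider premium service, dedicated customer support, or limited-edition items to maintain engagement.

Cluster 2 (non-paying players)
Characteristics: very low average spending (about 52), but still a meaningful amount of play time (about 429 minutes on average).
Strategy: these players form the base of the game's ecosystem. Consider monetizing them through rewarded ads, or value them for keeping the servers active.

Cluster 3 (core players, the "middle class")
Characteristics: the largest group (556 players), with average spending (about 1,175) and play time (about 534 minutes) both in the middle.
Strategy: a stable source of revenue. Consider promoting good-value monthly passes or first-purchase offers to encourage conversion into a higher-spending segment.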
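K-Means cluster numbers are arbitrary and can change with the random seed, so persona names are safer to derive from the profile itself than from the cluster ID. The sketch below reuses the profile figures from the table above and names each cluster by its rank in average spending. The rank-based mapping, the English persona labels, and the revenue-share estimate (Count times Avg_Money) are assumptions of this example, not steps from the original analysis.

```r
# Cluster profile copied from the table above.
cluster_profile <- data.frame(
  cluster   = factor(1:3),
  Count     = c(162, 202, 556),
  Avg_Money = c(3114.31481, 51.52475, 1174.91007),
  Avg_Time  = c(678.4630, 429.3614, 533.9658)
)

# Persona names ordered from lowest to highest average spending.
persona_by_rank <- c("Free-to-play", "Core player", "Loyal VIP")
cluster_profile$persona <- persona_by_rank[rank(cluster_profile$Avg_Money)]

# Share of players and estimated share of total spending per segment.
cluster_profile$player_share <- cluster_profile$Count / sum(cluster_profile$Count)
revenue <- cluster_profile$Count * cluster_profile$Avg_Money
cluster_profile$revenue_share <- revenue / sum(revenue)

print(cluster_profile[, c("cluster", "persona", "player_share", "revenue_share")])
```

Comparing `player_share` with `revenue_share` puts a number on how much each segment contributes relative to its size.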

Visualization

ggplot(final_data, aes(x = Total_Time, y = total_money, color = cluster)) +
  geom_point(alpha = 0.6, size = 2) +
  labs(
    title = "Player Segments: Time vs. Money",
    subtitle = "K-Means Clustering (K=3)",
    x = "Total play time (Total_Time)",
    y = "Total spending (total_money)",
    color = "Cluster"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1")

Conclusion

This cluster analysis turns noisy player behavior data into actionable business intelligence.

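It confirms a pronounced "M-shaped" (polarized) spending pattern in the game: average spending ranges from about 52 in Cluster 2 to about 3,114 in Cluster 1.

A data-driven approach delineates the boundary between VIPs and high-potential players.

The model can serve as the basis for automated tagging in a future CRM system.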
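As a rough idea of what automated tagging could look like, the sketch below assigns new players to the nearest centroid of a fitted model, reusing the training mean and standard deviation for standardization. Everything here is illustrative: the training data is synthetic, and `assign_cluster()` is a hypothetical helper, not a function from the original analysis or from any package.

```r
set.seed(123)

# Synthetic training data (invented values standing in for final_data).
train <- data.frame(
  total_money = c(rnorm(30, 3000, 300), pmax(rnorm(30, 50, 20), 0), rnorm(30, 1200, 200)),
  Total_Time  = c(rnorm(30, 680, 30), rnorm(30, 430, 30), rnorm(30, 530, 30))
)
scaled <- scale(train)
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Hypothetical helper: label new players with the nearest training centroid.
assign_cluster <- function(new_data, fit, scaled_train) {
  # Reuse the training mean/sd so new players share the same z-score scale.
  z <- scale(
    as.matrix(new_data[, colnames(scaled_train)]),
    center = attr(scaled_train, "scaled:center"),
    scale  = attr(scaled_train, "scaled:scale")
  )
  # Squared Euclidean distance from every new row to every centroid.
  d2 <- sapply(seq_len(nrow(fit$centers)), function(j) {
    rowSums(sweep(z, 2, fit$centers[j, ])^2)
  })
  # sapply() returns a plain vector for a single row; restore the matrix shape.
  d2 <- matrix(d2, nrow = nrow(z))
  # Index of the smallest distance in each row.
  max.col(-d2, ties.method = "first")
}

new_players <- data.frame(total_money = c(0, 2900), Total_Time = c(410, 700))
print(assign_cluster(new_players, fit, scaled))
```

Standardizing new players with the training statistics, rather than recomputing them, keeps the tags comparable over time. The centroids would still need periodic refitting as player behavior shifts.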

Phil Kao

2026-01-04