4.1 Tourist Group Clustering Results
Determining the Optimal Number of Clusters
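The Elbow Method is used to determine the optimal number of clusters, \(K\): the total within-cluster sum of squares (WSS) is plotted against \(K\), and the point where the curve flattens is chosen.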
# 1. Finding the optimal K value: Elbow Method
fviz_nbclust(user_item_matrix, kmeans, method = "wss")

In the Elbow Method plot, the decline in WSS visibly slows at \(K=3\), so \(K=3\) is selected as the number of clusters.
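The WSS curve behind the elbow plot can also be computed directly with base R. The following is a minimal sketch on synthetic data, not the study's `user_item_matrix`; the names `toy_matrix`, `k_values`, and `wss` are hypothetical and chosen only for illustration.

```r
# Synthetic data: three well-separated 2-D blobs (illustrative, not the study's data)
set.seed(42)
toy_matrix <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2)
)

# Total within-cluster sum of squares for K = 1..8
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy_matrix, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is where adding another cluster stops reducing WSS by much
plot(k_values, wss, type = "b",
     xlab = "Number of clusters K", ylab = "Total WSS")
print(round(wss, 1))
```

On data like this, the curve drops steeply up to the true number of groups and flattens afterwards. On real data the elbow is often less sharp, so the choice of \(K\) remains partly a judgment call.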
Cluster Group Profile Analysis
K-Means clustering is performed, and the characteristics of the 3
resulting clusters are analyzed.
# 2. Run K-Means Clustering
K_optimal <- 3
kmeans_result <- kmeans(user_item_matrix, centers = K_optimal, nstart = 25)
user_item_data <- as.data.frame(user_item_matrix)
user_item_data$Cluster <- kmeans_result$cluster
# 3. Analyze the characteristics of each cluster: Calculate the average item visit frequency
cluster_profiles <- user_item_data %>%
  group_by(Cluster) %>%
  summarise(across(all_of(items), mean)) %>%
  tidyr::pivot_longer(cols = all_of(items), names_to = "Item", values_to = "Avg_Visit_Rate") %>%
  group_by(Cluster) %>%
  arrange(desc(Avg_Visit_Rate))
# 4. Extract the top 5 most frequently visited items for each cluster
top_items_per_cluster <- cluster_profiles %>%
  slice_head(n = 5) %>%
  group_by(Cluster) %>%
  mutate(Rank = row_number()) %>%
  tidyr::pivot_wider(names_from = Cluster, values_from = Item, names_prefix = "Cluster_")
cat("Top frequently visited items per cluster:\n")
## Top frequently visited items per cluster:
print(top_items_per_cluster)
## # A tibble: 14 × 5
##    Avg_Visit_Rate  Rank Cluster_1            Cluster_2            Cluster_3
##             <dbl> <int> <chr>                <chr>                <chr>
##  1          1         1 Historic Building    Museum               <NA>
##  2          0.427     2 Specialty Restaurant <NA>                 <NA>
##  3          0.392     3 Theme Park           <NA>                 <NA>
##  4          0.363     4 Shopping Mall        <NA>                 <NA>
##  5          0.357     5 Museum               <NA>                 <NA>
##  6          0.367     2 <NA>                 Specialty Restaurant <NA>
##  7          0.352     3 <NA>                 Music Festival       <NA>
##  8          0.336     4 <NA>                 Bar/Nightclub        <NA>
##  9          0.336     5 <NA>                 Coffee Shop          <NA>
## 10          0.413     1 <NA>                 <NA>                 Hot Spring
## 11          0.388     2 <NA>                 <NA>                 Water Activit…
## 12          0.353     3 <NA>                 <NA>                 Science Museum
## 13          0.348     4 <NA>                 <NA>                 Church/Temple
## 14          0.343     5 <NA>                 <NA>                 Bar/Nightclub
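The table has 14 rows rather than 5 because `pivot_wider()` treats every column not named in `names_from` or `values_from` as a row identifier, so each distinct combination of `Avg_Visit_Rate` and `Rank` gets its own row. The following is a minimal sketch of one way to obtain a compact table with one row per rank, assuming `dplyr` and `tidyr` are installed. The data frame `toy_profiles` and its values are made up for illustration and are not the study's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format profile table (illustrative values only)
toy_profiles <- tibble(
  Cluster        = rep(1:2, each = 3),
  Item           = c("A", "B", "C", "B", "C", "D"),
  Avg_Visit_Rate = c(0.9, 0.5, 0.4, 0.8, 0.6, 0.3)
)

top_wide <- toy_profiles %>%
  arrange(desc(Avg_Visit_Rate)) %>%
  group_by(Cluster) %>%
  slice_head(n = 2) %>%               # top 2 items per cluster
  mutate(Rank = row_number()) %>%
  ungroup() %>%
  select(Rank, Cluster, Item) %>%     # drop Avg_Visit_Rate so Rank is the only id column
  pivot_wider(names_from = Cluster, values_from = Item, names_prefix = "Cluster_")

print(top_wide)
```

Dropping the rate column loses the numeric values, so a long-format table (Cluster, Rank, Item, Avg_Visit_Rate) may be preferable when the rates themselves need to be reported.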
Profile Interpretation: Based on the most frequently visited items in each cluster, we provisionally categorize the tourists into three groups. Because K-Means cluster numbers are arbitrary and can change between runs, each label should be checked against the top five items printed above, which are listed in parentheses:
- Cluster 1: Urban Cultural Explorers (top items: Historic Building, Specialty Restaurant, Theme Park, Shopping Mall, Museum)
- Cluster 2: Nature Outdoor Enthusiasts (top items: Museum, Specialty Restaurant, Music Festival, Bar/Nightclub, Coffee Shop)
- Cluster 3: Food and Shopping Tourists (top items: Hot Spring, Water Activities, Science Museum, Church/Temple, Bar/Nightclub)
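Cluster numbers returned by `kmeans()` depend on its random initialization, so the mapping between cluster IDs and profiles can differ from one run to the next. The following is a minimal sketch, on synthetic data, of fixing the random seed so that assignments are repeatable. The seed value 123 and the names `toy_matrix`, `fit_a`, and `fit_b` are arbitrary illustrative choices, not part of the original analysis.

```r
# Synthetic data for illustration only
set.seed(1)
toy_matrix <- matrix(runif(200), ncol = 4)

# Same seed before each call -> same random starts -> same cluster labels
set.seed(123)
fit_a <- kmeans(toy_matrix, centers = 3, nstart = 25)
set.seed(123)
fit_b <- kmeans(toy_matrix, centers = 3, nstart = 25)

print(identical(fit_a$cluster, fit_b$cluster))
```

Calling `set.seed()` immediately before the clustering step keeps cluster IDs, profile tables, and any written interpretation aligned across re-runs.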
4.2 Segmented Association Rules based on Clusters
We select the most representative group, Cluster 1 (Urban
Cultural Explorers), for association rule mining to discover
precise itinerary combinations.
# 1. Extract transactions for Cluster 1 users
cluster_1_users <- user_item_data %>%
  filter(Cluster == 1) %>%
  rownames()
transactions_c1 <- transactions[cluster_1_users]
# 2. Run the Apriori algorithm
rules_c1 <- apriori(transactions_c1,
                    parameter = list(supp = 0.08, conf = 0.6, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.08 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 13
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[20 item(s), 171 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# 3. Filter and sort high Lift rules
rules_c1_sorted <- rules_c1 %>%
  sort(by = "lift", decreasing = TRUE) %>%
  head(10)
cat("\nCluster 1 (Urban Cultural Explorers) Top 10 Association Rules:\n")
##
## Cluster 1 (Urban Cultural Explorers) Top 10 Association Rules:
inspect(rules_c1_sorted)
## lhs rhs support confidence coverage lift count
## [1] {Car Rental Service,
## Coffee Shop} => {Specialty Restaurant} 0.08187135 0.7368421 0.1111111 2.290909 14
## [2] {Car Rental Service,
## Specialty Restaurant} => {Coffee Shop} 0.08187135 0.6086957 0.1345029 2.001672 14
# 4. Visualize the association rules
plot(rules_c1_sorted, method = "graph", engine = "igraph",
     main = "Association Rule Network for Cluster 1 (Urban Cultural Explorers)")

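Segmented Rule Insights:
- Both rules mined for Cluster 1 combine Car Rental Service, Coffee Shop, and Specialty Restaurant. The stronger one, \(\{\text{Car Rental Service}, \text{Coffee Shop}\} \rightarrow \{\text{Specialty Restaurant}\}\), has a lift of 2.29 and a confidence of 0.74 (14 of 171 transactions).
- In general, a high lift indicates strongly positively associated behaviors. For example, if a rule such as \(\{\text{Museum}\} \rightarrow \{\text{Historic Building}\}\) had a high lift (e.g., 1.8), visiting a museum and visiting a historic building would be strongly positively associated for Urban Cultural Explorers.
- Such insights are more targeted than rules derived from the entire population and can directly guide a travel agency's design of a product such as a “City Culture Pass”.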
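The lift reported for rule [1] can be checked by hand from the counts in the output above. In this minimal sketch, the transaction total (171) and the rule count (14) are taken from the output. The left-hand-side count (19) and the right-hand-side count (55) are not printed and are back-calculated here from the reported coverage and lift, so treat them as assumptions.

```r
n_transactions <- 171   # from the apriori log
count_rule     <- 14    # transactions containing both lhs and rhs (inspect() output)
count_lhs      <- 19    # assumed: coverage 0.1111111 * 171
count_rhs      <- 55    # assumed: back-calculated from the reported lift

support    <- count_rule / n_transactions               # share of all transactions
confidence <- count_rule / count_lhs                    # P(rhs | lhs)
lift       <- confidence / (count_rhs / n_transactions) # P(rhs | lhs) / P(rhs)

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

A lift above 1 means the right-hand side occurs alongside the left-hand side more often than its overall frequency would predict. With a rule count of only 14, such estimates are sensitive to small changes in the data.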