1 Introduction

East Asian countries are among the world's best tourist destinations; visitors can travel to Taiwan, Japan, or South Korea. Many applications already offer recommendations for tourist attractions, one of which is tripadvisor.com. The site also collects feedback and reviews from travelers who have visited East Asia, covering attractions such as bars, parks, museums, beaches, gardens, and more. Visitors can also give a rating: Excellent (4), Very Good (3), Average (2), Poor (1), or Terrible (0).

The data below are reviews from tripadvisor.com users who have visited East Asian countries. The data can be processed with K-means clustering to find out how many groups the reviewers fall into. We can also extract further information by combining PCA with the clustering.

2 Loading libraries

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(GGally)

## Loading required package: ggplot2

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

## 
## Attaching package: 'GGally'

## The following object is masked from 'package:dplyr':
## 
##     nasa

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(factoextra)

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

library(FactoMineR)
library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(funModeling)

## Loading required package: Hmisc

## Loading required package: lattice

## Loading required package: survival

## Loading required package: Formula

## 
## Attaching package: 'Hmisc'

## The following object is masked from 'package:plotly':
## 
##     subplot

## The following objects are masked from 'package:dplyr':
## 
##     src, summarize

## The following objects are masked from 'package:base':
## 
##     format.pval, units

## funModeling v.1.9.3 :)
## Examples and tutorials at livebook.datascienceheroes.com
##  / Now in Spanish: librovivodecienciadedatos.ai

## 
## Attaching package: 'funModeling'

## The following object is masked from 'package:GGally':
## 
##     range01

library(cluster)

3 Loading the dataset

trip <- read.csv("tripadvisor_review.csv")
glimpse(trip)

## Observations: 980
## Variables: 11
## $ User.ID     <fct> User 1, User 2, User 3, User 4, User 5, User 6, User 7,...
## $ Category.1  <dbl> 0.93, 1.02, 1.22, 0.45, 0.51, 0.99, 0.90, 0.74, 1.12, 0...
## $ Category.2  <dbl> 1.80, 2.20, 0.80, 1.80, 1.20, 1.28, 1.36, 1.40, 1.76, 1...
## $ Category.3  <dbl> 2.29, 2.66, 0.54, 0.29, 1.18, 0.72, 0.26, 0.22, 1.04, 0...
## $ Category.4  <dbl> 0.62, 0.64, 0.53, 0.57, 0.57, 0.27, 0.32, 0.41, 0.64, 0...
## $ Category.5  <dbl> 0.80, 1.42, 0.24, 0.46, 1.54, 0.74, 0.86, 0.82, 0.82, 1...
## $ Category.6  <dbl> 2.42, 3.18, 1.54, 1.52, 2.02, 1.26, 1.58, 1.50, 2.14, 1...
## $ Category.7  <dbl> 3.19, 3.21, 3.18, 3.18, 3.18, 3.17, 3.17, 3.17, 3.18, 3...
## $ Category.8  <dbl> 2.79, 2.63, 2.80, 2.96, 2.78, 2.89, 2.66, 2.81, 2.79, 2...
## $ Category.9  <dbl> 1.82, 1.86, 1.31, 1.57, 1.18, 1.66, 1.22, 1.54, 1.41, 2...
## $ Category.10 <dbl> 2.42, 2.32, 2.50, 2.86, 2.54, 3.66, 3.22, 2.88, 2.54, 3...

Each variable is described below:

1. User.ID: Unique user ID
2. Category.1: Average user feedback on art galleries
3. Category.2: Average user feedback on dance clubs
4. Category.3: Average user feedback on juice bars
5. Category.4: Average user feedback on restaurants
6. Category.5: Average user feedback on museums
7. Category.6: Average user feedback on resorts
8. Category.7: Average user feedback on parks/picnic spots
9. Category.8: Average user feedback on beaches
10. Category.9: Average user feedback on theaters
11. Category.10: Average user feedback on religious institutions

4 Exploratory Data Analysis

Inspecting the data structure

head(trip,1)

##   User.ID Category.1 Category.2 Category.3 Category.4 Category.5 Category.6
## 1  User 1       0.93        1.8       2.29       0.62        0.8       2.42
##   Category.7 Category.8 Category.9 Category.10
## 1       3.19       2.79       1.82        2.42

Checking for missing values

colSums(is.na(trip))

##     User.ID  Category.1  Category.2  Category.3  Category.4  Category.5 
##           0           0           0           0           0           0 
##  Category.6  Category.7  Category.8  Category.9 Category.10 
##           0           0           0           0           0

Conclusion: no missing values were found.

Renaming the category variables to descriptive names and dropping the user variable

trip <- trip %>% 
      select(- User.ID) %>% 
      rename(Art = Category.1,
             Club = Category.2,
             Bar = Category.3,
             Restaurant = Category.4,
             Museum = Category.5,
             Resort = Category.6,
             Park = Category.7,
             Beach = Category.8,
             Theatre = Category.9,
             Institutions = Category.10)
head(trip,2)

##    Art Club  Bar Restaurant Museum Resort Park Beach Theatre Institutions
## 1 0.93  1.8 2.29       0.62   0.80   2.42 3.19  2.79    1.82         2.42
## 2 1.02  2.2 2.66       0.64   1.42   3.18 3.21  2.63    1.86         2.32

5 Machine Learning

5.1 Visualizing the trip reviews with PCA and FactoMineR

trip_pca <- prcomp(trip, scale. = TRUE, center = TRUE)
trip_pca

## Standard deviations (1, .., p=10):
##  [1] 1.7255447 1.1237411 1.1104500 1.0317317 1.0098612 0.8988780 0.8393180
##  [8] 0.6782247 0.5734843 0.3755038
## 
## Rotation (n x k) = (10 x 10):
##                      PC1         PC2          PC3         PC4         PC5
## Art          -0.01533238  0.31907527 -0.520473005  0.41411084 -0.31367748
## Club          0.12965685 -0.29722301  0.603205768  0.17896740 -0.03662148
## Bar           0.42881276  0.28305662 -0.046071730 -0.34153182  0.12952900
## Restaurant    0.22864226  0.01897704  0.078773787  0.74890429  0.02242080
## Museum        0.32181751 -0.34565736 -0.007916925 -0.21441024 -0.56644367
## Resort        0.41744888 -0.24905167 -0.161710035  0.02295544 -0.40737387
## Park          0.49573266  0.14264171 -0.089886330 -0.10745970  0.34490103
## Beach        -0.10481328 -0.41966084 -0.537430519 -0.11753217  0.16641040
## Theatre       0.03977775 -0.58584628 -0.179104270  0.18603647  0.42399390
## Institutions -0.45896330 -0.09046983  0.041478568 -0.11185528 -0.26410789
##                      PC6          PC7         PC8         PC9         PC10
## Art           0.50330680 -0.198420995 -0.03987369 -0.25060928 -0.009942147
## Club          0.34345200 -0.598133204 -0.05571957 -0.12293371 -0.060662580
## Bar           0.19608956 -0.114115540  0.52180303  0.12477140 -0.511013160
## Restaurant   -0.43167726  0.036949059  0.42824895  0.09221006 -0.013660185
## Museum       -0.11798417  0.216103985  0.23045888 -0.53691710  0.089717049
## Resort        0.05087168 -0.001439142 -0.29880387  0.69220114 -0.007935260
## Park          0.07970356 -0.133089959  0.04034761 -0.04217483  0.753555820
## Beach        -0.34631462 -0.588493780  0.08075941 -0.05336670 -0.083260622
## Theatre       0.46256416  0.415093811  0.12106882 -0.01512579 -0.087110340
## Institutions  0.21717814 -0.073718713  0.61190451  0.35458961  0.380027791

Viewing the biplot

biplot(trip_pca)

Viewing the PCA summary

summary(trip_pca)

## Importance of components:
##                           PC1    PC2    PC3    PC4    PC5    PC6     PC7    PC8
## Standard deviation     1.7255 1.1237 1.1105 1.0317 1.0099 0.8989 0.83932 0.6782
## Proportion of Variance 0.2978 0.1263 0.1233 0.1065 0.1020 0.0808 0.07045 0.0460
## Cumulative Proportion  0.2978 0.4240 0.5473 0.6538 0.7558 0.8366 0.90701 0.9530
##                            PC9   PC10
## Standard deviation     0.57348 0.3755
## Proportion of Variance 0.03289 0.0141
## Cumulative Proportion  0.98590 1.0000
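The "Proportion of Variance" and "Cumulative Proportion" rows can be reproduced from the standard deviations alone, because each component's variance is its squared standard deviation. The sketch below hard-codes the ten values printed by `trip_pca` earlier in this section (they are rounded, so results agree only to about four decimals). The names `sdev`, `prop_var`, `cum_var`, and `n_pc_90` are illustrative and not part of the original analysis.

```r
# Standard deviations copied from the printed prcomp() output (rounded values).
sdev <- c(1.7255447, 1.1237411, 1.1104500, 1.0317317, 1.0098612,
          0.8988780, 0.8393180, 0.6782247, 0.5734843, 0.3755038)

# The variance of each component is its squared standard deviation.
prop_var <- sdev^2 / sum(sdev^2)
cum_var  <- cumsum(prop_var)

# Smallest number of components whose cumulative share reaches 90%.
n_pc_90 <- which(cum_var >= 0.90)[1]

print(round(prop_var, 4))
print(round(cum_var, 4))
print(n_pc_90)
```

With a live `prcomp` object, the same quantities would come from `trip_pca$sdev` instead of the hard-coded vector.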

Viewing the correlations between variables

GGally::ggcorr(trip, hjust = 0.7, label = TRUE)

Takeaways from the analysis above:

Park and Institutions have the strongest negative correlation.
Bar and Park have the strongest positive correlation.
Observations 642, 100, and 331 are outliers.
The first 7 principal components together capture about 90% of the total variance (cumulative proportion 0.907).
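The strongest positive and negative pairs can also be read off programmatically rather than from the plot. The following is a minimal sketch that uses the built-in `mtcars` data as a stand-in, since the review file is not bundled here; in the analysis itself the input would be the `trip` data frame. The names `cm`, `pairs_df`, `strongest_pos`, and `strongest_neg` are illustrative.

```r
# Stand-in data: mtcars ships with R; replace with `trip` in the real analysis.
cm <- cor(mtcars)

# Keep each pair once by blanking the diagonal and the upper triangle.
cm[upper.tri(cm, diag = TRUE)] <- NA

# Reshape the matrix into one row per variable pair.
pairs_df <- as.data.frame(as.table(cm))
names(pairs_df) <- c("var1", "var2", "r")
pairs_df <- pairs_df[!is.na(pairs_df$r), ]

# Pairs with the largest and smallest correlation coefficients.
strongest_pos <- pairs_df[which.max(pairs_df$r), ]
strongest_neg <- pairs_df[which.min(pairs_df$r), ]

print(strongest_pos)
print(strongest_neg)
```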

5.2 Finding the optimal K

Several approaches can be used to choose k for clustering:

1. Elbow
2. Silhouette
3. Gap Statistic

We start with the elbow method, which uses the total within-cluster sum of squares (wss) to choose k: the chosen k is the point beyond which adding more clusters yields only a small further reduction in wss.

1. Elbow Method

# Determine the optimal number of clusters with the elbow method (total within-cluster sum of squares).
set.seed(123)
fviz_nbclust(trip, kmeans, method = "wss")
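What the elbow plot displays can be computed by hand: run `kmeans` for each candidate k and record `tot.withinss`. The sketch below assumes the scaled numeric columns of the built-in `iris` data as a stand-in for `trip`; `x`, `ks`, and `wss` are illustrative names.

```r
# Stand-in data: four numeric iris columns, scaled to unit variance.
x <- scale(iris[, 1:4])

set.seed(123)
ks <- 1:10

# Total within-cluster sum of squares for each candidate k.
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# Plotting wss against k gives the elbow curve; here the values are printed.
print(data.frame(k = ks, wss = round(wss, 2)))
```

The "elbow" is the k after which the decrease in `wss` flattens out. Picking it remains a visual judgment call, which is why it is usually cross-checked with other methods.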

2. Silhouette Method

# Alternatively, use the average silhouette method, which measures the quality of a clustering.
set.seed(123)
fviz_nbclust(trip, kmeans, method = "silhouette")
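The average silhouette width behind this plot can be sketched directly with `cluster::silhouette`, which takes a vector of cluster labels and a distance object. As before, the scaled `iris` columns are only a stand-in for `trip`, and `avg_sil` and `best_k` are illustrative names.

```r
library(cluster)

# Stand-in data: four numeric iris columns, scaled to unit variance.
x <- scale(iris[, 1:4])
d <- dist(x)  # Euclidean distances between observations

set.seed(123)
ks <- 2:10    # the silhouette is undefined for a single cluster

# Mean silhouette width for each candidate k.
avg_sil <- sapply(ks, function(k) {
  fit <- kmeans(x, centers = k, nstart = 25)
  mean(silhouette(fit$cluster, d)[, "sil_width"])
})

# The k with the largest average width is the silhouette-optimal choice.
best_k <- ks[which.max(avg_sil)]
print(data.frame(k = ks, avg_sil = round(avg_sil, 3)))
print(best_k)
```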

3. Gap Statistic

# Alternatively, use the gap statistic, which compares the within-cluster dispersion with that expected under a reference distribution.
set.seed(123)
gap_stat <- clusGap(trip, FUN = kmeans, nstart = 25, K.max = 10, B = 123)

## Warning: did not converge in 10 iterations

fviz_gap_stat(gap_stat)
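The "did not converge in 10 iterations" warning is raised by `kmeans` when it reaches its default limit of `iter.max = 10`. Because `clusGap` forwards extra arguments to the clustering function, a larger limit can be passed through in the same way as `nstart`. The sketch below shows this on the scaled `iris` columns (a stand-in for `trip`), with smaller `K.max` and `B` values chosen only to keep the example fast. `cluster::maxSE` then extracts a suggested k from the gap table.

```r
library(cluster)

# Stand-in data: four numeric iris columns, scaled to unit variance.
x <- scale(iris[, 1:4])

set.seed(123)
# iter.max and nstart are passed on to kmeans() through clusGap()'s `...`.
gap <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 20,
               nstart = 10, iter.max = 50, verbose = FALSE)

# Suggested k using the rule from the original gap-statistic paper.
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
               method = "Tibs2001SEmax")

print(gap$Tab)
print(k_hat)
```

Raising `iter.max` does not change the method. It only gives each `kmeans` run more iterations to settle, so the gap values are computed from converged solutions.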

Based on the elbow, silhouette, and gap statistic methods, we take K = 2.

5.3 K-means clustering with the optimal K

# K-means clustering and plotting
set.seed(123)
trip_r <- kmeans(trip, centers = 2, nstart = 25)

trip_r

## K-means clustering with 2 clusters of sizes 598, 382
## 
## Cluster means:
##         Art     Club      Bar Restaurant    Museum   Resort     Park    Beach
## 1 0.8850167 1.309900 0.491990  0.5003679 0.7862542 1.625284 3.176973 2.854331
## 2 0.9059948 1.419476 1.829398  0.5828010 1.1800000 2.183560 3.187147 2.804895
##    Theatre Institutions
## 1 1.597124     2.925485
## 2 1.526099     2.601571
## 
## Clustering vector:
##   [1] 2 2 1 1 2 1 1 1 2 1 2 1 2 2 2 2 1 2 2 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1
##  [38] 1 2 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 2 1 2 2 2 1 1 1 1 1 2 1 2 1 1 2 1 2
##  [75] 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 2 1 1 1 2 1 1 2 2 1 1 2 1
## [112] 2 1 2 2 1 2 1 1 1 2 1 2 1 2 1 1 1 1 1 2 2 2 2 1 1 2 2 1 1 1 1 1 1 1 1 1 2
## [149] 1 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2
## [186] 1 1 1 2 1 2 2 1 1 1 1 1 2 1 1 2 1 2 1 1 1 1 1 2 1 1 2 1 1 1 2 1 1 2 2 1 1
## [223] 2 1 2 1 1 2 1 2 1 2 1 1 1 1 2 1 2 1 2 2 1 1 1 1 1 2 1 1 2 1 2 2 1 1 1 1 1
## [260] 2 1 2 1 2 1 2 2 2 1 2 1 1 2 2 2 1 2 1 1 1 2 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1
## [297] 1 2 2 1 1 2 2 1 2 1 1 2 1 2 1 1 2 2 1 1 2 1 2 1 1 1 1 1 1 2 2 2 2 2 1 2 1
## [334] 1 1 2 2 1 1 1 2 2 2 2 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 2 1
## [371] 1 2 2 2 1 2 2 1 1 1 1 1 1 2 1 2 2 1 1 1 2 1 2 1 2 2 2 2 1 2 1 2 1 1 2 2 2
## [408] 2 2 2 1 1 2 2 1 2 1 1 1 2 2 1 1 1 2 1 2 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 1 1
## [445] 2 1 2 2 2 1 1 1 1 2 1 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 2 2 2 1 2 2 1
## [482] 2 1 2 2 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 2 2 2 1 1 1 1 2 2 1 2 1 1 2 1 2 1 2
## [519] 1 1 1 2 2 1 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1
## [556] 2 1 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1
## [593] 1 2 1 2 1 1 2 1 1 1 2 1 2 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 2 1
## [630] 1 1 1 1 2 1 2 2 2 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 2 2 1 2 1 1 2 1 2 1 1 2
## [667] 2 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 2
## [704] 1 1 1 1 2 1 1 2 1 1 1 1 1 1 2 2 2 1 1 2 1 1 2 1 2 1 2 2 1 1 2 1 1 1 2 2 1
## [741] 2 2 1 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 1 2 1 2 2 1 1 1 1 2 1
## [778] 2 1 1 2 2 1 1 1 2 2 1 1 2 2 2 2 1 1 2 1 1 2 2 1 2 2 1 2 2 1 2 2 2 1 2 2 1
## [815] 2 2 2 1 2 2 2 1 1 2 1 1 2 1 2 2 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 2 1 1 2 1 1
## [852] 2 1 2 2 1 2 2 1 2 2 1 2 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1 2 2 2 1 1
## [889] 2 1 1 1 2 1 1 1 2 1 1 1 1 2 2 1 2 2 1 2 1 1 1 2 1 2 1 2 2 1 1 1 2 1 1 1 2
## [926] 2 2 1 1 1 1 2 1 1 1 2 1 2 2 2 2 1 2 1 1 1 1 1 1 2 1 2 2 2 2 2 2 1 1 1 2 1
## [963] 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2
## 
## Within cluster sum of squares by cluster:
## [1] 662.4259 517.3276
##  (between_SS / total_SS =  32.0 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

The clustering produces two clusters of clearly different sizes: cluster 1 contains 598 observations and cluster 2 contains 382.

trip_r$cluster

##   [1] 2 2 1 1 2 1 1 1 2 1 2 1 2 2 2 2 1 2 2 2 2 1 1 1 2 2 2 2 1 2 2 2 2 1 1 1 1
##  [38] 1 2 1 1 1 1 1 1 1 1 1 1 2 1 2 2 1 1 1 2 1 2 2 2 1 1 1 1 1 2 1 2 1 1 2 1 2
##  [75] 2 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 2 2 2 1 1 1 2 1 1 2 2 1 1 2 1
## [112] 2 1 2 2 1 2 1 1 1 2 1 2 1 2 1 1 1 1 1 2 2 2 2 1 1 2 2 1 1 1 1 1 1 1 1 1 2
## [149] 1 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2
## [186] 1 1 1 2 1 2 2 1 1 1 1 1 2 1 1 2 1 2 1 1 1 1 1 2 1 1 2 1 1 1 2 1 1 2 2 1 1
## [223] 2 1 2 1 1 2 1 2 1 2 1 1 1 1 2 1 2 1 2 2 1 1 1 1 1 2 1 1 2 1 2 2 1 1 1 1 1
## [260] 2 1 2 1 2 1 2 2 2 1 2 1 1 2 2 2 1 2 1 1 1 2 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1
## [297] 1 2 2 1 1 2 2 1 2 1 1 2 1 2 1 1 2 2 1 1 2 1 2 1 1 1 1 1 1 2 2 2 2 2 1 2 1
## [334] 1 1 2 2 1 1 1 2 2 2 2 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 2 2 1 1 2 1 1 2 1
## [371] 1 2 2 2 1 2 2 1 1 1 1 1 1 2 1 2 2 1 1 1 2 1 2 1 2 2 2 2 1 2 1 2 1 1 2 2 2
## [408] 2 2 2 1 1 2 2 1 2 1 1 1 2 2 1 1 1 2 1 2 1 1 1 2 2 2 2 2 1 1 1 1 1 1 1 1 1
## [445] 2 1 2 2 2 1 1 1 1 2 1 2 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 2 2 2 1 2 2 1
## [482] 2 1 2 2 1 1 1 1 1 2 1 1 2 1 1 1 2 2 1 2 2 2 1 1 1 1 2 2 1 2 1 1 2 1 2 1 2
## [519] 1 1 1 2 2 1 2 2 2 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1
## [556] 2 1 1 1 1 2 2 2 2 1 2 2 2 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 2 1 1 1
## [593] 1 2 1 2 1 1 2 1 1 1 2 1 2 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 2 1
## [630] 1 1 1 1 2 1 2 2 2 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 1 2 2 1 2 1 1 2 1 2 1 1 2
## [667] 2 2 1 1 1 2 1 1 1 1 1 2 1 1 1 1 2 1 2 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 2
## [704] 1 1 1 1 2 1 1 2 1 1 1 1 1 1 2 2 2 1 1 2 1 1 2 1 2 1 2 2 1 1 2 1 1 1 2 2 1
## [741] 2 2 1 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1 2 1 2 1 1 1 2 1 1 2 1 2 2 1 1 1 1 2 1
## [778] 2 1 1 2 2 1 1 1 2 2 1 1 2 2 2 2 1 1 2 1 1 2 2 1 2 2 1 2 2 1 2 2 2 1 2 2 1
## [815] 2 2 2 1 2 2 2 1 1 2 1 1 2 1 2 2 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 2 1 1 2 1 1
## [852] 2 1 2 2 1 2 2 1 2 2 1 2 1 1 1 2 2 1 1 1 1 2 1 1 1 2 1 1 2 1 1 1 2 2 2 1 1
## [889] 2 1 1 1 2 1 1 1 2 1 1 1 1 2 2 1 2 2 1 2 1 1 1 2 1 2 1 2 2 1 1 1 2 1 1 1 2
## [926] 2 2 1 1 1 1 2 1 1 1 2 1 2 2 2 2 1 2 1 1 1 1 1 1 2 1 2 2 2 2 2 2 1 1 1 2 1
## [963] 1 2 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2

trip_r$centers

##         Art     Club      Bar Restaurant    Museum   Resort     Park    Beach
## 1 0.8850167 1.309900 0.491990  0.5003679 0.7862542 1.625284 3.176973 2.854331
## 2 0.9059948 1.419476 1.829398  0.5828010 1.1800000 2.183560 3.187147 2.804895
##    Theatre Institutions
## 1 1.597124     2.925485
## 2 1.526099     2.601571

trip_r$size

## [1] 598 382

trip_r$tot.withinss

## [1] 1179.753

fviz_cluster(trip_r, data = trip, ggtheme = theme_bw(), main = "Clustering plot")

trip_1 <- trip %>% 
  mutate(Cluster = trip_r$cluster) %>% 
  group_by(Cluster) %>%
  summarise_at(1:10, "sd")

trip_1

## # A tibble: 2 x 11
##   Cluster   Art  Club   Bar Restaurant Museum Resort    Park Beach Theatre
##     <int> <dbl> <dbl> <dbl>      <dbl>  <dbl>  <dbl>   <dbl> <dbl>   <dbl>
## 1       1 0.339 0.453 0.352      0.259  0.363  0.451 0.00524 0.141   0.390
## 2       2 0.306 0.509 0.556      0.303  0.437  0.489 0.00713 0.126   0.316
## # ... with 1 more variable: Institutions <dbl>

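Note that the table above reports per-cluster standard deviations, not means. Judging by the cluster means in trip_r$centers, cluster 2 has higher average ratings for Art, Club, Bar, Restaurant, Museum, Resort, and Park, with Bar showing by far the largest gap (1.83 versus 0.49). Cluster 1 has higher average ratings for Beach, Theatre, and Institutions.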
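The cluster profiles can be compared directly from the means printed by `trip_r$centers`. The sketch below hard-codes those printed values so that it runs without the data file; `centers`, `diff_2_minus_1`, `higher_in_1`, `higher_in_2`, and `largest_gap` are illustrative names.

```r
# Cluster means copied from the printed trip_r$centers output.
centers <- rbind(
  `1` = c(0.8850167, 1.309900, 0.491990, 0.5003679, 0.7862542,
          1.625284, 3.176973, 2.854331, 1.597124, 2.925485),
  `2` = c(0.9059948, 1.419476, 1.829398, 0.5828010, 1.1800000,
          2.183560, 3.187147, 2.804895, 1.526099, 2.601571)
)
colnames(centers) <- c("Art", "Club", "Bar", "Restaurant", "Museum",
                       "Resort", "Park", "Beach", "Theatre", "Institutions")

# Positive values mean cluster 2 rates the category higher on average.
diff_2_minus_1 <- centers["2", ] - centers["1", ]

higher_in_2 <- names(diff_2_minus_1)[diff_2_minus_1 > 0]
higher_in_1 <- names(diff_2_minus_1)[diff_2_minus_1 < 0]
largest_gap <- names(which.max(abs(diff_2_minus_1)))

print(round(diff_2_minus_1, 3))
print(higher_in_2)
print(higher_in_1)
print(largest_gap)
```

With the fitted object available, `trip_r$centers` can replace the hard-coded matrix.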

6 Conclusion

Based on the unsupervised learning analysis, we can conclude:

K-means clustering can be applied to this dataset. It splits the data into 2 clusters, with the between-cluster sum of squares accounting for 32% of the total sum of squares (between_SS / total_SS = 32.0%).
The separation between the 2 clusters is dominated by PC1.
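The between_SS / total_SS figure comes from components that every `kmeans` fit exposes: the total sum of squares splits into a within-cluster part and a between-cluster part. In the output above, the two per-cluster values 662.4259 and 517.3276 add up to the reported `tot.withinss` of 1179.753. The sketch below shows the decomposition on the built-in `iris` data as a stand-in for `trip`; `fit`, `ratio`, and `total_check` are illustrative names.

```r
# Stand-in data: four numeric iris columns; use `trip` in the real analysis.
x <- iris[, 1:4]

set.seed(123)
fit <- kmeans(x, centers = 2, nstart = 25)

# Share of the total variation explained by the split into clusters.
ratio <- fit$betweenss / fit$totss

# The total sum of squares equals the within part plus the between part.
total_check <- fit$tot.withinss + fit$betweenss

print(round(100 * ratio, 1))
print(all.equal(fit$totss, total_check))
```

A higher ratio means the clusters are more compact relative to their separation, but the ratio always rises as k increases, so it is not a criterion for choosing k on its own.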

Unsupervised Learning: Tripadvisor Review Analysis

Alif Aziz

2/17/2020