The objective of this model is to cluster hostels in Japan based on several parameters, including price, ratings, and location.
This model can also be used as a recommendation system, for example when someone wants to find a similar hostel in a different location.
The data is obtained from Kaggle: https://www.kaggle.com/koki25ando/hostel-world-dataset
Load the required libraries:
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## X hostel.name City price.from
## 1 1 "Bike & Bed" CharinCo Hostel Osaka 3300
## 2 2 & And Hostel Fukuoka-City 2600
## 3 3 &And Hostel Akihabara Tokyo 3600
## 4 4 &And Hostel Ueno Tokyo 2600
## 5 5 &And Hostel-Asakusa North- Tokyo 1500
## 6 6 1night1980hostel Tokyo Tokyo 2100
## Distance.from.city.centre summary.score rating.band atmosphere cleanliness
## 1 2.9 9.2 Superb 8.9 9.4
## 2 0.7 9.5 Superb 9.4 9.7
## 3 7.8 8.7 Fabulous 8.0 7.0
## 4 8.7 7.4 Very Good 8.0 7.5
## 5 10.0 9.4 Superb 9.5 9.5
## 6 9.4 7.0 Very Good 5.5 8.0
## facilities location.y security staff valueformoney lon lat
## 1 9.3 8.9 9.0 9.4 9.4 135.5138 34.68268
## 2 9.5 9.7 9.2 9.7 9.5 NA NA
## 3 9.0 8.0 10.0 10.0 9.0 139.7775 35.69745
## 4 7.5 7.5 7.0 8.0 6.5 139.7837 35.71272
## 5 9.0 9.0 9.5 10.0 9.5 139.7984 35.72790
## 6 6.0 6.0 8.5 8.5 6.5 139.7869 35.72438
## X hostel.name City
## 0 0 0
## price.from Distance.from.city.centre summary.score
## 0 0 15
## rating.band atmosphere cleanliness
## 15 15 15
## facilities location.y security
## 15 15 15
## staff valueformoney lon
## 15 15 44
## lat
## 44
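The counts above show 15 missing values in each rating column and 44 in `lon` and `lat`. The later preview of the clustered data no longer contains row 2, which had missing coordinates, so incomplete rows appear to have been dropped before clustering. The exact code for that step is not shown; the following is a minimal sketch on a small made-up data frame using base R's `complete.cases()`. The data values and the names `hostels_toy` and `hostels_clean` are illustrative assumptions.

```r
# Toy data frame standing in for the hostel data (values are illustrative only)
hostels_toy <- data.frame(
  hostel.name   = c("A", "B", "C", "D"),
  price.from    = c(3300, 2600, 3600, 2600),
  summary.score = c(9.2, 9.5, NA, 7.4),
  lon           = c(135.5, NA, 139.8, 139.8),
  lat           = c(34.7, NA, 35.7, 35.7),
  stringsAsFactors = FALSE
)

# Count missing values per column (same idea as the output above)
na_counts <- colSums(is.na(hostels_toy))
print(na_counts)

# Keep only rows with no missing value in any column
hostels_clean <- hostels_toy[complete.cases(hostels_toy), ]
print(nrow(hostels_clean))
```

Dropping rows is the simplest option; imputing the missing ratings or coordinates would keep more hostels but adds assumptions about the missing values.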
Based on the WSS (within-cluster sum of squares) chart, the chosen number of clusters is 7.
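A WSS chart is usually built by running k-means for a range of k values and plotting the total within-cluster sum of squares against k, then looking for the "elbow" where the curve flattens. The call used for the chart in this analysis is not shown, so the following is only a sketch with base R's `kmeans()` on synthetic data. The `features` matrix, the range of k, and `nstart = 25` are assumptions.

```r
set.seed(42)

# Synthetic stand-in for the numeric hostel features (illustrative only):
# three well-separated groups in two dimensions
features <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)
features <- scale(features)

# Total within-cluster sum of squares for k = 1..10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(features, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the k after which WSS stops dropping sharply
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On real data the elbow is often less sharp than on this synthetic example, so the choice of k involves some judgment.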
## X hostel.name City price.from Distance.from.city.centre
## 1 1 "Bike & Bed" CharinCo Hostel Osaka 3300 2.9
## 2 3 &And Hostel Akihabara Tokyo 3600 7.8
## 3 4 &And Hostel Ueno Tokyo 2600 8.7
## 4 5 &And Hostel-Asakusa North- Tokyo 1500 10.0
## 5 6 1night1980hostel Tokyo Tokyo 2100 9.4
## 6 7 328 Hostel & Lounge Tokyo 3300 16.0
## summary.score rating.band atmosphere cleanliness facilities location.y
## 1 9.2 Superb 8.9 9.4 9.3 8.9
## 2 8.7 Fabulous 8.0 7.0 9.0 8.0
## 3 7.4 Very Good 8.0 7.5 7.5 7.5
## 4 9.4 Superb 9.5 9.5 9.0 9.0
## 5 7.0 Very Good 5.5 8.0 6.0 6.0
## 6 9.3 Superb 8.7 9.7 9.3 9.1
## security staff valueformoney lon lat cluster
## 1 9.0 9.4 9.4 135.5138 34.68268 7
## 2 10.0 10.0 9.0 139.7775 35.69745 4
## 3 7.0 8.0 6.5 139.7837 35.71272 1
## 4 9.5 10.0 9.5 139.7984 35.72790 4
## 5 8.5 8.5 6.5 139.7869 35.72438 1
## 6 9.3 9.7 8.9 139.7455 35.54804 4
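The `cluster` column in the output above holds the label assigned to each hostel. As a rough sketch of how such labels can be produced and attached, the example below fits `kmeans()` with 7 centers on synthetic data. The choice of features, the scaling step, `nstart = 25`, and the name `db_toy` are all assumptions, since the fitting code is not shown here.

```r
set.seed(123)

# Synthetic stand-in for the cleaned hostel data (illustrative values only)
db_toy <- data.frame(
  price.from                = round(runif(60, 1500, 5000)),
  summary.score             = round(runif(60, 6, 10), 1),
  Distance.from.city.centre = round(runif(60, 0.5, 15), 1)
)

# Standardise so that price (in yen) does not dominate the distance computation
features <- scale(db_toy)

# Fit k-means with 7 clusters and attach the labels as a new column
fit <- kmeans(features, centers = 7, nstart = 25)
db_toy$cluster <- fit$cluster

print(head(db_toy))
print(table(db_toy$cluster))
```

The cluster numbers themselves are arbitrary: a different seed can assign different numbers to the same groups.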
Figures: Hostel Cluster in Fukuoka, Hiroshima, Kyoto, Osaka, and Tokyo (one plot per city).
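    db_cluster <- db_cluster %>%
      mutate(cluster = as.factor(cluster))

    # Average starting price per cluster (computed per group, not over the whole data set)
    db_cluster %>%
      group_by(cluster) %>%
      summarise(mean_price = mean(price.from)) %>%
      ggplot(aes(x = cluster, y = mean_price)) +
      geom_col(aes(fill = cluster))

### Rating Distribution Based on Cluster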
    # Mean of each rating column per cluster
    rating <- db_cluster %>%
      select(c(summary.score, atmosphere, cleanliness, facilities, location.y, security, staff, valueformoney, cluster)) %>%
      group_by(cluster) %>%
      summarise_all(mean)

    # One panel per rating type, one bar per cluster
    rating %>%
      pivot_longer(cols = -cluster, names_to = "type", values_to = "value") %>%
      ggplot(aes(x = as.factor(cluster), y = value)) +
      geom_col(aes(fill = cluster)) +
      facet_wrap(~type)

Seven clusters are formed using the k-means method. Each cluster has its own characteristics in terms of ratings, price, and location. This model could be used as a recommendation system: for example, if a person usually stays at a cluster 1 hostel in Tokyo and wants to stay in Osaka, the algorithm will recommend cluster 1 hostels located in Osaka, since those hostels have similar characteristics to the one they stayed at before.
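The recommendation idea can be sketched as a simple lookup: find the cluster of a hostel the traveller liked, then return the hostels in the target city that share that cluster label. The helper `recommend_similar()` and the toy data below are hypothetical and only illustrate the idea; they are not part of the original analysis.

```r
# Toy clustered data (illustrative values, not taken from the dataset)
db_toy <- data.frame(
  hostel.name = c("Hostel A", "Hostel B", "Hostel C", "Hostel D", "Hostel E"),
  City        = c("Tokyo", "Tokyo", "Osaka", "Osaka", "Osaka"),
  cluster     = factor(c(1, 4, 1, 7, 1)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: hostels in `target_city` that share the cluster of `liked_hostel`
recommend_similar <- function(data, liked_hostel, target_city) {
  liked_cluster <- as.character(data$cluster[data$hostel.name == liked_hostel])
  if (length(liked_cluster) != 1) {
    stop("liked_hostel must match exactly one row")
  }
  matches <- data$City == target_city &
    as.character(data$cluster) == liked_cluster
  data$hostel.name[matches]
}

recommendations <- recommend_similar(db_toy, "Hostel A", "Osaka")
print(recommendations)
```

A lookup like this treats every hostel in the same cluster as equally similar; ranking the matches by distance to the liked hostel in feature space would be a natural refinement.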