Setup

#install.packages("mongolite")
#install.packages(c("hrbrthemes", "ggtext", "ggrepel"))
library(mongolite)
library(dplyr)
library(ggplot2)
library(plotly)
library(hrbrthemes)
library(ggtext)      # for formatted (markdown) text
library(ggrepel)
library(showtext)
library(scales)
library(stringr)

Connecting MongoDB Atlas data to R

# Example connection (replace the placeholders; never commit real credentials)
trip <- mongo(collection = "Data Scrapping",
             db = "ProjectMDS_UAS",
             url = "mongodb+srv://<username>:<password>@cluster0.sxdrib4.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0")

The code above creates the connection object trip, which links R to the “Data Scrapping” collection in the “ProjectMDS_UAS” database on MongoDB Atlas. Through this object, queries and aggregations can be run directly against that collection.
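One way to keep credentials out of the script is to assemble the connection string from environment variables. The sketch below is only an illustration: the helper build_atlas_url() and the variable names MONGO_USER, MONGO_PASSWORD, and MONGO_HOST are assumptions, not part of the original project.

```r
# Hypothetical helper: builds an Atlas connection string and percent-encodes
# the user name and password so that characters such as "@" do not break the URL.
build_atlas_url <- function(user, password, host) {
  sprintf(
    "mongodb+srv://%s:%s@%s/?retryWrites=true&w=majority",
    utils::URLencode(user, reserved = TRUE),
    utils::URLencode(password, reserved = TRUE),
    host
  )
}

# Assumed environment variable names; set them in .Renviron or the shell.
atlas_url <- build_atlas_url(
  Sys.getenv("MONGO_USER"),
  Sys.getenv("MONGO_PASSWORD"),
  Sys.getenv("MONGO_HOST")
)

# With a reachable cluster, the connection would then be created as:
# trip <- mongolite::mongo(collection = "Data Scrapping",
#                          db = "ProjectMDS_UAS",
#                          url = atlas_url)
```

With this approach the password never appears in the knitted document or in version control.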

Renaming a field in MongoDB

#trip$update(query = '{}', update = '{"$rename": {"Harga_Permalam (US$)": "Harga_Permalam_dolar"}}', multiple = TRUE)

This command renames the field “Harga_Permalam (US$)” to “Harga_Permalam_dolar” in every document of the collection. It only needs to run once, because the rename is stored permanently in the database; that is why the line is commented out.
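Because the rename must run only once, it can help to guard it with a check for the old field name. The following is a minimal sketch under that assumption; the variable names are illustrative, and the calls that need a live connection are left as comments.

```r
old_field <- "Harga_Permalam (US$)"
new_field <- "Harga_Permalam_dolar"

# Query matching documents that still carry the old field name
exists_query <- sprintf('{"%s": {"$exists": true}}', old_field)

# Update document for the $rename operator
rename_update <- sprintf('{"$rename": {"%s": "%s"}}', old_field, new_field)

# With the connection object trip available, the guarded rename would be:
# if (trip$count(exists_query) > 0) {
#   trip$update(query = exists_query, update = rename_update, multiple = TRUE)
# }
```

Running the guarded version a second time leaves the collection untouched, since no document matches the query any more.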

Displaying the first 5 documents with MongoDB syntax

trip$find(limit = 5)
##                                       Hotel Rating   Penilaian Jumlah_Ulasan
## 1               18 Suite Villa Loft at Kuta    8.5  Luar Biasa            59
## 2 Abisena Wellness & Resort Ubud-Adult Only    9.3 Sangat Baik             5
## 3                              Adepa Resort    8.5  Luar Biasa            20
## 4                             Adiwana Bisma    9.2 Sangat Baik            60
## 5                         Alaya Dedaun Kuta    9.5 Menakjubkan            80
##                          Ulasan1                        Ulasan2
## 1 "Pengalaman menginap terbaik!"       "Pemilik properti ramah"
## 2                           <NA>                           <NA>
## 3                           <NA>                           <NA>
## 4       "Pemilik properti ramah" "Pengalaman menginap terbaik!"
## 5       "Pemilik properti ramah" "Pengalaman menginap terbaik!"
##      Destinasi_Terdekat Lokasi                                    Tipe_Kamar
## 1           Pantai Kuta   Kuta                                  Kamar Deluxe
## 2 COMO Shambhala Estate   Ubud                                  Suite Sungai
## 3 Finns Recreation Club Dalung         Twin - One Bedroom Private Pool Villa
## 4           Ubud Palace   Ubud                            Adiwana Rice Field
## 5           Pantai Kuta   Kuta Vila Deluxe 1 Kamar Tidur dengan Kolam Renang
##   Harga_Permalam_dolar
## 1                   75
## 2                  217
## 3                  162
## 4                  343
## 5                  501

Next, the data is checked to see how it is distributed. This covers counting the number of records and the number of missing (NA) values in each column.

Doing this with MongoDB aggregation through the mongolite package is fairly involved, because MongoDB has no native NA concept as R does; it only has null values and fields that are absent from a document (missing fields). Missing values can be counted in MongoDB with an aggregation pipeline, but this is cumbersome, especially when not working in the mongo shell directly.

In practice, it is therefore easier to count records and missing values in R with sapply after the data has been pulled from MongoDB into a data frame.
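For comparison, a rough sketch of the server-side route: in MongoDB a filter of the form {"field": null} matches documents where the field is null as well as documents where it is absent, so one count query per field is needed. The helper null_query() and the chosen field list are assumptions for illustration.

```r
# Hypothetical helper: JSON filter matching null or absent values of one field
null_query <- function(field) sprintf('{"%s": null}', field)

# Fields to inspect (a subset of the collection's columns)
fields <- c("Jumlah_Ulasan", "Ulasan1", "Ulasan2")
queries <- vapply(fields, null_query, character(1))
print(queries)

# With the connection object trip available, one round trip per field:
# missing_counts <- vapply(queries, function(q) trip$count(q), numeric(1))
```

Each field costs a separate query, which is the overhead the text refers to; the sapply approach in R needs only a single find().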

Converting the data to a data frame

data <- trip$find()
trip_df <- as.data.frame(data)

Converting column types

trip_df[] <- lapply(trip_df, function(x) {
  if (is.character(x)) {
    as.factor(x)
  } else {
    x
  }
})

# Check the modified data frame
str(trip_df)
## 'data.frame':    197 obs. of  10 variables:
##  $ Hotel               : Factor w/ 192 levels "18 Suite Villa Loft at Kuta",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Rating              : num  8.5 9.3 8.5 9.2 9.5 8.2 9 9.3 9.5 9.3 ...
##  $ Penilaian           : Factor w/ 5 levels "Baik","Fantastis",..: 3 5 3 5 4 3 5 5 4 5 ...
##  $ Jumlah_Ulasan       : num  59 5 20 60 80 55 82 50 662 177 ...
##  $ Ulasan1             : Factor w/ 14 levels "\"Bersih dan rapi\"",..: 13 NA NA 12 12 13 13 13 2 8 ...
##  $ Ulasan2             : Factor w/ 22 levels "\"Bersih dan rapi\"",..: 20 NA NA 21 21 NA 20 20 22 22 ...
##  $ Destinasi_Terdekat  : Factor w/ 48 levels "Alas Harum Bali",..: 31 13 16 46 31 33 34 2 31 16 ...
##  $ Lokasi              : Factor w/ 17 levels "Blahbatu","Candidasa",..: 6 16 4 16 6 12 13 17 6 13 ...
##  $ Tipe_Kamar          : Factor w/ 159 levels "1 Bedroom Sensual Hanging Tent (Adult Only)",..: 44 120 134 5 141 24 17 159 40 41 ...
##  $ Harga_Permalam_dolar: int  75 217 162 343 501 37 232 2623 88 74 ...

This code converts every character column in the trip_df data frame to a factor. That simplifies categorical analysis, for example in statistical modeling or in visualizations that need categorical variables. The structure of the result is then checked with str().
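The same conversion can also be written with dplyr, which is already loaded. The sketch below uses a small made-up data frame (toy), not the project data, to show the effect.

```r
library(dplyr)

# Toy data with one character column and one numeric column (made-up values)
toy <- data.frame(
  Lokasi = c("Kuta", "Ubud", "Kuta"),
  Rating = c(8.5, 9.3, 9.0),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; other columns stay unchanged
toy_fct <- toy %>%
  mutate(across(where(is.character), as.factor))

str(toy_fct)
```

Whether lapply() or across() is used is a matter of style; both leave numeric columns as they are.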

Counting values per column of the data frame with R functions

data_summary <- data.frame(
  Kolom = names(trip_df),
  Total_Data = nrow(trip_df),
  Non_NA = sapply(trip_df, function(x) sum(!is.na(x))),
  Jumlah_NA = sapply(trip_df, function(x) sum(is.na(x))),
  Persen_NA = round(sapply(trip_df, function(x) mean(is.na(x)) * 100), 2)
)
print(data_summary)
##                                     Kolom Total_Data Non_NA Jumlah_NA Persen_NA
## Hotel                               Hotel        197    197         0      0.00
## Rating                             Rating        197    197         0      0.00
## Penilaian                       Penilaian        197    197         0      0.00
## Jumlah_Ulasan               Jumlah_Ulasan        197    196         1      0.51
## Ulasan1                           Ulasan1        197    170        27     13.71
## Ulasan2                           Ulasan2        197    166        31     15.74
## Destinasi_Terdekat     Destinasi_Terdekat        197    197         0      0.00
## Lokasi                             Lokasi        197    197         0      0.00
## Tipe_Kamar                     Tipe_Kamar        197    197         0      0.00
## Harga_Permalam_dolar Harga_Permalam_dolar        197    197         0      0.00

Enabling a font for the charts

The code below enables the Poppins font from Google Fonts for use in R visualizations. font_add_google() downloads the font and registers it in the R session, and showtext_auto() makes it available automatically in plots created with ggplot2 or base R graphics.

# Enable the Google font
font_add_google("Poppins", "poppins")
showtext_auto()
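Since font_add_google() needs an internet connection, a defensive variant can fall back to a built-in family when the download fails. This is a sketch only; the variable plot_family is an assumption and is not used elsewhere in the original script.

```r
# Fallback family used when Poppins cannot be registered
plot_family <- "sans"

if (requireNamespace("sysfonts", quietly = TRUE) &&
    requireNamespace("showtext", quietly = TRUE)) {
  registered <- tryCatch({
    sysfonts::font_add_google("Poppins", "poppins")  # downloads the font
    showtext::showtext_auto()                        # use showtext for new plots
    TRUE
  }, error = function(e) FALSE)
  if (registered) plot_family <- "poppins"
}

# A theme could then use: theme_minimal(base_family = plot_family)
```

The charts would pass plot_family instead of the literal "poppins", so they still render when the machine is offline.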

The Tripadvisor visualization scripts used here come in two variants:

  1. Pure R code
    All steps, from cleaning and grouping through visualization, run directly in R with packages such as dplyr, stringr, and ggplot2. The data is a local data frame, for example trip_df.

  2. MongoDB aggregation code
    Here, aggregation steps such as grouping, sorting, and limiting run inside MongoDB through an aggregation pipeline (aggregate()) executed on the collection, for example trip. The charts are then drawn with ggplot2 in R from the data returned by the database.

Although the aggregation method differs, both variants draw the charts with ggplot2 in R, so the plots look similar; the only difference is whether the data is aggregated locally in R or directly in MongoDB.
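To make the correspondence concrete, the sketch below mirrors a $group / $sort / $limit pipeline with dplyr on a made-up data frame (toy). It also shows that distinct(), which keeps the first row per hotel, is not always equivalent to $max when a hotel appears more than once.

```r
library(dplyr)

# Toy data (made-up values) with hotel "A" listed twice
toy <- data.frame(
  Hotel = c("A", "A", "B", "C"),
  Jumlah_Ulasan = c(10, 40, 30, 20),
  stringsAsFactors = FALSE
)

# dplyr counterpart of:
#   {"$group": {"_id": "$Hotel", "Jumlah_Ulasan": {"$max": "$Jumlah_Ulasan"}}},
#   {"$sort": {"Jumlah_Ulasan": -1}}, {"$limit": 2}
top_max <- toy %>%
  group_by(Hotel) %>%
  summarise(Jumlah_Ulasan = max(Jumlah_Ulasan, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Jumlah_Ulasan)) %>%
  slice_head(n = 2)

# distinct() keeps the first occurrence of each hotel instead of the maximum
top_first <- toy %>%
  distinct(Hotel, .keep_all = TRUE) %>%
  arrange(desc(Jumlah_Ulasan)) %>%
  slice_head(n = 2)

print(top_max)
print(top_first)
```

In this toy case the two results differ, so when the R and MongoDB charts are compared, duplicated hotel names are worth checking first.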

Bar Chart

R – Most reviews (Top 10 hotels)

# Basic visualization
trip_df %>%
  distinct(Hotel, .keep_all = TRUE) %>%
  arrange(desc(Jumlah_Ulasan)) %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan)) +
  geom_col(fill = "blue") +
  geom_text(aes(y = Jumlah_Ulasan + 25, label = Jumlah_Ulasan), vjust = 0, size = 3) +
  coord_flip() +
  labs(
    title = "Top 10 Hotel dengan Jumlah Ulasan Terbanyak",
    x = "",
    y = "Jumlah Ulasan"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 14, face = "bold", margin = margin(b = 25)),
    plot.margin = margin(t = 10, r = 10, b = 10, l = 10)  # margin around the plot
  )

# Styled visualization
trip_df %>%
  distinct(Hotel, .keep_all = TRUE) %>%
  arrange(desc(Jumlah_Ulasan)) %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan, fill = Jumlah_Ulasan)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  geom_text(aes(label = paste0(Jumlah_Ulasan)),
            hjust = -0.1,
            size = 4,
            color = "#444444",
            fontface = "bold") +
  coord_flip() +
  labs(
    title = "<span style='color:#E76F51;'>Top 10 Hotel</span> dengan <span style='color:#2A9D8F;'>Jumlah Ulasan Terbanyak</span>",
    x="",
    y = "Jumlah Ulasan"
  ) +
  scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
  theme_minimal(base_family = "poppins", base_size = 14) +
  theme(
    plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 15)),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "#264653"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f0efeb", color = NA),
    panel.background = element_rect(fill = "#f0efeb", color = NA)
  ) +
  ylim(0, max(trip_df$Jumlah_Ulasan, na.rm = TRUE) * 1.15)

MongoDB–Most reviews (Top 10 hotels)

pipeline_ulasan <- '[
  {"$group": {"_id": "$Hotel", "Jumlah_Ulasan": {"$max": "$Jumlah_Ulasan"}}},
  {"$sort": {"Jumlah_Ulasan": -1}},
  {"$limit": 10},
  {"$project": {"Hotel": "$_id", "Jumlah_Ulasan": 1, "_id": 0}}
]'
top10_ulasan <- trip$aggregate(pipeline_ulasan)

# Basic visualization
ggplot(top10_ulasan, aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan)) +
  geom_col(fill = "blue") +
  geom_text(aes(y = Jumlah_Ulasan + 20, label = Jumlah_Ulasan), vjust = 0, size = 3) +
  coord_flip() +
  labs(
    title = "Top 10 Hotel dengan Jumlah Ulasan Terbanyak",
    x = "",
    y = "Jumlah Ulasan"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 14, face = "bold", margin = margin(b = 25)),
    plot.margin = margin(t = 10, r = 10, b = 10, l = 10)
  )

# Styled visualization
ggplot(top10_ulasan,aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan, fill = Jumlah_Ulasan)) +
  geom_col(show.legend = FALSE, width = 0.7) +
  geom_text(aes(label = paste0(Jumlah_Ulasan)),
            hjust = -0.1,
            size = 4,
            color = "#444444",
            fontface = "bold") +
  coord_flip() +
  labs(
    title = "<span style='color:#E76F51;'>Top 10 Hotel</span> dengan <span style='color:#2A9D8F;'>Jumlah Ulasan Terbanyak</span>",
    x="",
    y = "Jumlah Ulasan"
  ) +
  scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
  theme_minimal(base_family = "poppins", base_size = 14) +
  theme(
    plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 15)),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "#264653"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f0efeb", color = NA),
    panel.background = element_rect(fill = "#f0efeb", color = NA)
  ) +
  ylim(0, max(top10_ulasan$Jumlah_Ulasan, na.rm = TRUE) * 1.15)

MongoDB–Top 10 hotels by rating

pipeline_top10_rating <- '[
{ "$match": { "Rating": { "$ne": null } } },
  { "$sort": { "Rating": -1, "Hotel": 1 } },
  { "$limit": 10 },
  { "$project": { "Hotel": 1, "Rating": 1, "Jumlah_Ulasan": 1 } }
]'
top10_rating <- trip$aggregate(pipeline_top10_rating)

# Basic visualization
ggplot(top10_rating, aes(x = reorder(Hotel, Rating), y = Rating, fill = Rating)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = round(Rating, 1)), hjust = -0.1, color = "black", size = 4) +
  coord_flip() +
  labs(title = "Top 10 Hotel dengan Rating Tertinggi",
       x = "Nama Hotel", y = "Rating") +
  theme_minimal(base_family = "poppins")

MongoDB–Top 5 ratings (hotels grouped by rating)

pipeline_top_rating <- '[
  {
    "$group": {
      "_id": "$Rating",
      "Jumlah_Hotel": { "$sum": 1 },
      "Hotel_List": { "$push": "$Hotel" }
    }
  },
  {
    "$project": {
      "Rating": "$_id",
      "Jumlah_Hotel": 1,
      "Hotel_List": {
        "$reduce": {
          "input": "$Hotel_List",
          "initialValue": "",
          "in": {
            "$cond": [
              { "$eq": ["$$value", ""] },
              "$$this",
              { "$concat": ["$$value", "; ", "$$this"] }
            ]
          }
        }
      }
    }
  },
  { "$sort": { "Rating": -1 } },
  { "$limit": 5 }
]'

# Fetch the data from MongoDB
top_rating5 <- trip$aggregate(pipeline_top_rating)

# Add a rank and wrap the label text to a maximum width
top_rating5 <- top_rating5 %>%
  mutate(Rank = row_number(),
         Hotel_List = str_wrap(Hotel_List, width = 70))

# Visualization with Plotly
fig <- plot_ly(
  data = top_rating5,
  y = ~factor(Rank),
  x = ~Jumlah_Hotel,
  type = 'bar',
  text = ~Hotel_List,
  textposition = 'none',
  orientation = 'h',
  marker = list(color = '#219ebc'),
  hoverinfo = 'text+x'
) %>%
  add_text(
    x = ~Jumlah_Hotel + max(top_rating5$Jumlah_Hotel)*0.05,  # position slightly to the right of the bar
    y = ~factor(Rank),
    text = ~paste("Rating:", Rating),
    showlegend = FALSE,
    textposition = "middle right",
    textfont = list(color = "black", size = 8)
  ) %>%
  layout(
    title = "Top 5 Rating Hotel (Gabungan Hotel)",
    yaxis = list(title = "Peringkat Rating"),
    xaxis = list(title = "Jumlah Hotel"),
    margin = list(l = 200)
  )

fig
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...

R–Distribution of hotel rating labels (Penilaian)

# Basic visualization
trip_df %>%
  group_by(Penilaian) %>%
  summarise(Jumlah = n()) %>%
  mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
                                                  "Menakjubkan", "Fantastis"))) %>%
  ggplot(aes(x = Penilaian, y = Jumlah)) +
  geom_col(fill = "coral") +
  geom_text(aes(label = Jumlah), hjust = -0.1, size = 4, color = "black") +
  coord_flip() +
  labs(title = "Distribusi Penilaian Hotel", x = "Penilaian", y = "Jumlah Hotel") +
  theme_minimal()

# Styled visualization
trip_df %>%
  group_by(Penilaian) %>%
  summarise(Jumlah = n()) %>%
  mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
                                                  "Menakjubkan", "Fantastis"))) %>%
  ggplot(aes(x = Penilaian, y = Jumlah, fill = Penilaian)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = paste0(Jumlah)), vjust = -0.5, size = 5, family = "poppins", fontface = "bold", color = "#333333") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "<span style='color:#E76F51;'>Distribusi</span> Penilaian Hotel",
    x = "Penilaian",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins", base_size = 14) +
  theme(
    plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "#264653"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#fefae0", color = NA),
    panel.background = element_rect(fill = "#fefae0", color = NA)
  ) +
  ylim(0, max(table(trip_df$Penilaian)) * 1.2)

MongoDB–Distribution of hotel rating labels (Penilaian)

pipeline_penilaian <- '[
  {"$group": {"_id": "$Penilaian", "Jumlah": {"$sum": 1}}},
  {"$sort": {"Jumlah": -1}},
  {"$project": {"Penilaian": "$_id", "Jumlah": 1, "_id": 0}}
]'
dist_penilaian <- trip$aggregate(pipeline_penilaian)
dist_penilaian <- dist_penilaian[, c("Penilaian", "Jumlah")]

# Fix the column name and set the factor level order
colnames(dist_penilaian)[1] <- "Penilaian"
dist_penilaian$Penilaian <- factor(dist_penilaian$Penilaian,
    levels = c("Baik", "Luar Biasa", "Sangat Baik", "Menakjubkan", "Fantastis"))

# Basic visualization
ggplot(dist_penilaian, aes(x = reorder(Penilaian, Jumlah), y = Jumlah, fill = Jumlah)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = Jumlah), hjust = -0.1, size = 3) +
  labs(title = "Distribusi Penilaian Hotel", x = "Penilaian", y = "Jumlah Hotel") +
  theme_minimal() +
  coord_flip()

# Styled visualization
ggplot(dist_penilaian, aes(x = Penilaian, y = Jumlah, fill = Penilaian)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(aes(label = Jumlah), vjust = -0.5, size = 5, family = "poppins", fontface = "bold", color = "#333333") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "<span style='color:#E76F51;'>Distribusi</span> Penilaian Hotel",
    x = "Penilaian",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins", base_size = 14) +
  theme(
    plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "#264653"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#fefae0", color = NA),
    panel.background = element_rect(fill = "#fefae0", color = NA)
  ) +
  ylim(0, max(dist_penilaian$Jumlah) * 1.2)

R–Rating by location (aggregate: mean)

# Basic visualization
trip_df %>%
  group_by(Lokasi) %>%
  summarise(Rata_Rating = mean(Rating, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(Lokasi, Rata_Rating), y = Rata_Rating)) +
  geom_col(fill = "lightgreen") +
  coord_flip() +
  geom_text(aes(label = paste0(round(Rata_Rating, 1))), hjust = -0.1, size = 3) +
  labs(title = "Rata-rata Rating per Lokasi", x = "Lokasi", y = "Rata-rata Rating")

# Styled visualization
trip_df %>%
  group_by(Lokasi) %>%
  summarise(Rata_Rating = mean(Rating, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(Lokasi, Rata_Rating), y = Rata_Rating, fill = Rata_Rating)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(
    aes(label = paste0(round(Rata_Rating, 1))),
    hjust = -0.1,
    family = "poppins",
    size = 5,
    color = "#333333",
    fontface = "bold"
  ) +
  scale_fill_gradient(low = "#A8DADC", high = "blue") +
  coord_flip() +
  labs(
    title = "<span style='color:#1D3557;'>Rata-rata Rating</span> per Lokasi",
    x = "Lokasi",
    y = "Rata-rata Rating"
  ) +
  theme_minimal(base_family = "poppins", base_size = 14) +
  theme(
    plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "#264653"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f1faee", color = NA),
    panel.background = element_rect(fill = "#f1faee", color = NA)
  ) +
  ylim(0, max(trip_df$Rating, na.rm = TRUE) + 0.5)

MongoDB–Rating by location (aggregate: mean)

pipeline_rating_lokasi <- '[
  { "$group": {
      "_id": "$Lokasi",
      "Rata2_Rating": { "$avg": { "$toDouble": "$Rating" } }
  }},
  { "$sort": { "Rata2_Rating": -1 } }
]'
rating_lokasi <- trip$aggregate(pipeline_rating_lokasi)
colnames(rating_lokasi)[1] <- "Lokasi"

# Basic visualization
ggplot(rating_lokasi, aes(x = reorder(Lokasi, Rata2_Rating), y = Rata2_Rating)) +
  geom_col(fill = "seagreen") +
  coord_flip() +
  geom_text(aes(label = paste0(round(Rata2_Rating, 1))), hjust = -0.1, size = 3) +
  labs(title = "Rata-rata Rating per Lokasi", x = "Lokasi", y = "Rata-rata Rating") +
  theme_minimal()

# Styled visualization
ggplot(rating_lokasi, aes(x = reorder(Lokasi, Rata2_Rating), y = Rata2_Rating, fill = Rata2_Rating)) +
  geom_col(width = 0.6, show.legend = FALSE) +
  geom_text(
    aes(label = round(Rata2_Rating, 1)),
    hjust = -0.1,
    family = "poppins",
    size = 5,
    color = "#333333",
    fontface = "bold"
  ) +
  scale_fill_gradient(low = "#A8DADC", high = "blue") +
  coord_flip() +
  labs(
    title = "<span style='color:#1D3557;'>Rata-rata Rating</span> per Lokasi",
    x = "Lokasi",
    y = "Rata-rata Rating"
  ) +
  theme_minimal(base_family = "poppins", base_size = 14) +
  theme(
    plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "#264653"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f1faee", color = NA),
    panel.background = element_rect(fill = "#f1faee", color = NA)
  ) +
  ylim(0, max(rating_lokasi$Rata2_Rating) + 0.5)

R–Top 5 nearest destinations

# Basic visualization
trip_df %>%
  group_by(Destinasi_Terdekat) %>%
  summarise(Jumlah = n()) %>%
  slice_max(order_by = Jumlah, n = 5) %>%
  ggplot(aes(x = reorder(Destinasi_Terdekat, Jumlah), y = Jumlah)) +
  geom_col(fill = "purple") +
  geom_text(aes(label = Jumlah), hjust = -0.1, size = 4, color = "black") +
  coord_flip() +
  labs(title = "Top 5 Destinasi Terdekat", x = "Destinasi", y = "Jumlah Hotel") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Styled visualization
trip_df %>%
  group_by(Destinasi_Terdekat) %>%
  summarise(Jumlah = n()) %>%
  slice_max(order_by = Jumlah, n = 5) %>%
  ggplot(aes(x = reorder(Destinasi_Terdekat, -Jumlah), y = Jumlah, fill = Jumlah)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = Jumlah),
            vjust = -0.5,
            family = "poppins",
            size = 4.5,
            color = "#333333") +
  scale_fill_gradient(low = "#CDB4DB", high = "#5E548E") +
  labs(
    title = "Top 5 Destinasi Terdekat",
    x = "Destinasi",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins", base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 18, color = "#2A2A2A"),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10, color = "#444444"),
    axis.text.y = element_text(color = "#444444"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f8f9fa", color = NA),
    panel.background = element_rect(fill = "#f8f9fa", color = NA)
  )

MongoDB–Top 5 nearest destinations

pipeline_destinasi <- '[
  { "$group": {
      "_id": "$Destinasi_Terdekat",
      "Jumlah": { "$sum": 1 }
  }},
  { "$sort": { "Jumlah": -1 } },
  { "$limit": 5 } 
]'
destinasi <- trip$aggregate(pipeline_destinasi)
colnames(destinasi)[1] <- "Destinasi_Terdekat"

# Basic visualization
ggplot(destinasi, aes(x = reorder(Destinasi_Terdekat, Jumlah), y = Jumlah)) +
  geom_col(fill = "purple") +
  coord_flip() +
  labs(title = "Top 5 Destinasi Terdekat Berdasarkan Jumlah Hotel", x = "Destinasi Terdekat", y = "Jumlah") +
  theme_minimal()

# Styled visualization
ggplot(destinasi, aes(x = reorder(Destinasi_Terdekat, -Jumlah), y = Jumlah, fill = Jumlah)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = Jumlah),
            vjust = -0.5,
            family = "poppins",
            size = 4.5,
            color = "#333333") +
  scale_fill_gradient(low = "#CDB4DB", high = "#5E548E") +
  labs(
    title = "Top 5 Destinasi Terdekat Berdasarkan Jumlah Hotel",
    x = "Destinasi",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins", base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 18, color = "#2A2A2A"),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10, color = "#444444"),
    axis.text.y = element_text(color = "#444444"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f8f9fa", color = NA),
    panel.background = element_rect(fill = "#f8f9fa", color = NA)
  )

R–Number of hotels per location (unique by hotel name)

# Basic visualization
trip_df %>%
  distinct(Hotel, Lokasi) %>%
  group_by(Lokasi) %>%
  summarise(Jumlah_Hotel = n()) %>%
  ggplot(aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel)) +
  geom_col(fill = "orange") +
  coord_flip() +
  geom_text(aes(label = Jumlah_Hotel), hjust = -0.1, size = 3) +
  labs(title = "Jumlah Hotel per Lokasi", x = "Lokasi", y = "Jumlah Hotel")

# Store the aggregation result
lokasi_df <- trip_df %>%
  distinct(Hotel, Lokasi) %>%
  count(Lokasi, name = "Jumlah_Hotel")

# Styled visualization
ggplot(lokasi_df, aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel, fill = Jumlah_Hotel)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = Jumlah_Hotel), hjust = -0.2, color = "#333333", size = 4.5, fontface = "bold") +
  coord_flip() +
  scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
  labs(
    title = "Jumlah Hotel per Lokasi",
    x = "Lokasi",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
    axis.text = element_text(color = "#444444"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#FAF9F6", color = NA),
    panel.background = element_rect(fill = "#FAF9F6", color = NA)
  ) +
  ylim(0, max(lokasi_df$Jumlah_Hotel) * 1.15)

MongoDB–Number of hotels per location (unique by hotel name)

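# Deduplicate (Hotel, Lokasi) pairs first so each hotel is counted once per location,
# matching distinct(Hotel, Lokasi) in the R version
pipeline_jml_hotel_lokasi <- '[
  { "$group": { "_id": { "Hotel": "$Hotel", "Lokasi": "$Lokasi" } } },
  { "$group": {
      "_id": "$_id.Lokasi",
      "Jumlah_Hotel": { "$sum": 1 }
  }},
  { "$sort": { "Jumlah_Hotel": -1 } }
]'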
jml_hotel_lokasi <- trip$aggregate(pipeline_jml_hotel_lokasi)
colnames(jml_hotel_lokasi)[1] <- "Lokasi"

ggplot(jml_hotel_lokasi, aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel)) +
  geom_col(fill = "darkred") +
  geom_text(aes(label = Jumlah_Hotel), hjust = -0.1, size = 4, color = "black") +
  coord_flip() +
  labs(
    title = "Jumlah Hotel per Lokasi",
    x = "Lokasi",
    y = "Jumlah Hotel"
  ) +
  theme_minimal() +
  expand_limits(y = max(jml_hotel_lokasi$Jumlah_Hotel) * 1.1)  # leave room for the labels

# Styled visualization
ggplot(jml_hotel_lokasi, aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel, fill = Jumlah_Hotel)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = Jumlah_Hotel), hjust = -0.2, color = "#333333", size = 4.5,
            fontface = "bold") +
  coord_flip() +
  scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
  labs(
    title = "Jumlah Hotel per Lokasi",
    x = "Lokasi",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
    axis.text = element_text(color = "#444444"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#FAF9F6", color = NA),
    panel.background = element_rect(fill = "#FAF9F6", color = NA)
  ) +
  ylim(0, max(jml_hotel_lokasi$Jumlah_Hotel) * 1.15)

R–Number of hotels by rating label (Penilaian)

# Basic visualization
trip_df %>%
  distinct(Hotel, Penilaian) %>%
  group_by(Penilaian) %>%
  mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
                                                  "Menakjubkan", "Fantastis"))) %>%
  summarise(Jumlah_Hotel = n()) %>%
  ggplot(aes(x = Penilaian, y = Jumlah_Hotel)) +
  geom_col(fill = "darkgreen") +
  geom_text(aes(label = Jumlah_Hotel), hjust = -0.1, size = 4, color = "black") +
  coord_flip() +
  labs(title = "Jumlah Hotel Berdasarkan Penilaian", x = "Penilaian", y = "Jumlah Hotel")

# Styled visualization
trip_df %>%
  distinct(Hotel, Penilaian) %>%
  group_by(Penilaian) %>%
  mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
                                                  "Menakjubkan", "Fantastis"))) %>%
  summarise(Jumlah_Hotel = n()) %>%
  ggplot(aes(x = Penilaian, y = Jumlah_Hotel, fill = Penilaian)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = Jumlah_Hotel), vjust = -0.5, fontface = "bold", color = "#333333", size = 5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Jumlah Hotel Berdasarkan Penilaian",
    x = "Penilaian",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#264653"),
    axis.text = element_text(color = "#2A2A2A"),
    panel.grid.major.y = element_blank(),
    plot.background = element_rect(fill = "#FAF9F6", color = NA),
    panel.background = element_rect(fill = "#FAF9F6", color = NA)
  ) +
  ylim(0, NA)

MongoDB–Number of hotels by rating label (Penilaian)

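# Deduplicate (Hotel, Penilaian) pairs first, matching distinct(Hotel, Penilaian) in the R version
pipeline_jml_hotel_penilaian <- '[
  {"$group": {"_id": {"Hotel": "$Hotel", "Penilaian": "$Penilaian"}}},
  {"$group": {"_id": "$_id.Penilaian", "Jumlah_Hotel": {"$sum": 1}}},
  {"$project": {"_id": 0, "Penilaian": "$_id", "Jumlah_Hotel": 1}},
  {"$sort": {"Jumlah_Hotel": -1}}
]'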
jml_hotel_penilaian <- trip$aggregate(pipeline_jml_hotel_penilaian)


# Set the factor level order
jml_hotel_penilaian$Penilaian <- factor(jml_hotel_penilaian$Penilaian,
    levels = c("Baik", "Luar Biasa", "Sangat Baik", "Menakjubkan", "Fantastis"))

# Basic visualization
ggplot(jml_hotel_penilaian, aes(x = Penilaian, y = Jumlah_Hotel)) +
  geom_col(fill = "darkblue") +
  coord_flip() +
  geom_text(aes(label = Jumlah_Hotel), hjust = -0.1, size = 4, color = "black") +
  labs(title = "Jumlah Hotel berdasarkan Penilaian", x = "Penilaian", y = "Jumlah Hotel") +
  theme_minimal()

# Styled visualization
ggplot(jml_hotel_penilaian, aes(x = Penilaian, y = Jumlah_Hotel, fill = Penilaian)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = Jumlah_Hotel), vjust = -0.5, fontface = "bold", color = "#333333", size = 5) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Jumlah Hotel Berdasarkan Penilaian",
    x = "Penilaian",
    y = "Jumlah Hotel"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#264653"),
    axis.text = element_text(color = "#2A2A2A"),
    panel.grid.major.y = element_blank(),
    plot.background = element_rect(fill = "#FAF9F6", color = NA),
    panel.background = element_rect(fill = "#FAF9F6", color = NA)
  ) +
  ylim(0, max(jml_hotel_penilaian$Jumlah_Hotel) * 1.1)

Boxplot

MongoDB–Price distribution by rating label (Penilaian)

pipeline_boxplot <- '[
  { "$match": { "Harga_Permalam_dolar": { "$ne": null }, "Penilaian": { "$ne": null } } },
  { "$project": { "Harga_Permalam_dolar": 1, "Penilaian": 1 } }
]'

boxplot_hargavsnilai <- trip$aggregate(pipeline_boxplot)

# Check the data
str(boxplot_hargavsnilai)
## 'data.frame':    197 obs. of  3 variables:
##  $ _id                 : chr  "68356ca693602cd26ed78abc" "68356ca693602cd26ed78abd" "68356ca693602cd26ed78abe" "68356ca693602cd26ed78abf" ...
##  $ Penilaian           : chr  "Luar Biasa" "Sangat Baik" "Luar Biasa" "Sangat Baik" ...
##  $ Harga_Permalam_dolar: int  75 217 162 343 501 37 232 2623 88 74 ...
head(boxplot_hargavsnilai)
##                        _id   Penilaian Harga_Permalam_dolar
## 1 68356ca693602cd26ed78abc  Luar Biasa                   75
## 2 68356ca693602cd26ed78abd Sangat Baik                  217
## 3 68356ca693602cd26ed78abe  Luar Biasa                  162
## 4 68356ca693602cd26ed78abf Sangat Baik                  343
## 5 68356ca693602cd26ed78ac0 Menakjubkan                  501
## 6 68356ca693602cd26ed78ac1  Luar Biasa                   37
# Make sure the price column is numeric
boxplot_hargavsnilai$Harga_Permalam_dolar <- as.numeric(boxplot_hargavsnilai$Harga_Permalam_dolar)

# Perbaiki nama kolom dan faktor urutan
boxplot_hargavsnilai$Penilaian <- factor(boxplot_hargavsnilai$Penilaian,
    levels = c("Baik", "Luar Biasa", "Sangat Baik", "Menakjubkan", "Fantastis"))

# Basic visualization
ggplot(boxplot_hargavsnilai, aes(x = Harga_Permalam_dolar, y = Penilaian, fill = Penilaian)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Distribution of Nightly Price by Hotel Rating Label",
       x = "Price per night (US$)", y = "Rating label") +
  theme_minimal() +
  theme(legend.position = "none")

# Violin plot with an embedded boxplot, groups ordered by median price
ggplot(boxplot_hargavsnilai, aes(x = reorder(Penilaian, Harga_Permalam_dolar, FUN = median), y = Harga_Permalam_dolar)) +
  geom_violin(fill = "red", alpha = 0.7) +
  geom_boxplot(width = 0.1, fill = "white", outlier.color = NA) +
  coord_flip() +
  labs(
    title = "Distribution of Nightly Price by Hotel Rating Label",
    x = "Rating label",
    y = "Price per night (US$)"
  ) +
  theme_minimal(base_size = 14)
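
The boxplots above summarise each rating label by its quartiles. As a rough companion, the same kind of summary can be computed as a table. This is only a minimal sketch: `toy_harga` and `ringkasan_harga` are hypothetical names, and the values are made up for illustration rather than taken from the MongoDB collection.

```r
library(dplyr)

# Hypothetical toy data, NOT taken from the MongoDB collection
toy_harga <- data.frame(
  Penilaian = c("Luar Biasa", "Luar Biasa", "Luar Biasa",
                "Sangat Baik", "Sangat Baik", "Sangat Baik"),
  Harga_Permalam_dolar = c(40, 75, 160, 90, 220, 340)
)

# Count, quartiles and median of the nightly price for each rating label
ringkasan_harga <- toy_harga %>%
  group_by(Penilaian) %>%
  summarise(
    n      = n(),
    q1     = unname(quantile(Harga_Permalam_dolar, 0.25)),
    median = median(Harga_Permalam_dolar),
    q3     = unname(quantile(Harga_Permalam_dolar, 0.75)),
    .groups = "drop"
  )

print(ringkasan_harga)
```

With the real data, `boxplot_hargavsnilai` would presumably take the place of `toy_harga`.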

MongoDB–Location vs nightly price

# MongoDB query: fetch location and nightly price, keeping only non-null prices
pipeline_harga_lokasi <- '[
  { "$match": { "Harga_Permalam_dolar": { "$ne": null } } },
  { "$project": { "Lokasi": 1, "Harga_Permalam_dolar": 1, "_id": 0 } }
]'
harga_lokasi <- trip$aggregate(pipeline_harga_lokasi)

# Boxplot of nightly price per location, ordered by median price
ggplot(harga_lokasi, aes(x = reorder(Lokasi, Harga_Permalam_dolar, FUN = median), y = Harga_Permalam_dolar)) +
  geom_boxplot(fill = "#69b3a2", alpha = 0.7) +
  coord_flip() +
  labs(
    title = "Distribution of Nightly Price by Location",
    x = "Location",
    y = "Price per night (US$)"
  ) +
  theme_minimal(base_size = 14)
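
A boxplot for a location with only one or two hotels collapses to a line or a very thin box, so it may help to check group sizes before reading the per-location plot. The sketch below assumes dplyr. `toy_lokasi` and `jumlah_per_lokasi` are hypothetical names, and the rows are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data, NOT taken from the MongoDB collection
toy_lokasi <- data.frame(
  Lokasi = c("Ubud", "Ubud", "Ubud", "Kuta", "Kuta", "Seminyak"),
  Harga_Permalam_dolar = c(217, 343, 90, 75, 501, 162)
)

# Number of hotels per location, largest group first
jumlah_per_lokasi <- toy_lokasi %>%
  count(Lokasi, name = "n_hotel") %>%
  arrange(desc(n_hotel))

print(jumlah_per_lokasi)
```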

Scatter Plot

R–Nightly price vs Rating

# Basic visualization
ggplot(trip_df, aes(x = Harga_Permalam_dolar, y = Rating)) +
  geom_point(alpha = 0.6, color = "blue") +
  labs(title = "Nightly Price vs Rating", x = "Price per night (US$)", y = "Rating")

# Styled visualization (a static ggplot; not interactive as written)
ggplot(trip_df, aes(x = Harga_Permalam_dolar, y = Rating)) +
  geom_point(aes(color = Rating), alpha = 0.7, size = 3) +
  geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
  scale_color_gradient(low = "#90CAF9", high = "#1E88E5") +
  labs(
    title = "Relationship between Nightly Price and Hotel Rating",
    x = "Price per night (US$)",
    y = "Rating"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
    axis.text = element_text(color = "#444444"),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f8f9fa", color = NA),
    panel.background = element_rect(fill = "#f8f9fa", color = NA)
  )
## `geom_smooth()` using formula = 'y ~ x'

MongoDB–Scatter plot: nightly price vs Rating

# Fetch the price and rating fields
data_hrg_rating <- trip$find(fields = '{"Harga_Permalam_dolar":1, "Rating":1, "_id":0}')

# Convert to numeric and drop rows with NA
data_hrg_rating <- data_hrg_rating %>%
  mutate(
    Harga = as.numeric(Harga_Permalam_dolar),
    Rating = as.numeric(Rating)
  ) %>%
  filter(!is.na(Harga) & !is.na(Rating))

# Basic visualization
ggplot(data_hrg_rating, aes(x = Harga, y = Rating)) +
  geom_point(alpha = 0.6, color = "blue") +
  labs(title = "Nightly Price vs Rating", x = "Price per night (US$)", y = "Rating")

# Styled visualization (a static ggplot; not interactive as written)
ggplot(data_hrg_rating, aes(x = Harga, y = Rating)) +
  geom_point(aes(color = Rating), alpha = 0.7, size = 3) +
  geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
  scale_color_gradient(low = "#90CAF9", high = "#1E88E5") +
  labs(
    title = "Relationship between Nightly Price and Hotel Rating",
    x = "Price per night (US$)",
    y = "Rating"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
    axis.text = element_text(color = "#444444"),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f8f9fa", color = NA),
    panel.background = element_rect(fill = "#f8f9fa", color = NA)
  )
## `geom_smooth()` using formula = 'y ~ x'
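
The scatter plots above are ordinary ggplot objects, so they render as static images. One way to get hover tooltips and zooming is to pass a stored plot to `plotly::ggplotly()`. This is a minimal sketch on made-up values: `toy_hotel`, `p_statis` and `p_interaktif` are hypothetical names, not objects from the analysis.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data, NOT taken from the MongoDB collection
toy_hotel <- data.frame(
  Harga  = c(37, 75, 162, 217, 343, 501),
  Rating = c(8.4, 8.5, 8.7, 9.3, 9.2, 9.5)
)

# Build the static plot first and keep it in a variable
p_statis <- ggplot(toy_hotel, aes(x = Harga, y = Rating)) +
  geom_point(aes(color = Rating), alpha = 0.7, size = 3) +
  labs(x = "Price per night (US$)", y = "Rating") +
  theme_minimal()

# ggplotly() converts the ggplot object into an interactive htmlwidget
p_interaktif <- ggplotly(p_statis)
```

Printing `p_interaktif` in an R Markdown HTML document or in the RStudio viewer displays the interactive version.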

R–Rating vs review count

trip_df <- trip_df %>%
  mutate(
    Rating = as.numeric(Rating),
    Jumlah_Ulasan = as.numeric(Jumlah_Ulasan)
  ) %>%
  filter(!is.na(Rating) & !is.na(Jumlah_Ulasan))

# Basic visualization
ggplot(trip_df, aes(x = Rating, y = Jumlah_Ulasan)) +
  geom_point(alpha = 0.6, color = "darkred") +
  labs(title = "Rating vs Review Count", x = "Rating", y = "Number of reviews")

# Styled visualization (a static ggplot; not interactive as written)
ggplot(trip_df, aes(x = Rating, y = Jumlah_Ulasan)) +
  geom_point(aes(color = Rating, size = Jumlah_Ulasan), alpha = 0.7) +
  geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
  scale_color_gradient(low = "#F4A261", high = "#E76F51") +
  labs(
    title = "Relationship between Hotel Rating and Review Count",
    x = "Rating",
    y = "Number of reviews"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#333333"),
    axis.text = element_text(color = "#444444"),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f9f9f9", color = NA),
    panel.background = element_rect(fill = "#f9f9f9", color = NA)
  )
## `geom_smooth()` using formula = 'y ~ x'

MongoDB–Rating vs review count

data_rating_ulasan <- trip$find(fields = '{"Rating":1, "Jumlah_Ulasan":1, "_id":0}')

data_rating_ulasan <- data_rating_ulasan %>%
  mutate(
    Rating = as.numeric(Rating),
    Jumlah_Ulasan = as.numeric(Jumlah_Ulasan)
  ) %>%
  filter(!is.na(Rating) & !is.na(Jumlah_Ulasan))

# Basic visualization
ggplot(data_rating_ulasan, aes(x = Rating, y = Jumlah_Ulasan)) +
  geom_point(color = "coral") +
  labs(title = "Hotel Rating vs Review Count", x = "Rating", y = "Number of reviews") +
  theme_minimal()

# Styled visualization (a static ggplot; not interactive as written)
ggplot(data_rating_ulasan, aes(x = Rating, y = Jumlah_Ulasan)) +
  geom_point(aes(color = Rating, size = Jumlah_Ulasan), alpha = 0.7) +
  geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
  scale_color_gradient(low = "#F4A261", high = "#E76F51") +
  labs(
    title = "Hotel Rating vs Review Count",
    x = "Rating",
    y = "Number of reviews"
  ) +
  theme_minimal(base_family = "poppins") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#333333"),
    axis.text = element_text(color = "#444444"),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f9f9f9", color = NA),
    panel.background = element_rect(fill = "#f9f9f9", color = NA)
  )
## `geom_smooth()` using formula = 'y ~ x'

MongoDB–Plot of price, rating, and review count

pipeline_korelasi <- '[
  { "$match": {
      "Harga_Permalam_dolar": { "$ne": null },
      "Rating": { "$ne": null },
      "Jumlah_Ulasan": { "$ne": null }
  }},
  { "$project": {
      "Harga_Permalam_dolar": 1,
      "Rating": 1,
      "Jumlah_Ulasan": 1
  }}
]'
data_korelasi <- trip$aggregate(pipeline_korelasi)

# Bubble chart: rating on x, price on y, point size scaled by review count
ggplot(data_korelasi, aes(x = Rating, y = Harga_Permalam_dolar, size = Jumlah_Ulasan)) +
  geom_point(alpha = 0.7, color = "#2a9d8f") +
  scale_size(range = c(2, 10)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Price vs Rating and Review Count",
       x = "Rating", y = "Price per night (US$)", size = "Number of reviews") +
  theme_minimal(base_family = "poppins")
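
The bubble chart shows the three variables together but does not quantify how strongly they move with each other. One possible complement is a rank-based (Spearman) correlation matrix, which is less sensitive to a few very expensive hotels than Pearson correlation. The sketch below uses base R only. `toy_korelasi` and `matriks_korelasi` are hypothetical names, and the numbers are invented for illustration.

```r
# Hypothetical toy data, NOT taken from the MongoDB collection
toy_korelasi <- data.frame(
  Harga_Permalam_dolar = c(37, 75, 162, 217, 343, 501),
  Rating               = c(8.4, 8.5, 8.7, 9.3, 9.2, 9.5),
  Jumlah_Ulasan        = c(12, 59, 20, 5, 60, 80)
)

# Pairwise Spearman correlations, using only complete rows
matriks_korelasi <- cor(toy_korelasi, method = "spearman", use = "complete.obs")

print(round(matriks_korelasi, 2))
```

With the real data, the three numeric columns of `data_korelasi` (without `_id`) would presumably be passed to `cor()` instead.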