# Connection example
trip <- mongo(collection = "Data Scrapping",
db = "ProjectMDS_UAS",
url = "mongodb+srv://<username>:<password>@cluster0.sxdrib4.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0")
The code above creates a connection object, trip, that links R to the "Data Scrapping" collection in the "ProjectMDS_UAS" database on MongoDB Atlas. Through this object, queries and aggregations can be run directly against that database.
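Because the connection string carries a username and password, one common alternative is to keep the credentials out of the script. The sketch below assembles the same kind of URL from environment variables. The variable names MONGO_USER and MONGO_PASS, the helper build_mongo_url(), and the example host are hypothetical and not part of the original project.

```r
# Hypothetical helper: builds an Atlas-style connection string from
# environment variables so that credentials are not hard-coded in the script.
# MONGO_USER and MONGO_PASS are assumed names, not taken from the project.
build_mongo_url <- function(host,
                            user = Sys.getenv("MONGO_USER"),
                            pass = Sys.getenv("MONGO_PASS")) {
  if (user == "" || pass == "") {
    stop("Set MONGO_USER and MONGO_PASS before connecting.")
  }
  # Percent-encode reserved characters that may occur in a password
  sprintf("mongodb+srv://%s:%s@%s/?retryWrites=true&w=majority",
          URLencode(user, reserved = TRUE),
          URLencode(pass, reserved = TRUE),
          host)
}

# Example with placeholder values (normally set outside R, e.g. in .Renviron)
Sys.setenv(MONGO_USER = "demo_user", MONGO_PASS = "p@ss word")
url <- build_mongo_url("cluster0.example.mongodb.net")
print(url)
```

The resulting string would then be passed to mongo() through its url argument in place of a literal connection string.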
# trip$update(query = '{}', update = '{"$rename": {"Harga_Permalam (US$)": "Harga_Permalam_dolar"}}', multiple = TRUE)
This command renames the field "Harga_Permalam (US$)" to "Harga_Permalam_dolar" in every document of the MongoDB collection. It only needs to be run once, because the renamed field is stored permanently in the database, so the line is left commented out here.
##                                       Hotel Rating   Penilaian Jumlah_Ulasan
## 1               18 Suite Villa Loft at Kuta    8.5  Luar Biasa            59
## 2 Abisena Wellness & Resort Ubud-Adult Only    9.3 Sangat Baik             5
## 3                              Adepa Resort    8.5  Luar Biasa            20
## 4                             Adiwana Bisma    9.2 Sangat Baik            60
## 5                         Alaya Dedaun Kuta    9.5 Menakjubkan            80
##                          Ulasan1                        Ulasan2
## 1 "Pengalaman menginap terbaik!"       "Pemilik properti ramah"
## 2                           <NA>                           <NA>
## 3                           <NA>                           <NA>
## 4       "Pemilik properti ramah" "Pengalaman menginap terbaik!"
## 5       "Pemilik properti ramah" "Pengalaman menginap terbaik!"
##      Destinasi_Terdekat Lokasi                                    Tipe_Kamar
## 1           Pantai Kuta   Kuta                                  Kamar Deluxe
## 2 COMO Shambhala Estate   Ubud                                  Suite Sungai
## 3 Finns Recreation Club Dalung         Twin - One Bedroom Private Pool Villa
## 4           Ubud Palace   Ubud                            Adiwana Rice Field
## 5           Pantai Kuta   Kuta Vila Deluxe 1 Kamar Tidur dengan Kolam Renang
##   Harga_Permalam_dolar
## 1                   75
## 2                  217
## 3                  162
## 4                  343
## 5                  501
Next, the data are checked to get an overview of their distribution. This covers counting the number of records and the number of missing (NA) values in each column.
Counting missing values with MongoDB aggregation through the mongolite package is fairly cumbersome, because MongoDB has no native equivalent of R's NA; it only has null values and absent (missing) fields. Missing values can still be counted manually with an aggregation pipeline, but doing so is tedious, especially when not working directly in the mongo shell.
In practice, therefore, it is easier to count records and missing values directly in R with sapply once the data have been pulled from MongoDB into a data frame.
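For comparison, the pipeline-based count described above could look roughly like the sketch below. It only builds the pipeline string in base R; build_missing_pipeline() is a hypothetical helper, and the three field names are the columns of this data set that contain NA values. The $ifNull operator maps both explicit nulls and absent fields to null, so a single $eq test catches either kind of missing value.

```r
# Fields to check; these are the columns with NA values in this data set
fields <- c("Jumlah_Ulasan", "Ulasan1", "Ulasan2")

# Hypothetical helper: one {"$sum": {"$cond": ...}} accumulator per field.
# Each accumulator adds 1 when the field is null or absent, otherwise 0.
build_missing_pipeline <- function(fields) {
  accumulators <- sprintf(
    '"%s_missing": {"$sum": {"$cond": [{"$eq": [{"$ifNull": ["$%s", null]}, null]}, 1, 0]}}',
    fields, fields
  )
  sprintf(
    '[{"$group": {"_id": null, "Total": {"$sum": 1}, %s}}]',
    paste(accumulators, collapse = ", ")
  )
}

pipeline_missing <- build_missing_pipeline(fields)
cat(pipeline_missing, "\n")

# With a live connection this would be run as:
# trip$aggregate(pipeline_missing)
```

Even for three fields the pipeline is noticeably longer than a one-line sapply call, which illustrates why the R-side count is the more convenient choice here.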
trip_df[] <- lapply(trip_df, function(x) {
if (is.character(x)) {
as.factor(x)
} else {
x
}
})
# Inspect the modified data frame
str(trip_df)
## 'data.frame': 197 obs. of 10 variables:
## $ Hotel : Factor w/ 192 levels "18 Suite Villa Loft at Kuta",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Rating : num 8.5 9.3 8.5 9.2 9.5 8.2 9 9.3 9.5 9.3 ...
## $ Penilaian : Factor w/ 5 levels "Baik","Fantastis",..: 3 5 3 5 4 3 5 5 4 5 ...
## $ Jumlah_Ulasan : num 59 5 20 60 80 55 82 50 662 177 ...
## $ Ulasan1 : Factor w/ 14 levels "\"Bersih dan rapi\"",..: 13 NA NA 12 12 13 13 13 2 8 ...
## $ Ulasan2 : Factor w/ 22 levels "\"Bersih dan rapi\"",..: 20 NA NA 21 21 NA 20 20 22 22 ...
## $ Destinasi_Terdekat : Factor w/ 48 levels "Alas Harum Bali",..: 31 13 16 46 31 33 34 2 31 16 ...
## $ Lokasi : Factor w/ 17 levels "Blahbatu","Candidasa",..: 6 16 4 16 6 12 13 17 6 13 ...
## $ Tipe_Kamar : Factor w/ 159 levels "1 Bedroom Sensual Hanging Tent (Adult Only)",..: 44 120 134 5 141 24 17 159 40 41 ...
## $ Harga_Permalam_dolar: int 75 217 162 343 501 37 232 2623 88 74 ...
This code converts every character column in the trip_df data frame to a factor, which simplifies categorical analysis such as statistical modeling or visualizations that need categorical variables. The structure of the data is then inspected with str().
data_summary <- data.frame(
Kolom = names(trip_df),
Total_Data = nrow(trip_df),
Non_NA = sapply(trip_df, function(x) sum(!is.na(x))),
Jumlah_NA = sapply(trip_df, function(x) sum(is.na(x))),
Persen_NA = round(sapply(trip_df, function(x) mean(is.na(x)) * 100), 2)
)
print(data_summary)
##                                     Kolom Total_Data Non_NA Jumlah_NA Persen_NA
## Hotel                               Hotel        197    197         0      0.00
## Rating                             Rating        197    197         0      0.00
## Penilaian                       Penilaian        197    197         0      0.00
## Jumlah_Ulasan               Jumlah_Ulasan        197    196         1      0.51
## Ulasan1                           Ulasan1        197    170        27     13.71
## Ulasan2                           Ulasan2        197    166        31     15.74
## Destinasi_Terdekat     Destinasi_Terdekat        197    197         0      0.00
## Lokasi                             Lokasi        197    197         0      0.00
## Tipe_Kamar                     Tipe_Kamar        197    197         0      0.00
## Harga_Permalam_dolar Harga_Permalam_dolar        197    197         0      0.00
The Poppins font from Google Fonts is activated for use in the R visualizations. The font_add_google() function downloads the font and registers it in the R session, while showtext_auto() switches it on automatically for plots made with ggplot2 or base R graphics.
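The font chunk itself does not appear in this rendering, so the following is a minimal sketch of what such a setup usually looks like. The family name "poppins" is taken from the base_family = "poppins" arguments used in the plots; the exact arguments of the original call are an assumption.

```r
library(showtext)   # attaches sysfonts, which provides font_add_google()

# Download Poppins from Google Fonts (needs an internet connection) and
# register it under the family name "poppins" used by the plots
font_add_google(name = "Poppins", family = "poppins")

# Render text in subsequent plots with showtext
showtext_auto()
```

After this, any plot that requests the "poppins" family, for example through theme_minimal(base_family = "poppins"), is drawn with the downloaded font.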
The Tripadvisor visualization scripts that follow come in two parts:
Pure R code
The whole process, from cleaning and grouping through to visualization, is done directly in R with packages such as dplyr, stringr, and ggplot2. The data used are a local data frame, for example trip_df.
MongoDB aggregation code
In this part, aggregation steps such as grouping, sorting, and limiting are performed directly in MongoDB through an aggregation pipeline (aggregate()), run on the MongoDB collection, for example trip. The visualization is then built with ggplot2 in R from the data returned directly by the trip collection.
Although the aggregation method differs, both parts use ggplot2 in R for the visualization, so the charts look similar; the only difference is whether the data are aggregated locally in R or directly in MongoDB.
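To make the correspondence between the two approaches concrete, the sketch below runs a dplyr chain on a small made-up data frame and annotates each step with the MongoDB pipeline stage it mirrors. The toy values are invented for illustration and do not come from the Tripadvisor collection.

```r
library(dplyr)

# Toy data, made up purely for illustration
toy <- data.frame(
  Hotel = c("A", "A", "B", "C", "D"),
  Jumlah_Ulasan = c(10, 12, 50, 7, 30)
)

top2 <- toy %>%
  group_by(Hotel) %>%                               # $group with "_id": "$Hotel"
  summarise(Jumlah_Ulasan = max(Jumlah_Ulasan),     # {"$max": "$Jumlah_Ulasan"}
            .groups = "drop") %>%
  arrange(desc(Jumlah_Ulasan)) %>%                  # {"$sort": {"Jumlah_Ulasan": -1}}
  slice_head(n = 2)                                 # {"$limit": 2}

print(top2)
```

With the MongoDB variant the same grouping, sorting, and limiting happen on the server, and only the already reduced rows are transferred to R.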
# Basic visualization
trip_df %>%
distinct(Hotel, .keep_all = TRUE) %>%
arrange(desc(Jumlah_Ulasan)) %>%
slice_head(n = 10) %>%
ggplot(aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan)) +
geom_col(fill = "blue") +
geom_text(aes(y = Jumlah_Ulasan + 25, label = Jumlah_Ulasan), vjust = 0, size = 3) +
coord_flip() +
labs(
title = "Top 10 Hotel dengan Jumlah Ulasan Terbanyak",
x = "",
y = "Jumlah Ulasan"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", margin = margin(b = 25)),
plot.margin = margin(t = 10, r = 10, b = 10, l = 10) # margin around the plot
)
# Interactive visualization
trip_df %>%
distinct(Hotel, .keep_all = TRUE) %>%
arrange(desc(Jumlah_Ulasan)) %>%
slice_head(n = 10) %>%
ggplot(aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan, fill = Jumlah_Ulasan)) +
geom_col(show.legend = FALSE, width = 0.7) +
geom_text(aes(label = paste0(Jumlah_Ulasan)),
hjust = -0.1,
size = 4,
color = "#444444",
fontface = "bold") +
coord_flip() +
labs(
title = "<span style='color:#E76F51;'>Top 10 Hotel</span> dengan <span style='color:#2A9D8F;'>Jumlah Ulasan Terbanyak</span>",
x="",
y = "Jumlah Ulasan"
) +
scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
theme_minimal(base_family = "poppins", base_size = 14) +
theme(
plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 15)),
axis.title = element_text(face = "bold"),
axis.text = element_text(color = "#264653"),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f0efeb", color = NA),
panel.background = element_rect(fill = "#f0efeb", color = NA)
) +
ylim(0, max(trip_df$Jumlah_Ulasan, na.rm = TRUE) * 1.15)
# MongoDB aggregation
pipeline_ulasan <- '[
{"$group": {"_id": "$Hotel", "Jumlah_Ulasan": {"$max": "$Jumlah_Ulasan"}}},
{"$sort": {"Jumlah_Ulasan": -1}},
{"$limit": 10},
{"$project": {"Hotel": "$_id", "Jumlah_Ulasan": 1, "_id": 0}}
]'
top10_ulasan <- trip$aggregate(pipeline_ulasan)
# Basic visualization
ggplot(top10_ulasan, aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan)) +
geom_col(fill = "blue") +
geom_text(aes(y = Jumlah_Ulasan + 20, label = Jumlah_Ulasan), vjust = 0, size = 3) +
coord_flip() +
labs(
title = "Top 10 Hotel dengan Jumlah Ulasan Terbanyak",
x = "",
y = "Jumlah Ulasan"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 14, face = "bold", margin = margin(b = 25)),
plot.margin = margin(t = 10, r = 10, b = 10, l = 10)
)
# Interactive visualization
ggplot(top10_ulasan,aes(x = reorder(Hotel, Jumlah_Ulasan), y = Jumlah_Ulasan, fill = Jumlah_Ulasan)) +
geom_col(show.legend = FALSE, width = 0.7) +
geom_text(aes(label = paste0(Jumlah_Ulasan)),
hjust = -0.1,
size = 4,
color = "#444444",
fontface = "bold") +
coord_flip() +
labs(
title = "<span style='color:#E76F51;'>Top 10 Hotel</span> dengan <span style='color:#2A9D8F;'>Jumlah Ulasan Terbanyak</span>",
x="",
y = "Jumlah Ulasan"
) +
scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
theme_minimal(base_family = "poppins", base_size = 14) +
theme(
plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 15)),
axis.title = element_text(face = "bold"),
axis.text = element_text(color = "#264653"),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f0efeb", color = NA),
panel.background = element_rect(fill = "#f0efeb", color = NA)
) +
ylim(0, max(top10_ulasan$Jumlah_Ulasan, na.rm = TRUE) * 1.15)
# MongoDB aggregation
pipeline_top10_rating <- '[
{ "$match": { "Rating": { "$ne": null } } },
{ "$sort": { "Rating": -1, "Hotel": 1 } },
{ "$limit": 10 },
{ "$project": { "Hotel": 1, "Rating": 1, "Jumlah_Ulasan": 1 } }
]'
top10_rating <- trip$aggregate(pipeline_top10_rating)
# Basic visualization
ggplot(top10_rating, aes(x = reorder(Hotel, Rating), y = Rating, fill = Rating)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = round(Rating, 1)), hjust = -0.1, color = "black", size = 4) +
coord_flip() +
labs(title = "Top 10 Hotel dengan Rating Tertinggi",
x = "Nama Hotel", y = "Rating") +
theme_minimal(base_family = "poppins")
# MongoDB aggregation
pipeline_top_rating <- '[
{
"$group": {
"_id": "$Rating",
"Jumlah_Hotel": { "$sum": 1 },
"Hotel_List": { "$push": "$Hotel" }
}
},
{
"$project": {
"Rating": "$_id",
"Jumlah_Hotel": 1,
"Hotel_List": {
"$reduce": {
"input": "$Hotel_List",
"initialValue": "",
"in": {
"$cond": [
{ "$eq": ["$$value", ""] },
"$$this",
{ "$concat": ["$$value", "; ", "$$this"] }
]
}
}
}
}
},
{ "$sort": { "Rating": -1 } },
{ "$limit": 5 }
]'
# Fetch the data from MongoDB
top_rating5 <- trip$aggregate(pipeline_top_rating)
# Add a rank and wrap long label text
top_rating5 <- top_rating5 %>%
mutate(Rank = row_number(),
Hotel_List = str_wrap(Hotel_List, width = 70))
# Visualization with Plotly
fig <- plot_ly(
data = top_rating5,
y = ~factor(Rank),
x = ~Jumlah_Hotel,
type = 'bar',
text = ~Hotel_List,
textposition = 'none',
orientation = 'h',
marker = list(color = '#219ebc'),
hoverinfo = 'text+x'
) %>%
add_text(
x = ~Jumlah_Hotel + max(top_rating5$Jumlah_Hotel)*0.05, # slightly to the right of the bar
y = ~factor(Rank),
text = ~paste("Rating:", Rating),
showlegend = FALSE,
textposition = "middle right",
textfont = list(color = "black", size = 8)
) %>%
layout(
title = "Top 5 Rating Hotel (Gabungan Hotel)",
yaxis = list(title = "Peringkat Rating"),
xaxis = list(title = "Jumlah Hotel"),
margin = list(l = 200)
)
fig
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
# Basic visualization
trip_df %>%
group_by(Penilaian) %>%
summarise(Jumlah = n()) %>%
mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
"Menakjubkan", "Fantastis"))) %>%
ggplot(aes(x = Penilaian, y = Jumlah)) +
geom_col(fill = "coral") +
geom_text(aes(label = Jumlah), hjust = -0.1, size = 4, color = "black") +
coord_flip() +
labs(title = "Distribusi Penilaian Hotel", x = "Penilaian", y = "Jumlah Hotel") +
theme_minimal()
# Interactive visualization
trip_df %>%
group_by(Penilaian) %>%
summarise(Jumlah = n()) %>%
mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
"Menakjubkan", "Fantastis"))) %>%
ggplot(aes(x = Penilaian, y = Jumlah, fill = Penilaian)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(aes(label = paste0(Jumlah)), vjust = -0.5, size = 5, family = "poppins", fontface = "bold", color = "#333333") +
scale_fill_brewer(palette = "Set2") +
labs(
title = "<span style='color:#E76F51;'>Distribusi</span> Penilaian Hotel",
x = "Penilaian",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins", base_size = 14) +
theme(
plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
axis.title = element_text(face = "bold"),
axis.text = element_text(color = "#264653"),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#fefae0", color = NA),
panel.background = element_rect(fill = "#fefae0", color = NA)
) +
ylim(0, max(table(trip_df$Penilaian)) * 1.2)
# MongoDB aggregation
pipeline_penilaian <- '[
{"$group": {"_id": "$Penilaian", "Jumlah": {"$sum": 1}}},
{"$sort": {"Jumlah": -1}},
{"$project": {"Penilaian": "$_id", "Jumlah": 1, "_id": 0}}
]'
dist_penilaian <- trip$aggregate(pipeline_penilaian)
dist_penilaian <- dist_penilaian[, c("Penilaian", "Jumlah")]
# Fix the column name and set the factor level order
colnames(dist_penilaian)[1] <- "Penilaian"
dist_penilaian$Penilaian <- factor(dist_penilaian$Penilaian,
levels = c("Baik", "Luar Biasa", "Sangat Baik", "Menakjubkan", "Fantastis"))
# Basic visualization
ggplot(dist_penilaian, aes(x = reorder(Penilaian, Jumlah), y = Jumlah, fill = Jumlah)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = Jumlah), vjust = -0.3, size = 3) +
labs(title = "Distribusi Penilaian Hotel", x = "Penilaian", y = "Jumlah Hotel") +
theme_minimal() +
coord_flip()
# Interactive visualization
ggplot(dist_penilaian, aes(x = Penilaian, y = Jumlah, fill = Penilaian)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(aes(label = Jumlah), vjust = -0.5, size = 5, family = "poppins", fontface = "bold", color = "#333333") +
scale_fill_brewer(palette = "Set2") +
labs(
title = "<span style='color:#E76F51;'>Distribusi</span> Penilaian Hotel",
x = "Penilaian",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins", base_size = 14) +
theme(
plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
axis.title = element_text(face = "bold"),
axis.text = element_text(color = "#264653"),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#fefae0", color = NA),
panel.background = element_rect(fill = "#fefae0", color = NA)
) +
ylim(0, max(dist_penilaian$Jumlah) * 1.2)
# Basic visualization
trip_df %>%
group_by(Lokasi) %>%
summarise(Rata_Rating = mean(Rating, na.rm = TRUE)) %>%
ggplot(aes(x = reorder(Lokasi, Rata_Rating), y = Rata_Rating)) +
geom_col(fill = "lightgreen") +
coord_flip() +
geom_text(aes(label = paste0(round(Rata_Rating, 1))), vjust = -0.3, size = 3) +
labs(title = "Rata-rata Rating per Lokasi", x = "Lokasi", y = "Rata-rata Rating")
# Interactive visualization
trip_df %>%
group_by(Lokasi) %>%
summarise(Rata_Rating = mean(Rating, na.rm = TRUE)) %>%
ggplot(aes(x = reorder(Lokasi, Rata_Rating), y = Rata_Rating, fill = Rata_Rating)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(
aes(label = paste0(round(Rata_Rating, 1))),
hjust = -0.1,
family = "poppins",
size = 5,
color = "#333333",
fontface = "bold"
) +
scale_fill_gradient(low = "#A8DADC", high = "blue") +
coord_flip() +
labs(
title = "<span style='color:#1D3557;'>Rata-rata Rating</span> per Lokasi️",
x = "Lokasi",
y = "Rata-rata Rating"
) +
theme_minimal(base_family = "poppins", base_size = 14) +
theme(
plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
axis.title = element_text(face = "bold"),
axis.text = element_text(color = "#264653"),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f1faee", color = NA),
panel.background = element_rect(fill = "#f1faee", color = NA)
) +
ylim(0, max(trip_df$Rating, na.rm = TRUE) + 0.5)
# MongoDB aggregation
pipeline_rating_lokasi <- '[
{ "$group": {
"_id": "$Lokasi",
"Rata2_Rating": { "$avg": { "$toDouble": "$Rating" } }
}},
{ "$sort": { "Rata2_Rating": -1 } }
]'
rating_lokasi <- trip$aggregate(pipeline_rating_lokasi)
colnames(rating_lokasi)[1] <- "Lokasi"
# Basic visualization
ggplot(rating_lokasi, aes(x = reorder(Lokasi, Rata2_Rating), y = Rata2_Rating)) +
geom_col(fill = "seagreen") +
coord_flip() +
geom_text(aes(label = paste0(round(Rata2_Rating, 1))), vjust = -0.3, size = 3) +
labs(title = "Rata-rata Rating per Lokasi", x = "Lokasi", y = "Rata-rata Rating") +
theme_minimal()
# Interactive visualization
ggplot(rating_lokasi, aes(x = reorder(Lokasi, Rata2_Rating), y = Rata2_Rating, fill = Rata2_Rating)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(
aes(label = round(Rata2_Rating, 1)),
hjust = -0.1,
family = "poppins",
size = 5,
color = "#333333",
fontface = "bold"
) +
scale_fill_gradient(low = "#A8DADC", high = "blue") +
coord_flip() +
labs(
title = "<span style='color:#1D3557;'>Rata-rata Rating</span> per Lokasi️",
x = "Lokasi",
y = "Rata-rata Rating"
) +
theme_minimal(base_family = "poppins", base_size = 14) +
theme(
plot.title = element_markdown(size = 20, face = "bold", margin = margin(b = 10)),
axis.title = element_text(face = "bold"),
axis.text = element_text(color = "#264653"),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f1faee", color = NA),
panel.background = element_rect(fill = "#f1faee", color = NA)
) +
ylim(0, max(rating_lokasi$Rata2_Rating) + 0.5)
# Basic visualization
trip_df %>%
group_by(Destinasi_Terdekat) %>%
summarise(Jumlah = n()) %>%
slice_max(order_by = Jumlah, n = 5) %>%
ggplot(aes(x = reorder(Destinasi_Terdekat, Jumlah), y = Jumlah)) +
geom_col(fill = "purple") +
geom_text(aes(label = Jumlah), hjust = -0.1, size = 4, color = "black") +
coord_flip() +
labs(title = "Top 5 Destinasi Terdekat", x = "Destinasi", y = "Jumlah Hotel") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
# Interactive visualization
trip_df %>%
group_by(Destinasi_Terdekat) %>%
summarise(Jumlah = n()) %>%
slice_max(order_by = Jumlah, n = 5) %>%
ggplot(aes(x = reorder(Destinasi_Terdekat, -Jumlah), y = Jumlah, fill = Jumlah)) +
geom_col(width = 0.7, show.legend = FALSE) +
geom_text(aes(label = Jumlah),
vjust = -0.5,
family = "poppins",
size = 4.5,
color = "#333333") +
scale_fill_gradient(low = "#CDB4DB", high = "#5E548E") +
labs(
title = "Top 5 Destinasi Terdekat",
x = "Destinasi",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins", base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 18, color = "#2A2A2A"),
axis.text.x = element_text(angle = 45, hjust = 1, size = 10, color = "#444444"),
axis.text.y = element_text(color = "#444444"),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f8f9fa", color = NA),
panel.background = element_rect(fill = "#f8f9fa", color = NA)
)
# MongoDB aggregation
pipeline_destinasi <- '[
{ "$group": {
"_id": "$Destinasi_Terdekat",
"Jumlah": { "$sum": 1 }
}},
{ "$sort": { "Jumlah": -1 } },
{ "$limit": 5 }
]'
destinasi <- trip$aggregate(pipeline_destinasi)
colnames(destinasi)[1] <- "Destinasi_Terdekat"
# Basic visualization
ggplot(destinasi, aes(x = reorder(Destinasi_Terdekat, Jumlah), y = Jumlah)) +
geom_col(fill = "purple") +
coord_flip() +
labs(title = "Top 5 Destinasi Terdekat Berdasarkan Jumlah Hotel", x = "Destinasi Terdekat", y = "Jumlah") +
theme_minimal()
# Interactive visualization
ggplot(destinasi, aes(x = reorder(Destinasi_Terdekat, -Jumlah), y = Jumlah, fill = Jumlah)) +
geom_col(width = 0.7, show.legend = FALSE) +
geom_text(aes(label = Jumlah),
vjust = -0.5,
family = "poppins",
size = 4.5,
color = "#333333") +
scale_fill_gradient(low = "#CDB4DB", high = "#5E548E") +
labs(
title = "Top 5 Destinasi Terdekat Berdasarkan Jumlah Hotel",
x = "Destinasi",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins", base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 18, color = "#2A2A2A"),
axis.text.x = element_text(angle = 45, hjust = 1, size = 10, color = "#444444"),
axis.text.y = element_text(color = "#444444"),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f8f9fa", color = NA),
panel.background = element_rect(fill = "#f8f9fa", color = NA)
)
# Basic visualization
trip_df %>%
distinct(Hotel, Lokasi) %>%
group_by(Lokasi) %>%
summarise(Jumlah_Hotel = n()) %>%
ggplot(aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel)) +
geom_col(fill = "orange") +
coord_flip() +
geom_text(aes(label = Jumlah_Hotel), vjust = -0.3, size = 3) +
labs(title = "Jumlah Hotel per Lokasi", x = "Lokasi", y = "Jumlah Hotel")
# Store the aggregation result
lokasi_df <- trip_df %>%
distinct(Hotel, Lokasi) %>%
count(Lokasi, name = "Jumlah_Hotel")
# Interactive visualization
ggplot(lokasi_df, aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel, fill = Jumlah_Hotel)) +
geom_col(width = 0.7, show.legend = FALSE) +
geom_text(aes(label = Jumlah_Hotel), hjust = -0.2, color = "#333333", size = 4.5, fontface = "bold") +
coord_flip() +
scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
labs(
title = "Jumlah Hotel per Lokasi",
x = "Lokasi",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
axis.text = element_text(color = "#444444"),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#FAF9F6", color = NA),
panel.background = element_rect(fill = "#FAF9F6", color = NA)
) +
ylim(0, max(lokasi_df$Jumlah_Hotel) * 1.15)
# MongoDB aggregation
pipeline_jml_hotel_lokasi <- '[
{ "$group": {
"_id": "$Lokasi",
"Jumlah_Hotel": { "$sum": 1 }
}},
{ "$sort": { "Jumlah_Hotel": -1 } }
]'
jml_hotel_lokasi <- trip$aggregate(pipeline_jml_hotel_lokasi)
colnames(jml_hotel_lokasi)[1] <- "Lokasi"
ggplot(jml_hotel_lokasi, aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel)) +
geom_col(fill = "darkred") +
geom_text(aes(label = Jumlah_Hotel), hjust = -0.1, size = 4, color = "black") +
coord_flip() +
labs(
title = "Jumlah Hotel per Lokasi",
x = "Lokasi",
y = "Jumlah Hotel"
) +
theme_minimal() +
expand_limits(y = max(jml_hotel_lokasi$Jumlah_Hotel) * 1.1) # leave room for the labels
# Interactive visualization
ggplot(jml_hotel_lokasi, aes(x = reorder(Lokasi, Jumlah_Hotel), y = Jumlah_Hotel, fill = Jumlah_Hotel)) +
geom_col(width = 0.7, show.legend = FALSE) +
geom_text(aes(label = Jumlah_Hotel), hjust = -0.2, color = "#333333", size = 4.5,
fontface = "bold") +
coord_flip() +
scale_fill_gradient(low = "#F4A261", high = "#E76F51") +
labs(
title = "Jumlah Hotel per Lokasi",
x = "Lokasi",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
axis.text = element_text(color = "#444444"),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#FAF9F6", color = NA),
panel.background = element_rect(fill = "#FAF9F6", color = NA)
) +
ylim(0, max(jml_hotel_lokasi$Jumlah_Hotel) * 1.15)
# Basic visualization
trip_df %>%
distinct(Hotel, Penilaian) %>%
mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
"Menakjubkan", "Fantastis"))) %>%
group_by(Penilaian) %>%
summarise(Jumlah_Hotel = n()) %>%
ggplot(aes(x = Penilaian, y = Jumlah_Hotel)) +
geom_col(fill = "darkgreen") +
geom_text(aes(label = Jumlah_Hotel), hjust = -0.1, size = 4, color = "black") +
coord_flip() +
labs(title = "Jumlah Hotel Berdasarkan Penilaian", x = "Penilaian", y = "Jumlah Hotel")
# Interactive visualization
trip_df %>%
distinct(Hotel, Penilaian) %>%
mutate(Penilaian = factor(Penilaian, levels = c("Baik", "Luar Biasa", "Sangat Baik",
"Menakjubkan", "Fantastis"))) %>%
group_by(Penilaian) %>%
summarise(Jumlah_Hotel = n()) %>%
ggplot(aes(x = Penilaian, y = Jumlah_Hotel, fill = Penilaian)) +
geom_col(width = 0.7, show.legend = FALSE) +
geom_text(aes(label = Jumlah_Hotel), vjust = -0.5, fontface = "bold", color = "#333333", size = 5) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Jumlah Hotel Berdasarkan Penilaian",
x = "Penilaian",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#264653"),
axis.text = element_text(color = "#2A2A2A"),
panel.grid.major.y = element_blank(),
plot.background = element_rect(fill = "#FAF9F6", color = NA),
panel.background = element_rect(fill = "#FAF9F6", color = NA)
) +
ylim(0, NA)
# MongoDB aggregation
pipeline_jml_hotel_penilaian <- '[
{"$group": {"_id": "$Penilaian","Jumlah_Hotel": { "$sum": 1 }}},
{"$project": {"_id": 0,"Penilaian": "$_id","Jumlah_Hotel": 1}},
{"$sort": { "Jumlah_Hotel": -1 }}]'
jml_hotel_penilaian <- trip$aggregate(pipeline_jml_hotel_penilaian)
# Set the factor level order
jml_hotel_penilaian$Penilaian <- factor(jml_hotel_penilaian$Penilaian,
levels = c("Baik", "Luar Biasa", "Sangat Baik", "Menakjubkan", "Fantastis"))
# Basic visualization
ggplot(jml_hotel_penilaian, aes(x = Penilaian, y = Jumlah_Hotel)) +
geom_col(fill = "darkblue") +
coord_flip() +
geom_text(aes(label = Jumlah_Hotel), hjust = -0.1, size = 4, color = "black") +
labs(title = "Jumlah Hotel berdasarkan Penilaian", x = "Penilaian", y = "Jumlah Hotel") +
theme_minimal()
# Interactive visualization
ggplot(jml_hotel_penilaian, aes(x = Penilaian, y = Jumlah_Hotel, fill = Penilaian)) +
geom_col(width = 0.7, show.legend = FALSE) +
geom_text(aes(label = Jumlah_Hotel), vjust = -0.5, fontface = "bold", color = "#333333", size = 5) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Jumlah Hotel Berdasarkan Penilaian",
x = "Penilaian",
y = "Jumlah Hotel"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#264653"),
axis.text = element_text(color = "#2A2A2A"),
panel.grid.major.y = element_blank(),
plot.background = element_rect(fill = "#FAF9F6", color = NA),
panel.background = element_rect(fill = "#FAF9F6", color = NA)
) +
ylim(0, max(jml_hotel_penilaian$Jumlah_Hotel) * 1.1)
# MongoDB aggregation
pipeline_boxplot <- '[
{ "$match": { "Harga_Permalam_dolar": { "$ne": null }, "Penilaian": { "$ne": null } } },
{ "$project": { "Harga_Permalam_dolar": 1, "Penilaian": 1 } }
]'
boxplot_hargavsnilai <- trip$aggregate(pipeline_boxplot)
# Check the data
str(boxplot_hargavsnilai)
## 'data.frame': 197 obs. of 3 variables:
## $ _id : chr "68356ca693602cd26ed78abc" "68356ca693602cd26ed78abd" "68356ca693602cd26ed78abe" "68356ca693602cd26ed78abf" ...
## $ Penilaian : chr "Luar Biasa" "Sangat Baik" "Luar Biasa" "Sangat Baik" ...
## $ Harga_Permalam_dolar: int 75 217 162 343 501 37 232 2623 88 74 ...
##                        _id   Penilaian Harga_Permalam_dolar
## 1 68356ca693602cd26ed78abc  Luar Biasa                   75
## 2 68356ca693602cd26ed78abd Sangat Baik                  217
## 3 68356ca693602cd26ed78abe  Luar Biasa                  162
## 4 68356ca693602cd26ed78abf Sangat Baik                  343
## 5 68356ca693602cd26ed78ac0 Menakjubkan                  501
## 6 68356ca693602cd26ed78ac1  Luar Biasa                   37
# Make sure the price column is numeric
boxplot_hargavsnilai$Harga_Permalam_dolar <- as.numeric(boxplot_hargavsnilai$Harga_Permalam_dolar)
# Set the factor level order
boxplot_hargavsnilai$Penilaian <- factor(boxplot_hargavsnilai$Penilaian,
levels = c("Baik", "Luar Biasa", "Sangat Baik", "Menakjubkan", "Fantastis"))
# Basic visualization
ggplot(boxplot_hargavsnilai, aes(x = Harga_Permalam_dolar, y = Penilaian, fill = Penilaian)) +
geom_boxplot(alpha = 0.7) +
labs(title = "Distribusi Harga per Malam berdasarkan Penilaian Hotel",
x = "Harga permalam (dalam dolar)", y = "Penilaian") +
theme_minimal() +
theme(legend.position = "none")
# Visualization with violin plot
ggplot(boxplot_hargavsnilai, aes(x = reorder(Penilaian, Harga_Permalam_dolar, FUN = median), y = Harga_Permalam_dolar)) +
geom_violin(fill = "red", alpha = 0.7) +
geom_boxplot(width = 0.1, fill = "white", outlier.color = NA) +
coord_flip() +
labs(
title = "Distribusi Harga per Malam Berdasarkan Penilaian Hotel",
x = "Penilaian",
y = "Harga per Malam (USD)"
) +
theme_minimal(base_size = 14)
# MongoDB query: fetch location and nightly price, keeping only non-null prices
pipeline_harga_lokasi <- '[
{ "$match": { "Harga_Permalam_dolar": { "$ne": null } } },
{ "$project": { "Lokasi": 1, "Harga_Permalam_dolar": 1, "_id": 0 } }
]'
harga_lokasi <- trip$aggregate(pipeline_harga_lokasi)
# Boxplot of price per location in R
ggplot(harga_lokasi, aes(x = reorder(Lokasi, Harga_Permalam_dolar, FUN = median), y = Harga_Permalam_dolar)) +
geom_boxplot(fill = "#69b3a2", alpha = 0.7) +
coord_flip() +
labs(
title = "Distribusi Harga per Malam Berdasarkan Lokasi",
x = "Lokasi",
y = "Harga per Malam (USD)"
) +
theme_minimal(base_size = 14)
# Basic visualization
ggplot(trip_df, aes(x = Harga_Permalam_dolar, y = Rating)) +
geom_point(alpha = 0.6, color = "blue") +
labs(title = "Harga per Malam vs Rating", x = "Harga per Malam (US$)", y = "Rating")
# Interactive visualization
ggplot(trip_df, aes(x = Harga_Permalam_dolar, y = Rating)) +
geom_point(aes(color = Rating), alpha = 0.7, size = 3) +
geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
scale_color_gradient(low = "#90CAF9", high = "#1E88E5") +
labs(
title = "Hubungan antara Harga per Malam dan Rating Hotel",
x = "Harga per Malam (US$)",
y = "Rating"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
axis.text = element_text(color = "#444444"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f8f9fa", color = NA),
panel.background = element_rect(fill = "#f8f9fa", color = NA)
)
## `geom_smooth()` using formula = 'y ~ x'
# Fetch the price and rating fields from MongoDB
data_hrg_rating <- trip$find(fields = '{"Harga_Permalam_dolar":1, "Rating":1, "_id":0}')
# Convert to numeric and drop rows with NA
data_hrg_rating <- data_hrg_rating %>%
mutate(
Harga = as.numeric(Harga_Permalam_dolar),
Rating = as.numeric(Rating)
) %>%
filter(!is.na(Harga) & !is.na(Rating))
# Basic visualization
ggplot(data_hrg_rating, aes(x = Harga, y = Rating)) +
geom_point(alpha = 0.6, color = "blue") +
labs(title = "Harga per Malam vs Rating", x = "Harga per Malam (US$)", y = "Rating")
# Interactive visualization
ggplot(data_hrg_rating, aes(x = Harga, y = Rating)) +
geom_point(aes(color = Rating), alpha = 0.7, size = 3) +
geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
scale_color_gradient(low = "#90CAF9", high = "#1E88E5") +
labs(
title = "Hubungan antara Harga per Malam dan Rating Hotel",
x = "Harga per Malam (US$)",
y = "Rating"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#2A2A2A"),
axis.text = element_text(color = "#444444"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f8f9fa", color = NA),
panel.background = element_rect(fill = "#f8f9fa", color = NA)
)
## `geom_smooth()` using formula = 'y ~ x'
trip_df <- trip_df %>%
mutate(
Rating = as.numeric(Rating),
Jumlah_Ulasan = as.numeric(Jumlah_Ulasan)
) %>%
filter(!is.na(Rating) & !is.na(Jumlah_Ulasan))
# Basic visualization
ggplot(trip_df, aes(x = Rating, y = Jumlah_Ulasan)) +
geom_point(alpha = 0.6, color = "darkred") +
labs(title = "Rating vs Jumlah Ulasan", x = "Rating", y = "Jumlah Ulasan")
# Interactive visualization
ggplot(trip_df, aes(x = Rating, y = Jumlah_Ulasan)) +
geom_point(aes(color = Rating, size = Jumlah_Ulasan), alpha = 0.7) +
geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
scale_color_gradient(low = "#F4A261", high = "#E76F51") +
labs(
title = "Hubungan antara Rating dan Jumlah Ulasan Hotel",
x = "Rating",
y = "Jumlah Ulasan"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#333333"),
axis.text = element_text(color = "#444444"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f9f9f9", color = NA),
panel.background = element_rect(fill = "#f9f9f9", color = NA)
)
## `geom_smooth()` using formula = 'y ~ x'
# MongoDB query: fetch the rating and review-count fields
data_rating_ulasan <- trip$find(fields = '{"Rating":1, "Jumlah_Ulasan":1, "_id":0}')
data_rating_ulasan <- data_rating_ulasan %>%
mutate(
Rating = as.numeric(Rating),
Jumlah_Ulasan = as.numeric(Jumlah_Ulasan)
) %>%
filter(!is.na(Rating) & !is.na(Jumlah_Ulasan))
# Basic visualization
ggplot(data_rating_ulasan, aes(x = Rating, y = Jumlah_Ulasan)) +
geom_point(color = "coral") +
labs(title = "Rating Hotel vs Jumlah Ulasan", x = "Rating", y = "Jumlah Ulasan") +
theme_minimal()
# Interactive visualization
ggplot(data_rating_ulasan, aes(x = Rating, y = Jumlah_Ulasan)) +
geom_point(aes(color = Rating, size = Jumlah_Ulasan), alpha = 0.7) +
geom_smooth(method = "loess", se = FALSE, color = "#264653", linetype = "dashed") +
scale_color_gradient(low = "#F4A261", high = "#E76F51") +
labs(
title = "Rating Hotel vs Jumlah Ulasan",
x = "Rating",
y = "Jumlah Ulasan"
) +
theme_minimal(base_family = "poppins") +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#333333"),
axis.text = element_text(color = "#444444"),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f9f9f9", color = NA),
panel.background = element_rect(fill = "#f9f9f9", color = NA)
)
## `geom_smooth()` using formula = 'y ~ x'
# MongoDB aggregation
pipeline_korelasi <- '[
{ "$match": {
"Harga_Permalam_dolar": { "$ne": null },
"Rating": { "$ne": null },
"Jumlah_Ulasan": { "$ne": null }
}},
{ "$project": {
"Harga_Permalam_dolar": 1,
"Rating": 1,
"Jumlah_Ulasan": 1
}}
]'
data_korelasi <- trip$aggregate(pipeline_korelasi)
# Visualization
ggplot(data_korelasi, aes(x = Rating, y = Harga_Permalam_dolar, size = Jumlah_Ulasan)) +
geom_point(alpha = 0.7, color = "#2a9d8f") +
scale_size(range = c(2, 10)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Harga vs Rating dan Jumlah Ulasan",
x = "Rating", y = "Harga per Malam (dalam dolar)", size = "Jumlah Ulasan") +
theme_minimal(base_family = "poppins")