Kalibrr is a reliable source of information for finding and evaluating career opportunities. Job seekers and human resources professionals use it to find and assess companies and positions that fit their needs.
This project scrapes the website https://www.kalibrr.id/. Kalibrr is a platform that provides information on job openings, companies, and other career opportunities. The goal is to collect data from Kalibrr with web scraping and then use it to analyze and visualize the available job and company information. Once the data has been collected, the next step is to analyze it and build informative visualizations, such as charts of job-opening trends by industry sector, required education level, or the most sought-after job locations. These visualizations help in understanding job-market dynamics and give useful insight to job seekers and companies.
The scraped data covers the following fields:
posisi: the job position or title offered.
perusahaan: the name of the company offering the position.
lokasi: the location of the workplace.
gaji: salary information for the position.
jenis: the employment type, such as full time, part time, or internship.
batas: the application deadline.
level: the experience level required for the position, such as Entry Level / Junior, Apprentice.
Scraping is done in R with the rvest package, and tidyverse is used to clean the collected data. The scraped data is stored in MongoDB Atlas, and a job scheduled every day at 01:00 retrieves five randomly selected job listings. The scheduled scraping runs as a GitHub Actions workflow.
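The scraping step described above can be sketched as follows. This is a minimal illustration, not the project's actual scraper: the HTML snippet, the CSS classes (`.job-card`, `.job-title`, `.company`, `.location`), the `save_to_mongo()` helper, and the `MONGODB_URI` variable, database, and collection names are all hypothetical, since the source does not show the page structure or the connection details.

```r
library(rvest)
library(dplyr)

# Hypothetical markup standing in for a job-listing page; the real Kalibrr
# page structure is not shown in the source and will differ.
sample_html <- '
<div class="job-card">
  <h2 class="job-title">Social Media Admin</h2>
  <span class="company">SKINTIFIC</span>
  <span class="location">Jakarta Selatan, Indonesia</span>
</div>
<div class="job-card">
  <h2 class="job-title">Account Relation Officer</h2>
  <span class="company">Satu Group</span>
  <span class="location">Jakarta Utara, Indonesia</span>
</div>'

page  <- minimal_html(sample_html)
cards <- html_elements(page, ".job-card")

# One row per card; html_element() returns exactly one match (or NA) per card
jobs <- tibble(
  time_scraped = Sys.time(),
  posisi       = cards %>% html_element(".job-title") %>% html_text2(),
  perusahaan   = cards %>% html_element(".company") %>% html_text2(),
  lokasi       = cards %>% html_element(".location") %>% html_text2()
)

# Draw up to five listings at random, mirroring the daily sample of five
daily_sample <- slice_sample(jobs, n = min(5, nrow(jobs)))

# Not called here: it needs a reachable MongoDB deployment. The environment
# variable, database, and collection names are hypothetical.
save_to_mongo <- function(df) {
  con <- mongolite::mongo(
    collection = "jobs",
    db         = "kalibrr",
    url        = Sys.getenv("MONGODB_URI")
  )
  con$insert(df)
}
```

Reading the connection string from an environment variable keeps credentials out of the repository, which fits a workflow that runs on a schedule in GitHub Actions.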
## Warning: package 'mongolite' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
##   time_scraped        posisi
## 1 2024-05-31 15:05:29 Officer Development Program (ODP) Batch 2024
## 2 2024-05-31 15:05:29 Social Media Admin
## 3 2024-05-31 15:05:29 Public Relations Analyst [Corporate Communication]
## 4 2024-05-31 15:05:29 Frontliner Staff (Teller & Customer Service)
## 5 2024-05-31 15:05:29 Human Capital Staff (Culture & Employer Branding)
## 6 2024-05-31 15:05:35 Account Relation Officer
##   perusahaan            lokasi                     gaji               jenis
## 1 Bank Negara Indonesia Jakarta Pusat, Indonesia   Salary Undisclosed FULL_TIME
## 2 SKINTIFIC             Jakarta Selatan, Indonesia Salary Undisclosed FULL_TIME
## 3 Kompas Gramedia       Jakarta Pusat, Indonesia   Salary Undisclosed FULL_TIME
## 4 PT Bank BTPN Tbk      South Jakarta, Indonesia   Salary Undisclosed FULL_TIME
## 5 FIFGROUP              Central Jakarta, Indonesia Salary Undisclosed FULL_TIME
## 6 Satu Group            Jakarta Utara, Indonesia   Salary Undisclosed FULL_TIME
##   batas               level
## 1 Apply before 29 Jun Entry Level / Junior, Apprentice
## 2 Apply before 29 Apr Entry Level / Junior, Apprentice
## 3 Apply before 13 May Entry Level / Junior, Apprentice
## 4 Apply before 31 Jul Entry Level / Junior, Apprentice
## 5 Apply before 27 Aug Associate / Supervisor
## 6 Apply before 5 Jul  Entry Level / Junior, Apprentice
## Rows: 150
## Columns: 8
## $ time_scraped <dttm> 2024-05-31 15:05:29, 2024-05-31 15:05:29, 2024-05-31 15:…
## $ posisi <chr> "Officer Development Program (ODP) Batch 2024", "Social M…
## $ perusahaan <chr> "Bank Negara Indonesia", "SKINTIFIC", "Kompas Gramedia", …
## $ lokasi <chr> "Jakarta Pusat, Indonesia", "Jakarta Selatan, Indonesia",…
## $ gaji <chr> "Salary Undisclosed", "Salary Undisclosed", "Salary Undis…
## $ jenis <chr> "FULL_TIME", "FULL_TIME", "FULL_TIME", "FULL_TIME", "FULL…
## $ batas <chr> "Apply before 29 Jun", "Apply before 29 Apr", "Apply befo…
## $ level <chr> "Entry Level / Junior, Apprentice", "Entry Level / Junior…
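# Keep only the rows whose posisi is "Social Media Admin".
# Logical subsetting returns an all-NA row for every row where posisi is NA,
# which is where the NA rows in the output come from.
data_filter <- dataku[dataku$posisi == "Social Media Admin", ]
head(data_filter)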
##      time_scraped        posisi             perusahaan
## 2    2024-05-31 15:05:29 Social Media Admin SKINTIFIC
## 32   2024-06-03 08:56:29 Social Media Admin SKINTIFIC
## 77   2024-06-07 08:57:12 Social Media Admin SKINTIFIC
## 117  2024-06-11 08:57:10 Social Media Admin SKINTIFIC
## NA   <NA>                <NA>               <NA>
## NA.1 <NA>                <NA>               <NA>
##      lokasi                     gaji               jenis
## 2    Jakarta Selatan, Indonesia Salary Undisclosed FULL_TIME
## 32   Jakarta Selatan, Indonesia Salary Undisclosed FULL_TIME
## 77   Jakarta Selatan, Indonesia Salary Undisclosed FULL_TIME
## 117  Jakarta Selatan, Indonesia Salary Undisclosed FULL_TIME
## NA   <NA>                       <NA>               <NA>
## NA.1 <NA>                       <NA>               <NA>
##      batas               level
## 2    Apply before 29 Apr Entry Level / Junior, Apprentice
## 32   Apply before 29 Apr Entry Level / Junior, Apprentice
## 77   Apply before 29 Apr Entry Level / Junior, Apprentice
## 117  Apply before 29 Apr Entry Level / Junior, Apprentice
## NA   <NA>                <NA>
## NA.1 <NA>                <NA>
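The `NA` and `NA.1` rows in the output above are a side effect of logical subsetting when the compared column contains missing values. The sketch below reproduces the effect on a small made-up data frame (`toy` is illustrative, not the project's data) and shows two common ways to avoid it.

```r
library(dplyr)

# Made-up data with one missing posisi, for illustration only
toy <- data.frame(
  posisi     = c("Social Media Admin", NA, "Account Relation Officer",
                 "Social Media Admin"),
  perusahaan = c("SKINTIFIC", NA, "Satu Group", "SKINTIFIC")
)

# NA == "..." evaluates to NA, and an NA row index yields an all-NA row
with_na <- toy[toy$posisi == "Social Media Admin", ]

# which() keeps only the positions where the comparison is TRUE
base_clean <- toy[which(toy$posisi == "Social Media Admin"), ]

# dplyr::filter() drops rows where the condition is NA or FALSE
dplyr_clean <- filter(toy, posisi == "Social Media Admin")

nrow(with_na)      # 3: two matches plus one all-NA row
nrow(base_clean)   # 2
nrow(dplyr_clean)  # 2
```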
Before visualizing, note that the scraped data contains duplicates: the same listing is captured again on later scraping runs, as the Social Media Admin rows above show. These duplicate entries need to be removed first.
# Count the number of entries before removing duplicates
initial_count <- nrow(dataku)

# Remove duplicate entries, keeping the first row for each 'posisi' value
data_clean <- dataku %>% distinct(posisi, .keep_all = TRUE)

# Count the number of entries after removing duplicates
clean_count <- nrow(data_clean)

# Create a data frame for visualization; the factor levels keep
# "Before Cleaning" ahead of "After Cleaning" on the x-axis
count_data <- data.frame(
  Condition = factor(c("Before Cleaning", "After Cleaning"),
                     levels = c("Before Cleaning", "After Cleaning")),
  Count = c(initial_count, clean_count)
)

# Bar plot of the number of entries before and after removing duplicates
ggplot(count_data, aes(x = Condition, y = Count, fill = Condition)) +
  geom_col() +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Number of Entries Before and After Removing Duplicates",
       x = "Condition",
       y = "Number of Entries") +
  theme_minimal()
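The deduplication above keys on `posisi` alone, so two different companies advertising the same job title would be collapsed into one row. Whether that matters depends on the data. The sketch below uses a made-up tibble (the company "Hypothetical Co" is invented for illustration) to compare that choice with keying on both `posisi` and `perusahaan`.

```r
library(dplyr)

# Made-up rows: one listing scraped twice, plus the same title at another company
toy <- tibble(
  time_scraped = as.POSIXct(
    c("2024-05-31 15:05:29", "2024-06-03 08:56:29", "2024-06-03 08:56:29"),
    tz = "UTC"
  ),
  posisi     = c("Social Media Admin", "Social Media Admin", "Social Media Admin"),
  perusahaan = c("SKINTIFIC", "SKINTIFIC", "Hypothetical Co")
)

# Keyed on the title only: every "Social Media Admin" collapses into one row
by_title <- distinct(toy, posisi, .keep_all = TRUE)

# Keyed on title and company: the re-scraped row is dropped,
# the listing from the other company is kept
by_listing <- distinct(toy, posisi, perusahaan, .keep_all = TRUE)

nrow(by_title)    # 1
nrow(by_listing)  # 2
```

In both cases `.keep_all = TRUE` keeps the first occurrence, so the retained `time_scraped` is the earliest scrape of each listing.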
# Number of jobs per location, most frequent first
ggplot(data_clean, aes(x = reorder(lokasi, lokasi, function(x) -length(x)))) +
  geom_bar(aes(fill = lokasi), color = "black", show.legend = FALSE) +
  labs(title = "Number of Jobs by Location",
       x = "Location",
       y = "Number of Jobs",
       caption = "Source: Web Scraping Results") +
  scale_fill_brewer(palette = "Set3") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(hjust = 0)
  ) +
  # after_stat(count) replaces ..count.., deprecated since ggplot2 3.4.0
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)
# Set2 has only 8 colours, so interpolate enough of them for every company
n_perusahaan <- n_distinct(data_clean$perusahaan)
perusahaan_colors <- colorRampPalette(RColorBrewer::brewer.pal(8, "Set2"))(n_perusahaan)

# Number of jobs per company, most frequent first
ggplot(data_clean, aes(x = reorder(perusahaan, perusahaan, function(x) -length(x)))) +
  geom_bar(aes(fill = perusahaan), color = "black", show.legend = FALSE) +
  labs(title = "Number of Jobs by Company",
       x = "Company",
       y = "Number of Jobs",
       caption = "Source: Web Scraping Results") +
  scale_fill_manual(values = perusahaan_colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(hjust = 0)
  ) +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)
# Number of jobs per experience level, most frequent first
ggplot(data_clean, aes(x = reorder(level, level, function(x) -length(x)))) +
  geom_bar(aes(fill = level), color = "black", show.legend = FALSE) +
  labs(title = "Number of Jobs by Level",
       x = "Level",
       y = "Number of Jobs",
       caption = "Source: Web Scraping Results") +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(hjust = 0)
  ) +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)
# Count jobs per employment type and compute each type's share.
# A data frame is used instead of a <table> object so ggplot2 can pick scales.
jenis_df <- data_clean %>%
  filter(!is.na(jenis)) %>%
  count(jenis, name = "jumlah") %>%
  mutate(persen = jumlah / sum(jumlah) * 100)

# Build the pie chart
pie_chart <- ggplot(jenis_df, aes(x = "", y = jumlah, fill = jenis)) +
  geom_col(width = 1, color = "white") +
  coord_polar("y", start = 0) +
  labs(title = "Distribution of Jobs by Employment Type",
       fill = "Employment Type",
       caption = "Source: Web Scraping Results") +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    plot.caption = element_text(hjust = 0),
    legend.position = "bottom"
  ) +
  geom_text(aes(label = paste0(round(persen, 1), "%")),
            position = position_stack(vjust = 0.5), color = "white")
print(pie_chart)
# Pastel1 has only 9 colours, so interpolate enough of them for every deadline
n_batas <- n_distinct(data_clean$batas)
batas_colors <- colorRampPalette(RColorBrewer::brewer.pal(9, "Pastel1"))(n_batas)

# Number of jobs per application deadline
ggplot(data_clean, aes(x = batas)) +
  geom_bar(aes(fill = batas), color = "black", show.legend = FALSE) +
  scale_fill_manual(values = batas_colors) +
  labs(title = "Application Deadlines",
       x = "Deadline",
       y = "Number of Jobs",
       caption = "Source: Web Scraping Results") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(hjust = 0)
  ) +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5, color = "black") +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
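Because `batas` is stored as text such as "Apply before 29 Jun", the deadline chart orders its bars alphabetically rather than chronologically. One possible remedy is to parse the text into dates first. The sketch below is a rough illustration on made-up rows: the scraped text carries no year, so `assumed_year` is an assumption that would need checking against `time_scraped`, and the column name `batas_tanggal` is hypothetical.

```r
library(dplyr)
library(stringr)

# Made-up rows in the same format as the scraped batas column
toy <- tibble(
  batas = c("Apply before 29 Jun", "Apply before 5 Jul", "Apply before 27 Aug")
)

# Assumption: the scraped text omits the year, so one has to be supplied
assumed_year <- 2024

# Capture the day and the three-letter English month abbreviation
parts <- str_match(toy$batas, "^Apply before (\\d{1,2}) ([A-Za-z]{3})$")

parsed <- toy %>%
  mutate(
    hari          = as.integer(parts[, 2]),
    bulan         = match(parts[, 3], month.abb),  # month.abb is locale-independent
    batas_tanggal = as.Date(sprintf("%04d-%02d-%02d", assumed_year, bulan, hari))
  ) %>%
  arrange(batas_tanggal)

parsed$batas_tanggal
```

Mapping `batas_tanggal` to the x-axis instead of `batas` would then place the deadlines in calendar order.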
# Pastel1 has only 9 colours, so interpolate enough of them for every position
n_posisi <- n_distinct(data_clean$posisi)
posisi_colors <- colorRampPalette(RColorBrewer::brewer.pal(9, "Pastel1"))(n_posisi)

# Number of jobs per position. data_clean keeps one row per posisi,
# so every bar in this chart has height 1.
ggplot(data_clean, aes(x = reorder(posisi, posisi, function(x) -length(x)))) +
  geom_bar(aes(fill = posisi), color = "black", show.legend = FALSE) +
  labs(title = "Number of Jobs by Position",
       x = "Position",
       y = "Number of Jobs",
       caption = "Source: Web Scraping Results") +
  scale_fill_manual(values = posisi_colors) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.title = element_text(size = 12, face = "bold"),
    plot.caption = element_text(hjust = 0)
  ) +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)
# Count jobs per salary range and compute each range's share.
# A data frame is used instead of a <table> object so ggplot2 can pick scales.
gaji_df <- data_clean %>%
  filter(!is.na(gaji)) %>%
  count(gaji, name = "jumlah") %>%
  mutate(persen = jumlah / sum(jumlah) * 100)

# Build the pie chart
pie_chart <- ggplot(gaji_df, aes(x = "", y = jumlah, fill = gaji)) +
  geom_col(width = 1, color = "white") +
  coord_polar("y", start = 0) +
  labs(title = "Distribution of Jobs by Salary Range",
       fill = "Salary Range",
       caption = "Source: Web Scraping Results") +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    plot.caption = element_text(hjust = 0),
    legend.position = "bottom"
  ) +
  geom_text(aes(label = paste0(round(persen, 1), "%")),
            position = position_stack(vjust = 0.5), color = "white")
print(pie_chart)