Collecting Twitter Data with rtweet, Web Scraping with rvest, and Inserting Data into MongoDB Atlas
This is the material for Practicum 14 of the course STA562 (Statistical Data Management) for master's students in Statistics and Data Science specializing in Big Data Analytics.
Collecting Twitter Data with rtweet
This section covers using rtweet to collect data from Twitter. Running it requires a token from a Twitter connected app. rtweet already provides one: rstats2twitter.
In this practicum, however, we use a token from Sedotan, another Twitter connected app.
Library
#install.packages("rtweet")
library(rtweet)
Using the Token
consumer_key <- "4pHN****"
consumer_secret <- "VF3z****"
access_token <- "7357****"
access_secret <- "cK2t****"
token <- create_token(
  app = "Sedotan",
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token,
  access_secret = access_secret)
Examples of Using rtweet
Searching for Tweets by Keyword
rt <- search_tweets("indonesia",
  n = 1800,
  include_rts = FALSE
)
The dimensions of the collected data are shown below.
dim(rt)
## [1] 1800 90
Data Structure
dplyr::glimpse(rt)
## Rows: 1,800
## Columns: 90
## $ user_id <chr> "1439772405828755458", "990788287680819200", "…
## $ status_id <chr> "1465147121011593221", "1465147108244148231", …
## $ created_at <dttm> 2021-11-29 02:34:29, 2021-11-29 02:34:26, 202…
## $ screen_name <chr> "BUMNJakTimSiap", "Caveman9494", "TonuKumer", …
## $ text <chr> "AYO kita turut andil dalam Gerakan Kolaborasi…
## $ source <chr> "MagellanTweets", "Twitter for Android", "Twit…
## $ display_text_width <dbl> 140, 138, 277, 277, 277, 140, 83, 100, 108, 66…
## $ reply_to_status_id <chr> NA, "1465105612920930304", "146465565687133388…
## $ reply_to_user_id <chr> NA, "1300483034739691520", "100447975756663193…
## $ reply_to_screen_name <chr> NA, "thevibesnews", "CryptoFamilyVN", "DeFiDis…
## $ is_quote <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ is_retweet <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ favorite_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0…
## $ retweet_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ quote_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ reply_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ hashtags <list> "BUMNHijaukanIndonesia", NA, NA, NA, NA, "BUM…
## $ symbols <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ urls_url <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ urls_t.co <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ urls_expanded_url <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ media_url <list> "http://pbs.twimg.com/media/FFU_iprVIAEiXYL.j…
## $ media_t.co <list> "https://t.co/bLOlSqzAVm", NA, NA, NA, NA, "h…
## $ media_expanded_url <list> "https://twitter.com/BUMNJakTimSiap/status/14…
## $ media_type <list> "photo", NA, NA, NA, NA, "photo", NA, "photo"…
## $ ext_media_url <list> "http://pbs.twimg.com/media/FFU_iprVIAEiXYL.j…
## $ ext_media_t.co <list> "https://t.co/bLOlSqzAVm", NA, NA, NA, NA, "h…
## $ ext_media_expanded_url <list> "https://twitter.com/BUMNJakTimSiap/status/14…
## $ ext_media_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ mentions_user_id <list> NA, "1300483034739691520", <"1004479757566631…
## $ mentions_screen_name <list> NA, "thevibesnews", <"CryptoFamilyVN", "a2dao…
## $ lang <chr> "in", "en", "en", "en", "en", "in", "in", "in"…
## $ quoted_status_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_text <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_created_at <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_source <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_favorite_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_retweet_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_user_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_screen_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_followers_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_friends_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_statuses_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_location <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_description <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ quoted_verified <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_status_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_text <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_created_at <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ retweet_source <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_favorite_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_retweet_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_user_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_screen_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_followers_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_friends_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_statuses_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_location <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_description <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_verified <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ place_url <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ place_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ place_full_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ place_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ country <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ country_code <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ geo_coords <list> <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, …
## $ coords_coords <list> <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, …
## $ bbox_coords <list> <NA, NA, NA, NA, NA, NA, NA, NA>, <NA, NA, NA…
## $ status_url <chr> "https://twitter.com/BUMNJakTimSiap/status/146…
## $ name <chr> "BUMN_JakTim_SIAP", "Giri Ram", "sojib 10", "s…
## $ location <chr> "", "", "", "", "", "", "", "Sulawesi Tengah, …
## $ description <chr> "Hobby Olahraga & Travelling", "", "", "", "",…
## $ url <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ protected <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ followers_count <int> 19, 9, 13, 13, 13, 39, 2, 14, 14, 14, 14, 14, …
## $ friends_count <int> 133, 33, 630, 630, 630, 10, 15, 2, 2, 2, 2, 2,…
## $ listed_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ statuses_count <int> 101, 371, 2985, 2985, 2985, 96, 17, 628, 628, …
## $ favourites_count <int> 18, 3209, 608, 608, 608, 0, 247, 613, 613, 613…
## $ account_created_at <dttm> 2021-09-20 02:04:38, 2018-04-30 03:01:50, 202…
## $ verified <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ profile_url <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ profile_expanded_url <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ account_lang <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ profile_banner_url <chr> "https://pbs.twimg.com/profile_banners/1439772…
## $ profile_background_url <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ profile_image_url <chr> "http://pbs.twimg.com/profile_images/143984444…
Searching Beyond Twitter's Rate Limit
Limit
Twitter limits the number of tweets (18,000) that can be retrieved within a given window (15 minutes). To collect a larger number of tweets (for example, to monitor a particular keyword), you can use the option retryonratelimit = TRUE.
rt <- search_tweets("data",
  n = 250000,
  retryonratelimit = TRUE
)
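To get a feel for how long such a request runs, the limit figures quoted above can be turned into a rough waiting-time estimate. This is only back-of-the-envelope arithmetic under the assumption that every window returns the full 18,000 tweets; the variable names are illustrative and not part of rtweet.

```r
# Rough estimate of how long a large search must wait on the rate limit.
# The limit figures come from the text above; treat the result as a lower bound.
n_requested      <- 250000  # tweets asked for in search_tweets()
limit_per_window <- 18000   # tweets retrievable per window
window_minutes   <- 15      # length of one rate-limit window

# Number of rate-limit windows needed to cover the request.
windows_needed <- ceiling(n_requested / limit_per_window)

# The first window starts immediately, so only the later ones require waiting.
wait_minutes <- (windows_needed - 1) * window_minutes

windows_needed
wait_minutes
```

Under these assumptions the request spans 14 windows and waits at least 195 minutes, so a call like this is best left running in the background.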
Tweet Location
## search for 10,000 tweets sent from the US
rt <- search_tweets("lang:en",
  geocode = lookup_coords("usa"),
  n = 10000
)
## Warning: Rate limit exceeded - 88
## Warning: Rate limit exceeded
## create lat/lng variables using all available tweet and profile geo-location data
rt <- lat_lng(rt)

## plot state boundaries
par(mar = c(0, 0, 0, 0))
maps::map("state", lwd = .25)

## plot lat and lng points onto state map
with(rt, points(lng, lat, pch = 20, cex = .75, col = rgb(0, .3, .7, .75)))
Accounts Followed by an Account
## get user IDs of accounts followed by ipbofficial
following <- get_friends("ipbofficial")

head(following)
## # A tibble: 6 × 2
## user user_id
## <chr> <chr>
## 1 ipbofficial 151261945
## 2 ipbofficial 22878447
## 3 ipbofficial 74646907
## 4 ipbofficial 51010742
## 5 ipbofficial 2421551478
## 6 ipbofficial 1219523581140332544
lookup_users(following$user_id)[1:10,4]
## # A tibble: 10 × 1
## screen_name
## <chr>
## 1 UGMYogyakarta
## 2 itbofficial
## 3 univ_indonesia
## 4 unpad
## 5 AusIndCentre
## 6 ltmptofficial
## 7 KSPgoid
## 8 pddikti
## 9 SekreSNMPTN
## 10 arif_satria
Followers of an Account
## get user IDs of accounts following ipbofficial
followers <- get_followers("ipbofficial", n = 500)
head(followers)
## # A tibble: 6 × 1
## user_id
## <chr>
## 1 1465140187764195332
## 2 17824995
## 3 977473478323351552
## 4 1465117404166443009
## 5 1465104855718977536
## 6 1145552705928093697
lookup_users(followers$user_id)[1:10,4]
## # A tibble: 10 × 1
## screen_name
## <chr>
## 1 lattepyong
## 2 widhyaksara
## 3 iamfayzasevanaa
## 4 Fazririsiregar
## 5 AzkarBadri1
## 6 AmandaFebby7
## 7 NMI72729256
## 8 HAmyrullah
## 9 ardellreynold20
## 10 irateniaua
Tweets from an Account
tmls <- get_timelines(c("republikaonline", "kompascom", "detikcom"), n = 3200)

## plot the frequency of tweets for each user over time
tmls %>%
  dplyr::filter(created_at > "2021-10-29") %>%
  dplyr::group_by(screen_name) %>%
  ts_plot("days", trim = 1L) +
  ggplot2::geom_point() +
  ggplot2::theme_minimal() +
  ggplot2::theme(
    legend.title = ggplot2::element_blank(),
    legend.position = "bottom",
    plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of Twitter statuses posted by news organization",
    subtitle = "Twitter status (tweet) counts aggregated by day",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
Easily Harvest (Scrape) Web Pages with rvest
In this section, we scrape data from itch.io with R.
About itch.io
itch.io is an open marketplace for independent digital creators with a focus on independent video games. It’s a platform that enables anyone to sell the content they’ve created. As a seller you’re in charge of how it’s done: you set the price, you run sales, and you design your pages. It’s never necessary to get votes, likes, or follows to get your content approved, and you can make changes to how you distribute your work as frequently as you like.
The following steps scrape the Top Rated Free Games for Windows.
Targeted data:
- Game Title
- Developer
- Rating Count
- Rating Score
- Story/Description
- Size
Library
library(rvest)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Initialization
# prepare the URL to scrape
url <- "https://itch.io/games/top-rated/free/platform-windows"
# get the HTML code
itchio <- read_html(url)
itchio
## {html_document}
## <html lang="en">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
## [2] <body data-page_name="browse" data-host="itch.io" class="locale_en main_l ...
Game Title
# get the game title nodes
title_html <- html_nodes(itchio, ".game_title")
title_html[[2]] # the game we hovered over earlier
## {html_node}
## <div class="game_title">
## [1] <a class="title game_link" href="https://brianna-lei.itch.io/butterfly-so ...
# convert the nodes to text
title_text <- html_text(title_html)
title_text
## [1] "Friday Night Funkin'"
## [2] "Butterfly Soup"
## [3] "Our Life: Beginnings & Always"
## [4] "Project Kat"
## [5] "Andromeda Six"
## [6] "Doki Doki Literature Club!"
## [7] "Sort the Court!"
## [8] "Cinderella Phenomenon"
## [9] "Six Cats Under"
## [10] "Mindustry"
## [11] "Ebon Light"
## [12] "missed messages."
## [13] "Blooming Panic"
## [14] "Ravenfield (Beta 5)"
## [15] "Scout: An Apocalypse Story"
## [16] "CHAINSAW DANCE DEMO"
## [17] "Vincent: The Secret of Myers"
## [18] "one night, hot springs [jam ver.]-50%"
## [19] "her tears were my light"
## [20] "Desktop Goose"
## [21] "Therapy with Dr. Albert Krueger"
## [22] "Syrup and the Ultimate Sweet"
## [23] "Juice Galaxy (formerly Juice World)"
## [24] "Baldi's Basics in Education and Learning"
## [25] "Lonely Wolf Treat"
## [26] "Raft"
## [27] "Devil Express"
## [28] "Lost Constellation"
## [29] "Perfumare"
## [30] "ULTRAKILL Prelude"
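The Wrap All step further down uses author_text, but the developer names are never extracted in these notes. The sketch below shows one way this could be done, following the same pattern as the game titles. The class name .game_author and the simplified markup are assumptions made for illustration; inspect the live page before relying on them.

```r
library(rvest)

# Hypothetical, simplified markup standing in for two itch.io game cells.
sample_page <- read_html('
  <html><body>
    <div class="game_cell">
      <div class="game_title"><a href="#">Butterfly Soup</a></div>
      <div class="game_author"><a href="#">Brianna Lei</a></div>
    </div>
    <div class="game_cell">
      <div class="game_title"><a href="#">Project Kat</a></div>
      <div class="game_author"><a href="#">Leef 6010</a></div>
    </div>
  </body></html>
')

# Select the developer nodes (assumed class name) and convert them to text.
author_html <- html_nodes(sample_page, ".game_author")
author_text_demo <- html_text(author_html, trim = TRUE)
author_text_demo
```

If the live page does use that class, the same two calls applied to itchio instead of sample_page would produce the author_text vector used later.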
Rating Count
library(stringr) # we will be working with character strings
rating_html <- html_nodes(itchio, ".game_rating")
rating_html
## {xml_nodeset (30)}
## [1] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [2] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [3] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [4] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [5] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [6] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [7] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [8] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [9] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [10] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [11] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [12] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [13] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [14] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [15] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [16] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [17] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [18] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [19] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## [20] <div class="game_rating">\n<div class="star_value">\n<div style="width: ...
## ...
string <- c("[[:punct:]]") # pattern of characters to remove

rating_count <- html_text(rating_html) %>%
  str_remove_all(pattern = string) %>%
  str_squish() %>% as.numeric()
rating_count
## [1] 8961 2945 2157 3116 1911 3126 4377 1510 1604 1303 1036 1285 591 1670 678
## [16] 939 443 872 785 1065 701 866 546 1460 938 2304 416 697 393 464
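The cleaning step can be tried offline on a couple of strings. This is a small sketch with made-up inputs that imitate rating text wrapped in whitespace, parentheses, and thousands separators; the exact text returned by the live page may differ.

```r
library(stringr)

# Made-up raw strings imitating text pulled from rating nodes.
raw_rating <- c("\n  (8,961)\n", " (2,945) ")

# Drop punctuation such as parentheses and commas.
no_punct <- str_remove_all(raw_rating, "[[:punct:]]")

# Trim and collapse the leftover whitespace, then convert to numbers.
rating_count_demo <- as.numeric(str_squish(no_punct))
rating_count_demo
```

Removing all punctuation works here because the counts are whole numbers; a value with a decimal point would lose it.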
Rating Score
rating_score <- itchio %>%
  html_nodes("div") %>%
  html_nodes(".game_rating") %>%
  html_nodes("span") %>%
  html_nodes(xpath = '//*[@class="rating_count"]') %>%
  html_attr("title") %>%
  as.numeric()
rating_score
## [1] 4.75 4.90 4.95 4.84 4.89 4.80 4.69 4.83 4.78 4.82 4.85 4.80 4.95 4.75 4.90
## [16] 4.81 4.95 4.82 4.83 4.77 4.83 4.78 4.86 4.68 4.75 4.60 4.89 4.79 4.90 4.87
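The score is read from an attribute rather than from the node text, which is what html_attr() is for. The sketch below reproduces that idea on hypothetical markup in which the score sits in the title attribute of the rating_count element, as the pipeline above implies. It uses a single CSS selector as a shorter alternative to the chained calls; both the markup and the selector are assumptions.

```r
library(rvest)

# Hypothetical markup: the score is assumed to sit in a `title` attribute.
sample_rating <- read_html('
  <html><body>
    <div class="game_rating">
      <span class="rating_count" title="4.90">(2,945)</span>
    </div>
    <div class="game_rating">
      <span class="rating_count" title="4.84">(3,116)</span>
    </div>
  </body></html>
')

# Pick the rating_count elements inside each game_rating block.
score_nodes <- html_nodes(sample_rating, ".game_rating .rating_count")

# Read the attribute value and convert it to a number.
rating_score_demo <- as.numeric(html_attr(score_nodes, "title"))
rating_score_demo
```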
Wrap All
itchio_wrap <- data.frame(title_text, author_text, rating_score, rating_count)
head(itchio_wrap)
## title_text author_text rating_score rating_count
## 1 Friday Night Funkin' ninjamuffin99 4.75 8961
## 2 Butterfly Soup Brianna Lei 4.90 2945
## 3 Our Life: Beginnings & Always GBPatch 4.95 2157
## 4 Project Kat Leef 6010 4.84 3116
## 5 Andromeda Six Wanderlust Games 4.89 1911
## 6 Doki Doki Literature Club! Team Salvato 4.80 3126
Visualization
# filter and sort the data
itchio_arr <- itchio_wrap %>%
  filter(rating_count >= 10, rating_score >= 4.5) %>%
  arrange(desc(rating_score), desc(rating_count))
itchio_arr
## title_text author_text rating_score
## 1 Our Life: Beginnings & Always GBPatch 4.95
## 2 Blooming Panic robobarbie 4.95
## 3 Vincent: The Secret of Myers dino999z 4.95
## 4 Butterfly Soup Brianna Lei 4.90
## 5 Scout: An Apocalypse Story Anya 4.90
## 6 Perfumare PDRRook 4.90
## 7 Andromeda Six Wanderlust Games 4.89
## 8 Devil Express Bad Pet 4.89
## 9 ULTRAKILL Prelude Hakita 4.87
## 10 Juice Galaxy (formerly Juice World) fishlicka 4.86
## 11 Ebon Light Underbliss 4.85
## 12 Project Kat Leef 6010 4.84
## 13 Cinderella Phenomenon Dicesuki 4.83
## 14 her tears were my light NomnomNami 4.83
## 15 Therapy with Dr. Albert Krueger dino999z 4.83
## 16 Mindustry Anuke 4.82
## 17 one night, hot springs [jam ver.]-50% npckc 4.82
## 18 CHAINSAW DANCE DEMO Benedique 4.81
## 19 Doki Doki Literature Club! Team Salvato 4.80
## 20 missed messages. angela he 4.80
## 21 Lost Constellation Finji 4.79
## 22 Six Cats Under Team Bean Loop 4.78
## 23 Syrup and the Ultimate Sweet NomnomNami 4.78
## 24 Desktop Goose samperson 4.77
## 25 Friday Night Funkin' ninjamuffin99 4.75
## 26 Ravenfield (Beta 5) SteelRaven7 4.75
## 27 Lonely Wolf Treat NomnomNami 4.75
## 28 Sort the Court! Graeme Borland 4.69
## 29 Baldi's Basics in Education and Learning Basically Games 4.68
## 30 Raft Redbeet Interactive 4.60
## rating_count
## 1 2157
## 2 591
## 3 443
## 4 2945
## 5 678
## 6 393
## 7 1911
## 8 416
## 9 464
## 10 546
## 11 1036
## 12 3116
## 13 1510
## 14 785
## 15 701
## 16 1303
## 17 872
## 18 939
## 19 3126
## 20 1285
## 21 697
## 22 1604
## 23 866
## 24 1065
## 25 8961
## 26 1670
## 27 938
## 28 4377
## 29 1460
## 30 2304
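Sorting on several columns in descending order needs one desc() call per column, because desc() works on a single vector. A minimal sketch on a made-up data frame with tied scores shows the rating count breaking the ties:

```r
library(dplyr)

# Made-up scores with ties on rating_score.
games <- data.frame(
  title_text   = c("A", "B", "C", "D"),
  rating_score = c(4.90, 4.95, 4.90, 4.95),
  rating_count = c(393, 443, 2945, 2157),
  stringsAsFactors = FALSE
)

# One desc() per column: score first, then count to break ties.
games_sorted <- arrange(games, desc(rating_score), desc(rating_count))
games_sorted$title_text
```

Within each score, the game with more ratings comes first, so the order is D, B, C, A.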
# visualization
library(ggplot2)
plot <- ggplot(itchio_arr, aes(x = reorder(title_text, rating_score), y = rating_score)) +
  geom_point(aes(size = rating_count, color = rating_score)) + coord_flip() +
  labs(x = "",
       y = "Score",
       title = "Top-Rated Free Games on itch.io",
       subtitle = "Filtered for Windows Platform",
       size = "Rating Count") +
  scale_color_continuous(low = "pink", high = "maroon") +
  scale_size_continuous(breaks = c(25,50,100,200)) +
  guides(color = "none") + # hide the color legend
  theme_minimal()
plot
Create Methods in MongoDB Atlas
Previous sessions covered Read Methods. In this session, we insert data into MongoDB Atlas.
From the two sections above, we already have two data frames:
dplyr::glimpse(rt)
## Rows: 4,400
## Columns: 92
## $ user_id <chr> "77944562", "888185595654062080", "83590831075…
## $ status_id <chr> "1465147173310545923", "1465147173125820416", …
## $ created_at <dttm> 2021-11-29 02:34:41, 2021-11-29 02:34:41, 202…
## $ screen_name <chr> "L_Skrubby05", "Abhai_BTTG", "WHATSPUPDAWG", "…
## $ text <chr> "@RNicKL5 STFU ARE YOU KIDDING!?", "@JeffEisen…
## $ source <chr> "Twitter for iPhone", "Twitter Web App", "Twit…
## $ display_text_width <dbl> 22, 7, 45, 110, 40, 223, 94, 75, 33, 246, 67, …
## $ reply_to_status_id <chr> "1465146944683229193", "1465146999184011268", …
## $ reply_to_user_id <chr> "37363167", "239575120", "865590564070227969",…
## $ reply_to_screen_name <chr> "RNicKL5", "JeffEisenband", "HaveYouMetTomu", …
## $ is_quote <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE…
## $ is_retweet <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ favorite_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ retweet_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ quote_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ reply_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ hashtags <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, <"BlackFr…
## $ symbols <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ urls_url <list> NA, NA, NA, NA, "twitter.com/booksarts1/sta…"…
## $ urls_t.co <list> NA, NA, NA, NA, "https://t.co/st24lfLiXU", NA…
## $ urls_expanded_url <list> NA, NA, NA, NA, "https://twitter.com/booksart…
## $ media_url <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, "http://p…
## $ media_t.co <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, "https://…
## $ media_expanded_url <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, "https://…
## $ media_type <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, "photo", …
## $ ext_media_url <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, "http://p…
## $ ext_media_t.co <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, "https://…
## $ ext_media_expanded_url <list> NA, NA, NA, NA, NA, NA, NA, NA, NA, "https://…
## $ ext_media_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ mentions_user_id <list> "37363167", "239575120", "865590564070227969"…
## $ mentions_screen_name <list> "RNicKL5", "JeffEisenband", "HaveYouMetTomu",…
## $ lang <chr> "en", "en", "en", "en", "en", "en", "en", "en"…
## $ quoted_status_id <chr> NA, NA, NA, NA, "1464617086139969541", NA, NA,…
## $ quoted_text <chr> NA, NA, NA, NA, "https://t.co/k5yEIryFng", NA,…
## $ quoted_created_at <dttm> NA, NA, NA, NA, 2021-11-27 15:28:19, NA, NA, …
## $ quoted_source <chr> NA, NA, NA, NA, "Twitter for iPhone", NA, NA, …
## $ quoted_favorite_count <int> NA, NA, NA, NA, 207, NA, NA, NA, NA, NA, 1797,…
## $ quoted_retweet_count <int> NA, NA, NA, NA, 48, NA, NA, NA, NA, NA, 588, N…
## $ quoted_user_id <chr> NA, NA, NA, NA, "1241404370484441090", NA, NA,…
## $ quoted_screen_name <chr> NA, NA, NA, NA, "BooksArts1", NA, NA, NA, NA, …
## $ quoted_name <chr> NA, NA, NA, NA, "Books & Arts", NA, NA, NA, NA…
## $ quoted_followers_count <int> NA, NA, NA, NA, 1892, NA, NA, NA, NA, NA, 5108…
## $ quoted_friends_count <int> NA, NA, NA, NA, 2117, NA, NA, NA, NA, NA, 1496…
## $ quoted_statuses_count <int> NA, NA, NA, NA, 9639, NA, NA, NA, NA, NA, 4062…
## $ quoted_location <chr> NA, NA, NA, NA, "", NA, NA, NA, NA, NA, "Washi…
## $ quoted_description <chr> NA, NA, NA, NA, "Voracious reader, bibliophile…
## $ quoted_verified <lgl> NA, NA, NA, NA, FALSE, NA, NA, NA, NA, NA, TRU…
## $ retweet_status_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_text <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_created_at <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ retweet_source <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_favorite_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_retweet_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_user_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_screen_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_followers_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_friends_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_statuses_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_location <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_description <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ retweet_verified <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ place_url <chr> "https://api.twitter.com/1.1/geo/id/bbb67f6528…
## $ place_name <chr> "Lido Beach", NA, NA, NA, "Richmond", NA, NA, …
## $ place_full_name <chr> "Lido Beach, NY", NA, NA, NA, "Richmond, VA", …
## $ place_type <chr> "city", NA, NA, NA, "city", NA, NA, NA, NA, NA…
## $ country <chr> "United States", NA, NA, NA, "United States", …
## $ country_code <chr> "US", NA, NA, NA, "US", NA, NA, NA, NA, NA, NA…
## $ geo_coords <list> <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, …
## $ coords_coords <list> <NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA, …
## $ bbox_coords <list> <-73.63760, -73.58373, -73.58373, -73.63760, …
## $ status_url <chr> "https://twitter.com/L_Skrubby05/status/146514…
## $ name <chr> "Leo Skorupski", "Абхай Савкар || Bowled Throu…
## $ location <chr> "Lido Beach, New York", "Santa Cruz, CA", "Tam…
## $ description <chr> "Started a twitter to publicly yell at profess…
## $ url <chr> NA, "https://t.co/0JAHSJ9Ywa", "https://t.co/R…
## $ protected <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ followers_count <int> 343, 435, 549, 12, 421, 5311, 30, 3418, 653, 6…
## $ friends_count <int> 1492, 1952, 336, 47, 205, 5685, 11, 792, 360, …
## $ listed_count <int> 5, 3, 2, 0, 15, 6, 1, 106, 5, 0, 2, 0, 0, 2, 1…
## $ statuses_count <int> 28257, 5157, 5483, 196, 22985, 118640, 13302, …
## $ favourites_count <int> 76858, 5581, 167475, 639, 19770, 266056, 4779,…
## $ account_created_at <dttm> 2009-09-28 06:35:28, 2017-07-20 23:55:22, 201…
## $ verified <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
## $ profile_url <chr> NA, "https://t.co/0JAHSJ9Ywa", "https://t.co/R…
## $ profile_expanded_url <chr> NA, "http://bowledthroughthegate.weebly.com", …
## $ account_lang <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ profile_banner_url <chr> "https://pbs.twimg.com/profile_banners/7794456…
## $ profile_background_url <chr> "http://abs.twimg.com/images/themes/theme1/bg.…
## $ profile_image_url <chr> "http://pbs.twimg.com/profile_images/146475876…
## $ lat <dbl> 40.59001, NA, NA, NA, 37.52988, NA, NA, NA, NA…
## $ lng <dbl> -73.61067, NA, NA, NA, -77.49317, NA, NA, NA, …
dplyr::glimpse(itchio_wrap)
## Rows: 30
## Columns: 4
## $ title_text <chr> "Friday Night Funkin'", "Butterfly Soup", "Our Life: Begi…
## $ author_text <chr> "ninjamuffin99", "Brianna Lei", "GBPatch", "Leef 6010", "…
## $ rating_score <dbl> 4.75, 4.90, 4.95, 4.84, 4.89, 4.80, 4.69, 4.83, 4.78, 4.8…
## $ rating_count <dbl> 8961, 2945, 2157, 3116, 1911, 3126, 4377, 1510, 1604, 130…
We will insert these two data frames into MongoDB Atlas.
Creating the Connection
library(mongolite)
# This is the connection string. You can get the exact URL from your MongoDB cluster screen
connection_string <- 'mongodb+srv://<username>:<password>@<cluster-name>.<code>.mongodb.net/admin?retryWrites=true&w=majority'
Creating the Database and Collections
twitter_collection <- mongo(collection = "twitter", # collection name
                            db = "sample_dataset_R", # database name
                            url = connection_string,
                            verbose = TRUE)
itchio_collection <- mongo(collection = "itchio", # collection name
                           db = "sample_dataset_R", # database name
                           url = connection_string,
                           verbose = TRUE)
Insert Process
twitter_collection$insert(rt)
##
Processed 1000 rows...
Processed 2000 rows...
Processed 3000 rows...
Processed 4000 rows...
Complete! Processed total of 4400 rows.
## List of 5
## $ nInserted : num 4400
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
itchio_collection$insert(itchio_wrap)
##
Complete! Processed total of 30 rows.
## List of 5
## $ nInserted : num 30
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
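As the nInserted counts show, each data frame row becomes one document. To preview roughly what such documents look like without connecting to a cluster, the rows can be converted to JSON with jsonlite. This is only an approximation of the stored documents (MongoDB also adds an _id field), and sample_rows is a two-row copy of the itchio_wrap output shown earlier.

```r
library(jsonlite)

# Two rows copied from the itchio_wrap output above.
sample_rows <- data.frame(
  title_text   = c("Friday Night Funkin'", "Butterfly Soup"),
  author_text  = c("ninjamuffin99", "Brianna Lei"),
  rating_score = c(4.75, 4.90),
  rating_count = c(8961, 2945),
  stringsAsFactors = FALSE
)

# toJSON() turns a data frame into an array of objects, one per row.
docs_json <- toJSON(sample_rows, pretty = TRUE)
docs_json

# Round trip: parsing the JSON gives back the same two records.
docs_back <- fromJSON(docs_json)
nrow(docs_back)
```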
Result
The following is the result of the insert (Create Methods) into MongoDB Atlas.
References
Fauziyyah, NA. 2019. Web Scraping in R using rvest [Accessed online: 29 November 2021]. https://rpubs.com/nabiilahardini/itchio
Badan Informasi Geospasial, abdul.aziz@big.go.id