Chocolate and cheese are two of the flavor variants most often found in foods and beverages. Chocolate flavor comes from processed cocoa beans and is popular with many people. Chocolate is also a common gift for someone special, and some studies suggest it can even improve the mood of the person eating it. Cheese, by contrast, is made by fermenting milk, and it is just as popular among food lovers. Both flavors appear as variants in a wide range of foods and beverages on the market today. So which chocolate- or cheese-flavored foods and drinks are people enjoying right now? Let's look at how the two flavors trend as favorites in posts on Twitter, searching for the Indonesian words ‘coklat’ (chocolate) and ‘keju’ (cheese).
# Load packages
library(rtweet)
library(tidyverse)
# Twitter authentication (the four credential variables are assumed to be defined beforehand)
create_token(
  app = "my_twitter_research_app",
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token,
  access_secret = access_secret
)
## <Token>
## <oauth_endpoint>
## request: https://api.twitter.com/oauth/request_token
## authorize: https://api.twitter.com/oauth/authenticate
## access: https://api.twitter.com/oauth/access_token
## <oauth_app> my_twitter_research_app
## key: jQ22IrA6bkJZMLrFJb4W9jF9d
## secret: <hidden>
## <credentials> oauth_token, oauth_token_secret
## ---
# Retrieve tweets
tweets1 <- search_tweets("Coklat", n = 15000, tweet_mode="extended")
## Searching for tweets...
## This may take a few seconds...
## Finished collecting tweets!
tweets1 <- distinct(tweets1, text, .keep_all=TRUE)
tweets2 <- search_tweets("Keju", n = 15000, tweet_mode="extended")
## Searching for tweets...
## This may take a few seconds...
## Warning: Rate limit exceeded - 88
## Warning: Rate limit exceeded
## Finished collecting tweets!
tweets2 <- distinct(tweets2, text, .keep_all=TRUE)
# Plot the hourly time series of tweets1
ts_plot(tweets1, "1 hours") +
  theme_minimal() +
  theme(plot.title = ggplot2::element_text(face = "bold")) +
  labs(
    x = NULL, y = NULL,
    title = "Frequency of Tweets Containing the Word 'Coklat'",
    subtitle = "Tweet counts aggregated in one-hour intervals",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
The number of tweets containing the word ‘coklat’ follows a recurring daily pattern. On average, Twitter users post tweets containing the word most often around midday, while toward the evening the topic is rarely tweeted. There is one outlier: around 10 a.m. on 9 November, an unusually large number of users tweeted about chocolate. This may have been caused by a promotion for a chocolate-flavored food or drink, or by something else entirely. Why so many chocolate tweets appeared at that time needs further investigation.
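One way to start that investigation is to bucket tweets by hour, find the busiest hour, and read the tweets that fall inside it. The sketch below uses a small hypothetical data frame (`toy`) in place of `tweets1`, with only the `created_at` and `text` columns; the timestamps and texts are made up for illustration and are not taken from the collected data.

```r
# Hypothetical stand-in for tweets1 (illustrative values only)
toy <- data.frame(
  created_at = as.POSIXct(c(
    "2018-11-09 09:10:00", "2018-11-09 10:05:00", "2018-11-09 10:20:00",
    "2018-11-09 10:45:00", "2018-11-09 11:30:00"
  ), tz = "UTC"),
  text = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Label each tweet with the start of the hour it was posted in
toy$hour <- format(toy$created_at, "%Y-%m-%d %H:00", tz = "UTC")

# Count tweets per hour and pick the busiest hour
hourly <- table(toy$hour)
peak_hour <- names(hourly)[which.max(hourly)]

# Pull the tweets from that hour so they can be read manually
peak_tweets <- toy$text[toy$hour == peak_hour]
print(peak_hour)
print(peak_tweets)
```

Applied to the real data, the same steps would use `tweets1$created_at` and `tweets1$text`; reading the texts from the peak hour should show whether a single promotion or retweeted post explains the spike.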
# Plot the hourly time series of tweets2
ts_plot(tweets2, "1 hours") +
  theme_minimal() +
  theme(plot.title = ggplot2::element_text(face = "bold")) +
  labs(
    x = NULL, y = NULL,
    title = "Frequency of Tweets Containing the Word 'Keju'",
    subtitle = "Tweet counts aggregated in one-hour intervals",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )
The number of tweets containing the word ‘keju’ also follows a recurring daily pattern. As with ‘coklat’, Twitter users on average post tweets containing the word most often around midday, and toward the evening the topic is rarely tweeted. There is one outlier: around midday on 10 November, a very large number of users tweeted about cheese. This may have been caused by a promotion for a cheese-flavored food or drink, or by something else entirely. Why so many cheese tweets appeared at that time needs further investigation.
tail(tweets1, 20)
## # A tibble: 20 x 88
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 1027415~ 10598140~ 2018-11-06 14:25:32 homie_hams @educationfe~ Twitt~
## 2 1130266~ 10598138~ 2018-11-06 14:24:54 mhazman_ @Zaf_Rulez s~ Twitt~
## 3 1011865~ 10598137~ 2018-11-06 14:24:21 ddaegook Bosen sama ~ Twitt~
## 4 75987625 10598135~ 2018-11-06 14:23:36 ijusijus Hari ini mat~ Twitt~
## 5 9405153~ 10598135~ 2018-11-06 14:23:27 gitapuspit~ Kapan terakh~ Ask.fm
## 6 2301094~ 10598133~ 2018-11-06 14:22:50 galih_near @Fidly_JKT48~ Twitt~
## 7 4406472~ 10598133~ 2018-11-06 14:22:43 noraliyaaaa Mata dah kuy~ Twitt~
## 8 1046072~ 10598131~ 2018-11-06 14:22:10 leetaeyyou~ "Teman teman~ Twitt~
## 9 1011964~ 10598131~ 2018-11-06 14:22:00 Harbringer~ "@eastpetals~ Twitt~
## 10 1217038~ 10598127~ 2018-11-06 14:20:34 ShafiqahMa~ Belum tentu ~ Twitt~
## 11 1004416~ 10598124~ 2018-11-06 14:19:02 damprad @potatota da~ Twitt~
## 12 9060256~ 10598122~ 2018-11-06 14:18:16 Li_lifiola "sdg minum f~ Twitt~
## 13 39239431 10598121~ 2018-11-06 14:18:01 stroberi_o~ "@priwietkie~ Twitt~
## 14 1016251~ 10598120~ 2018-11-06 14:17:47 yjryys @koreanthing~ Twitt~
## 15 1358500~ 10598119~ 2018-11-06 14:17:07 Lisdakh99 "Kenapa tayo~ Twitt~
## 16 1428902~ 10598115~ 2018-11-06 14:15:34 Yuska_fiqr~ "Kalo lagi b~ Insta~
## 17 1282448~ 10598115~ 2018-11-06 14:15:30 sheryambar "Kangen bau ~ Twitt~
## 18 2736287~ 10598114~ 2018-11-06 14:15:09 bunnytaes "Ada cowok y~ Twitt~
## 19 2385443~ 10598111~ 2018-11-06 14:14:00 xonayson Weh penatnya~ Twitt~
## 20 8415928~ 10598109~ 2018-11-06 14:13:19 anaqgadis @womanfeeds ~ Twitt~
## # ... with 82 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, hashtags <list>,
## # symbols <list>, urls_url <list>, urls_t.co <list>,
## # urls_expanded_url <list>, media_url <list>, media_t.co <list>,
## # media_expanded_url <list>, media_type <list>, ext_media_url <list>,
## # ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>,
## # mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>,
## # quoted_text <chr>, quoted_created_at <dttm>, quoted_source <chr>,
## # quoted_favorite_count <int>, quoted_retweet_count <int>,
## # quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>,
## # quoted_followers_count <int>, quoted_friends_count <int>,
## # quoted_statuses_count <int>, quoted_location <chr>,
## # quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>,
## # retweet_created_at <dttm>, retweet_source <chr>,
## # retweet_favorite_count <int>, retweet_retweet_count <int>,
## # retweet_user_id <chr>, retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>,
## # country <chr>, country_code <chr>, geo_coords <list>,
## # coords_coords <list>, bbox_coords <list>, status_url <chr>,
## # name <chr>, location <chr>, description <chr>, url <chr>,
## # protected <lgl>, followers_count <int>, friends_count <int>,
## # listed_count <int>, statuses_count <int>, favourites_count <int>,
## # account_created_at <dttm>, verified <lgl>, profile_url <chr>,
## # profile_expanded_url <chr>, account_lang <chr>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
tail(tweets2, 20)
## # A tibble: 20 x 88
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 7330794~ 106070922~ 2018-11-09 01:42:41 ShintaPusp~ ngopi sama ~ Twitt~
## 2 3345166~ 106070828~ 2018-11-09 01:38:56 AridaKarti~ @devialfian~ Twitt~
## 3 1832177~ 106070575~ 2018-11-09 01:28:54 selvi_sr @Genfm woyy~ Mobil~
## 4 1033620~ 106070543~ 2018-11-09 01:27:37 HIHOLLY97 "@HIKJISOO9~ Twitt~
## 5 1033620~ 106070311~ 2018-11-09 01:18:25 HIHOLLY97 "@HIKJISOO9~ Twitt~
## 6 1033620~ 106069330~ 2018-11-09 00:39:25 HIHOLLY97 @HIKJISOO95~ Twitt~
## 7 1033620~ 106070066~ 2018-11-09 01:08:39 HIHOLLY97 @HIKJISOO95~ Twitt~
## 8 2729667~ 106069909~ 2018-11-09 01:02:26 aboyrasara~ "Assallamua~ Faceb~
## 9 1285562~ 106069345~ 2018-11-09 00:40:01 jeahyonct @VampireYao~ Twitt~
## 10 1611635~ 106069014~ 2018-11-09 00:26:52 infoSerang "Kedai Kopi~ Insta~
## 11 7671965~ 106068785~ 2018-11-09 00:17:45 gicft @cxvilw Itu~ Twitt~
## 12 1059988~ 106068683~ 2018-11-09 00:13:43 Mohfaizins~ @Nisa_safiy~ Twitt~
## 13 7410530~ 106068676~ 2018-11-09 00:13:25 yabiukinako @chaemuda r~ Twitt~
## 14 44966360 106068621~ 2018-11-09 00:11:16 Rinikusuma "KetanPunel~ Insta~
## 15 63057886 106068069~ 2018-11-08 23:49:19 triantorah~ "@infoJATIA~ Twitt~
## 16 1055823~ 106067820~ 2018-11-08 23:39:26 SK_Yongji91 "@SK_KwonYu~ Twitt~
## 17 1020333~ 106066297~ 2018-11-08 22:38:54 ncthaechand hai, aku le~ Twitt~
## 18 2730852~ 106066254~ 2018-11-08 22:37:10 madalenade~ "Gini nih k~ Twitt~
## 19 1433543~ 106066217~ 2018-11-08 22:35:42 nurmalaset~ daripada ra~ Twitt~
## 20 2968845~ 106066187~ 2018-11-08 22:34:32 wkwkmulu "@rlthingy ~ Twitt~
## # ... with 82 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, hashtags <list>,
## # symbols <list>, urls_url <list>, urls_t.co <list>,
## # urls_expanded_url <list>, media_url <list>, media_t.co <list>,
## # media_expanded_url <list>, media_type <list>, ext_media_url <list>,
## # ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>,
## # mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>,
## # quoted_text <chr>, quoted_created_at <dttm>, quoted_source <chr>,
## # quoted_favorite_count <int>, quoted_retweet_count <int>,
## # quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>,
## # quoted_followers_count <int>, quoted_friends_count <int>,
## # quoted_statuses_count <int>, quoted_location <chr>,
## # quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>,
## # retweet_created_at <dttm>, retweet_source <chr>,
## # retweet_favorite_count <int>, retweet_retweet_count <int>,
## # retweet_user_id <chr>, retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>,
## # country <chr>, country_code <chr>, geo_coords <list>,
## # coords_coords <list>, bbox_coords <list>, status_url <chr>,
## # name <chr>, location <chr>, description <chr>, url <chr>,
## # protected <lgl>, followers_count <int>, friends_count <int>,
## # listed_count <int>, statuses_count <int>, favourites_count <int>,
## # account_created_at <dttm>, verified <lgl>, profile_url <chr>,
## # profile_expanded_url <chr>, account_lang <chr>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
library(tm)
## Warning: package 'tm' was built under R version 3.5.1
## Loading required package: NLP
## Warning: package 'NLP' was built under R version 3.5.1
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
#===1===
# build a corpus, and specify the source to be character vectors
myCorpus1 <- Corpus(VectorSource(tweets1$text))
# convert to lower case
myCorpus1 <- tm_map(myCorpus1, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(myCorpus1, content_transformer(tolower)):
## transformation drops documents
# remove URLs
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
myCorpus1 <- tm_map(myCorpus1, content_transformer(removeURL))
## Warning in tm_map.SimpleCorpus(myCorpus1, content_transformer(removeURL)):
## transformation drops documents
# remove anything other than English letters or space
removeNumPunct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
myCorpus1 <- tm_map(myCorpus1, content_transformer(removeNumPunct))
## Warning in tm_map.SimpleCorpus(myCorpus1,
## content_transformer(removeNumPunct)): transformation drops documents
#===2===
# build a corpus, and specify the source to be character vectors
myCorpus2 <- Corpus(VectorSource(tweets2$text))
# convert to lower case
myCorpus2 <- tm_map(myCorpus2, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(myCorpus2, content_transformer(tolower)):
## transformation drops documents
# remove URLs (removeURL is already defined above)
myCorpus2 <- tm_map(myCorpus2, content_transformer(removeURL))
## Warning in tm_map.SimpleCorpus(myCorpus2, content_transformer(removeURL)):
## transformation drops documents
# remove anything other than letters or space (removeNumPunct is already defined above)
myCorpus2 <- tm_map(myCorpus2, content_transformer(removeNumPunct))
## Warning in tm_map.SimpleCorpus(myCorpus2,
## content_transformer(removeNumPunct)): transformation drops documents
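To make the effect of these cleaning steps concrete, here is a minimal sketch that applies the same transformations to a single string in base R, without `tm`. The helper `clean_text` and the sample tweet are hypothetical and exist only for illustration; the two regular expressions are the ones defined above.

```r
# Same helpers as in the corpus pipeline
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
removeNumPunct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)

# Hypothetical wrapper chaining the steps in the same order as the pipeline
clean_text <- function(x) {
  x <- tolower(x)          # convert to lower case
  x <- removeURL(x)        # drop anything starting with "http"
  x <- removeNumPunct(x)   # drop digits, punctuation, and symbols
  gsub("[[:space:]]+", " ", trimws(x))  # collapse leftover whitespace
}

# Made-up sample tweet
cleaned <- clean_text("Martabak KEJU enak!! https://t.co/abc123 #kuliner 2018")
print(cleaned)
```

Note that the URL must be removed before punctuation: once `:` and `/` are stripped, the link would no longer match the `http` pattern as a single token and its fragments would remain in the text.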
# remove stopwords: English stopwords (keeping "r" and "big"), the search terms,
# informal Indonesian filler words, and an Indonesian stopword list read from a local file
myStopwords <- c(
  setdiff(stopwords('english'), c("r", "big")),
  "use", "see", "used", "via", "amp", "keju", "coklat",
  "sih", "ga", "dah", "pas", "k", "u", "je", "jd", "tu", "ni", "lg", "ya",
  "iya", "gitu", "tuh", "gak", "kalo", "ku", "nih", "kek", "nak", "tau",
  "tp", "kayak", "udah"
)
stopwords_id <- read.table('d:/stopwords-id.txt', header = FALSE, stringsAsFactors = FALSE)
myStopwords <- c(myStopwords, stopwords_id$V1, "hi", "yg")
myCorpus1 <- tm_map(myCorpus1, removeWords, myStopwords)
## Warning in tm_map.SimpleCorpus(myCorpus1, removeWords, myStopwords):
## transformation drops documents
myCorpus2 <- tm_map(myCorpus2, removeWords, myStopwords)
## Warning in tm_map.SimpleCorpus(myCorpus2, removeWords, myStopwords):
## transformation drops documents
#===1===
# remove extra whitespace
myCorpus1 <- tm_map(myCorpus1, stripWhitespace)
## Warning in tm_map.SimpleCorpus(myCorpus1, stripWhitespace): transformation
## drops documents
# keep a copy for stem completion later
myCorpusCopy1 <- myCorpus1
#===2===
# remove extra whitespace
myCorpus2 <- tm_map(myCorpus2, stripWhitespace)
## Warning in tm_map.SimpleCorpus(myCorpus2, stripWhitespace): transformation
## drops documents
# keep a copy for stem completion later
myCorpusCopy2 <- myCorpus2
tdm1 <- TermDocumentMatrix(myCorpus1, control = list(wordLengths = c(1, Inf)))
tdm2 <- TermDocumentMatrix(myCorpus2, control = list(wordLengths = c(1, Inf)))
tdm1
## <<TermDocumentMatrix (terms: 24253, documents: 9127)>>
## Non-/sparse entries: 93752/221263379
## Sparsity : 100%
## Maximal term length: 53
## Weighting : term frequency (tf)
tdm2
## <<TermDocumentMatrix (terms: 7521, documents: 1841)>>
## Non-/sparse entries: 19492/13826669
## Sparsity : 100%
## Maximal term length: 53
## Weighting : term frequency (tf)
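The printed summaries report terms as rows, documents as columns, and a sparsity close to 100% because each tweet uses only a handful of the thousands of terms. As a rough illustration of that structure, the sketch below builds a tiny term-document matrix by hand in base R. The three documents are made up, and this is not how `tm` builds its matrix internally.

```r
# Three made-up, already-cleaned documents
docs <- c("martabak keju enak", "roti keju", "martabak coklat enak enak")

# Split into tokens and collect the vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# One column per document, one row per term, cells hold term counts
tdm <- vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = terms))),
  integer(length(terms))
)
dimnames(tdm) <- list(Terms = terms, Docs = as.character(seq_along(docs)))
print(tdm)

# Non-zero versus zero cells, analogous to "Non-/sparse entries"
non_sparse <- sum(tdm != 0)
sparse <- sum(tdm == 0)

# Total frequency of each term across all documents
term_freq <- rowSums(tdm)
print(term_freq)
```

Even in this toy case nearly half of the cells are zero; with thousands of tweets and tens of thousands of terms, almost all of them are.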
freq.terms1 <- findFreqTerms(tdm1, lowfreq = 150)
freq.terms2 <- findFreqTerms(tdm2, lowfreq = 80)
freq.terms1[1:40]
## [1] "bikin" "banget" "putih" "suka" "pisang" "baju"
## [7] "orang" "makan" "warna" "hitam" "gue" "kasih"
## [13] "nya" "aja" "minum" "panas" "enak" "roti"
## [19] "tua" "manis" "pake" "martabak" "beli" "harga"
## [25] "la" "deh" "kopi" "rlthingy" "susu" "bgt"
## [31] "bgcoklat" "permen" "pengen" "rambut" "biru" "merah"
## [37] "biar" "es" "muda" "wa"
# only 12 terms reach the threshold, so head() avoids padding the result with NA
head(freq.terms2, 40)
## [1] "enak" "makan" "pake" "martabak" "pisang" "beli"
## [7] "suka" "roti" "nya" "susu" "ayam" "aja"
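# as.matrix() below expands the sparse matrices into dense ones, so the memory limit is raised first
# (memory.limit() is Windows-only and is no longer supported in recent R releases)
memory.limit()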
## [1] 3509
memory.limit(size = 35000)
## [1] 35000
term.freq1 <- rowSums(as.matrix(tdm1))
term.freq1 <- subset(term.freq1, term.freq1 >= 100)
df1 <- data.frame(term = names(term.freq1), freq = term.freq1)
term.freq2 <- rowSums(as.matrix(tdm2))
term.freq2 <- subset(term.freq2, term.freq2 >= 100)
df2 <- data.frame(term = names(term.freq2), freq = term.freq2)
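The memory limit has to be raised because `as.matrix(tdm1)` materialises every cell of the 24253 x 9127 matrix: about 221 million cells, or roughly 1.6 GiB if each is stored as an 8-byte number. A possible alternative, sketched here under the assumption that the `slam` package (a dependency of `tm`) is available, is to sum the rows directly on the sparse representation with `slam::row_sums`. The toy matrix and its values are hypothetical.

```r
library(slam)

# Hypothetical sparse matrix: 4 terms x 3 documents, 6 non-zero cells
toy_tdm <- simple_triplet_matrix(
  i = c(1L, 2L, 2L, 3L, 4L, 1L),  # row (term) index of each non-zero cell
  j = c(1L, 1L, 2L, 2L, 3L, 3L),  # column (document) index
  v = c(1, 2, 1, 1, 3, 1),        # term counts
  nrow = 4, ncol = 3,
  dimnames = list(
    Terms = c("enak", "keju", "martabak", "roti"),
    Docs = c("1", "2", "3")
  )
)

# Row sums computed without building the dense matrix
sparse_freq <- row_sums(toy_tdm)

# Same result via the dense route used in the text
dense_freq <- rowSums(as.matrix(toy_tdm))

print(sparse_freq)
```

Because a `TermDocumentMatrix` from `tm` is stored in this triplet format, `row_sums(tdm1)` should give the same term frequencies as `rowSums(as.matrix(tdm1))` while avoiding the large dense allocation.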
ggplot(df1, aes(x = term, y = freq)) +
  geom_bar(stat = "identity") +
  xlab("Terms") + ylab("Count") + coord_flip() +
  theme(axis.text = element_text(size = 7))
ggplot(df2, aes(x = term, y = freq)) +
  geom_bar(stat = "identity") +
  xlab("Terms") + ylab("Count") + coord_flip() +
  theme(axis.text = element_text(size = 7))
library(wordcloud)
## Warning: package 'wordcloud' was built under R version 3.5.1
## Loading required package: RColorBrewer
m <- as.matrix(tdm1)
# calculate the frequency of words and sort it by frequency
word.freq1 <- sort(rowSums(m), decreasing = TRUE)
n <- as.matrix(tdm2)
# calculate the frequency of words and sort it by frequency
word.freq2 <- sort(rowSums(n), decreasing = TRUE)
# colors: drop the four lightest shades of the BuGn palette
pal <- brewer.pal(9, "BuGn")[-(1:4)]
wordcloud(words = names(word.freq1), freq = word.freq1, min.freq = 100,
          random.order = FALSE, colors = pal)
The word cloud above shows that the word Twitter users most often write together with ‘coklat’ is ‘makan’ (eat). Other words that frequently appear alongside it include warna (color), susu (milk), aja (just), and suka (like). Food names are not the most frequent words because ‘coklat’ carries two meanings: the chocolate flavor and the color brown.
wordcloud(words = names(word.freq2), freq = word.freq2, min.freq = 100,
          random.order = FALSE, colors = pal)
The word cloud above shows that ‘makan’ (eat) is also the word most often written together with ‘keju’. Other frequent companions include susu (milk), martabak, enak (tasty), pisang (banana), pake (with), and roti (bread). This suggests that Twitter users enjoy many cheese-flavored foods.
Overall, Twitter users mention food names more often in connection with cheese than with chocolate.
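The comparison above is read off the word clouds by eye. One rough way to quantify it, sketched here with made-up numbers, is to compute what share of each word-frequency vector belongs to a hand-picked list of food terms. The list `food_terms`, the helper `food_share`, and both toy frequency vectors are hypothetical; with the real data, `word.freq1` and `word.freq2` would take their place.

```r
# Hypothetical hand-picked list of food-related terms (an assumption)
food_terms <- c("martabak", "roti", "pisang", "susu", "kopi")

# Toy frequency vectors standing in for word.freq1 and word.freq2 (not real results)
toy_freq1 <- c(makan = 10, warna = 8, martabak = 4, baju = 6, roti = 2)
toy_freq2 <- c(makan = 5, martabak = 6, roti = 4, enak = 3, pisang = 2)

# Share of all word occurrences that come from the food list
food_share <- function(freq, food) {
  sum(freq[names(freq) %in% food]) / sum(freq)
}

share1 <- food_share(toy_freq1, food_terms)
share2 <- food_share(toy_freq2, food_terms)
print(c(coklat = share1, keju = share2))
```

The result depends heavily on which words are put on the food list, so it is best treated as a sanity check on the visual impression rather than a measurement.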