The English league (Liga Inggris) and the Spanish league (Liga Spanyol) are the best football leagues on earth. How could they not be? Their players, coaches, and epic matches draw football fans from all over the world. Many social media platforms are used to discuss the two leagues, and Twitter is one of them. Below are some results of a text mining analysis of tweets about both leagues.
# Load packages
library(rtweet)
library(tidyverse)
# Twitter authentication
create_token(
app = "my_twitter_research_app",
consumer_key = consumer_key,
consumer_secret = consumer_secret,
access_token = access_token,
access_secret = access_secret)
## <Token>
## <oauth_endpoint>
## request: https://api.twitter.com/oauth/request_token
## authorize: https://api.twitter.com/oauth/authenticate
## access: https://api.twitter.com/oauth/access_token
## <oauth_app> my_twitter_research_app
## key: zXMq2IaQDXtXeTmrCYb0k8ym2
## secret: <hidden>
## <credentials> oauth_token, oauth_token_secret
## ---
# Retrieve tweets
tweets <- search_tweets("Liga Inggris", n = 10000, tweet_mode="extended")
## Searching for tweets...
## Finished collecting tweets!
tweets <- distinct(tweets, text, .keep_all=TRUE)
# Retrieve tweets
tweetss <- search_tweets("Liga Spanyol", n = 10000, tweet_mode="extended")
## Searching for tweets...
## Finished collecting tweets!
tweetss <- distinct(tweetss, text, .keep_all=TRUE)
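The `distinct(..., text, .keep_all = TRUE)` calls keep the first row for each unique tweet text, which drops verbatim repeats while preserving every other column. A minimal sketch of the same idea in base R, on a made-up data frame (the `toy` data and its values are hypothetical and are not taken from the collected tweets):

```r
# Hypothetical toy data standing in for the rtweet result
toy <- data.frame(
  status_id = c("1", "2", "3", "4"),
  text = c("Hasil Liga Inggris", "Hasil Liga Inggris",
           "Jadwal Liga Inggris", "Hasil Liga Inggris"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each distinct text along with all other columns;
# this mirrors dplyr::distinct(toy, text, .keep_all = TRUE)
toy_unique <- toy[!duplicated(toy$text), ]

print(toy_unique$status_id)
print(nrow(toy) - nrow(toy_unique))  # number of rows dropped as duplicates
```

Comparing the row count before and after this step gives a quick sense of how much of each search result was repeated text.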
## plot time series of tweets
ts_plot(tweets, "3 hours") +
theme_minimal() +
theme(plot.title = ggplot2::element_text(face = "bold")) +
labs(
x = NULL, y = NULL,
title = "Frequency of Liga Inggris Twitter statuses",
subtitle = "Twitter status (tweet) counts aggregated using three-hour intervals",
caption = "\nSource: Data collected from Twitter's REST API via rtweet"
)
## plot time series of tweets
ts_plot(tweetss, "3 hours") +
theme_minimal() +
theme(plot.title = ggplot2::element_text(face = "bold")) +
labs(
x = NULL, y = NULL,
title = "Frequency of Liga Spanyol Twitter statuses",
subtitle = "Twitter status (tweet) counts aggregated using three-hour intervals",
caption = "\nSource: Data collected from Twitter's REST API via rtweet"
)
The plots above show how often the phrases “Liga Inggris” and “Liga Spanyol” were tweeted. They cover the period from 3 November to 11 November. What the two plots have in common is a marked increase on 4 and 11 November. This is attributed to big matches involving the giant clubs of each league. In Liga Inggris, 4 November 2018 featured Arsenal vs Man. City and Chelsea vs Man. United, while on 11 November 2018 the match that drew the most attention from football fans was the Manchester derby, Manchester City vs Manchester United. Besides the derby, another giant club in action that day was Arsenal, against Wolves. In Liga Spanyol, Barcelona and Real Madrid also played their respective fixtures.
From the two plots we conclude that the frequency of both “Liga Inggris” and “Liga Spanyol” tweets rises when the giant clubs of each country are playing.
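The peaks read off the plots can also be checked numerically by counting tweets per calendar day. The sketch below uses a handful of made-up timestamps in place of `tweets$created_at`; the vector `created_at` and its values are hypothetical, and the day boundaries assume UTC:

```r
# Hypothetical timestamps standing in for tweets$created_at
created_at <- as.POSIXct(
  c("2018-11-03 10:00:00", "2018-11-04 15:30:00", "2018-11-04 17:45:00",
    "2018-11-04 21:10:00", "2018-11-07 09:00:00", "2018-11-11 16:00:00",
    "2018-11-11 18:20:00"),
  tz = "UTC"
)

# Count tweets per calendar day (UTC)
daily <- table(as.Date(created_at, tz = "UTC"))

# Days sorted from busiest to quietest
daily_sorted <- sort(daily, decreasing = TRUE)
busiest_day <- names(daily_sorted)[1]

print(daily_sorted)
print(busiest_day)
```

With the real data, replacing `created_at` by `tweets$created_at` or `tweetss$created_at` would give the daily totals behind each plot. Converting to a local time zone first may shift evening matches onto a different date.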
tail(tweets, 5)
## # A tibble: 5 x 88
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 245216~ 10583560~ 2018-11-02 13:52:08 totosudarm~ Emer~ dlvr.~
## 2 185533~ 10583560~ 2018-11-02 13:52:06 sonardi_ro~ Emer~ dlvr.~
## 3 128468~ 10583500~ 2018-11-02 13:28:03 Geol_Goal FPL ~ dlvr.~
## 4 535516~ 10583475~ 2018-11-02 13:18:04 KnuckleHea~ Goal~ dlvr.~
## 5 585303~ 10583429~ 2018-11-02 12:59:59 muhperi_sa~ @beg~ Twitt~
## # ... with 82 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, hashtags <list>,
## # symbols <list>, urls_url <list>, urls_t.co <list>,
## # urls_expanded_url <list>, media_url <list>, media_t.co <list>,
## # media_expanded_url <list>, media_type <list>, ext_media_url <list>,
## # ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>,
## # mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>,
## # quoted_text <chr>, quoted_created_at <dttm>, quoted_source <chr>,
## # quoted_favorite_count <int>, quoted_retweet_count <int>,
## # quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>,
## # quoted_followers_count <int>, quoted_friends_count <int>,
## # quoted_statuses_count <int>, quoted_location <chr>,
## # quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>,
## # retweet_created_at <dttm>, retweet_source <chr>,
## # retweet_favorite_count <int>, retweet_retweet_count <int>,
## # retweet_user_id <chr>, retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>,
## # country <chr>, country_code <chr>, geo_coords <list>,
## # coords_coords <list>, bbox_coords <list>, status_url <chr>,
## # name <chr>, location <chr>, description <chr>, url <chr>,
## # protected <lgl>, followers_count <int>, friends_count <int>,
## # listed_count <int>, statuses_count <int>, favourites_count <int>,
## # account_created_at <dttm>, verified <lgl>, profile_url <chr>,
## # profile_expanded_url <chr>, account_lang <chr>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
tail(tweetss, 5)
## # A tibble: 5 x 88
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 276006~ 10583393~ 2018-11-02 12:45:44 aditrimasr~ "#So~ Twitt~
## 2 280128~ 10583339~ 2018-11-02 12:24:06 sekelas_gw Jadw~ dlvr.~
## 3 120661~ 10583330~ 2018-11-02 12:20:26 indhk @Nov~ Twitt~
## 4 234015~ 10583289~ 2018-11-02 12:04:08 suaradotcom Jadw~ dlvr.~
## 5 163193~ 10583288~ 2018-11-02 12:04:03 EPras92 "Lov~ Twitt~
## # ... with 82 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, hashtags <list>,
## # symbols <list>, urls_url <list>, urls_t.co <list>,
## # urls_expanded_url <list>, media_url <list>, media_t.co <list>,
## # media_expanded_url <list>, media_type <list>, ext_media_url <list>,
## # ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>,
## # mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>,
## # quoted_text <chr>, quoted_created_at <dttm>, quoted_source <chr>,
## # quoted_favorite_count <int>, quoted_retweet_count <int>,
## # quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>,
## # quoted_followers_count <int>, quoted_friends_count <int>,
## # quoted_statuses_count <int>, quoted_location <chr>,
## # quoted_description <chr>, quoted_verified <lgl>,
## # retweet_status_id <chr>, retweet_text <chr>,
## # retweet_created_at <dttm>, retweet_source <chr>,
## # retweet_favorite_count <int>, retweet_retweet_count <int>,
## # retweet_user_id <chr>, retweet_screen_name <chr>, retweet_name <chr>,
## # retweet_followers_count <int>, retweet_friends_count <int>,
## # retweet_statuses_count <int>, retweet_location <chr>,
## # retweet_description <chr>, retweet_verified <lgl>, place_url <chr>,
## # place_name <chr>, place_full_name <chr>, place_type <chr>,
## # country <chr>, country_code <chr>, geo_coords <list>,
## # coords_coords <list>, bbox_coords <list>, status_url <chr>,
## # name <chr>, location <chr>, description <chr>, url <chr>,
## # protected <lgl>, followers_count <int>, friends_count <int>,
## # listed_count <int>, statuses_count <int>, favourites_count <int>,
## # account_created_at <dttm>, verified <lgl>, profile_url <chr>,
## # profile_expanded_url <chr>, account_lang <chr>,
## # profile_banner_url <chr>, profile_background_url <chr>,
## # profile_image_url <chr>
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
library(ggplot2)
# build a corpus, and specify the source to be character vectors
myCorpus <- Corpus(VectorSource(tweets$text))
# convert to lower case
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(myCorpus, content_transformer(tolower)):
## transformation drops documents
# remove URLs
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeURL))
## Warning in tm_map.SimpleCorpus(myCorpus, content_transformer(removeURL)):
## transformation drops documents
# remove anything other than English letters or space
removeNumPunct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumPunct))
## Warning in tm_map.SimpleCorpus(myCorpus,
## content_transformer(removeNumPunct)): transformation drops documents
# remove stopwords
myStopwords <- c(setdiff(stopwords('english'), c("r", "big")), "use", "see", "used", "via", "amp", "indihome")
stopwords_id <- read.table('H:/stopwords-id.txt', header = FALSE)
myStopwords <- c(myStopwords, as.matrix(stopwords_id$V1), "hi", "yg")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
## Warning in tm_map.SimpleCorpus(myCorpus, removeWords, myStopwords):
## transformation drops documents
# remove extra whitespace
myCorpus <- tm_map(myCorpus, stripWhitespace)
## Warning in tm_map.SimpleCorpus(myCorpus, stripWhitespace): transformation
## drops documents
# keep a copy for stem completion later
myCorpusCopy <- myCorpus
myCorpuss <- Corpus(VectorSource(tweetss$text))
myCorpuss <- tm_map(myCorpuss, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(myCorpuss, content_transformer(tolower)):
## transformation drops documents
removeURLs <- function(x) gsub("http[^[:space:]]*", "", x)
myCorpuss <- tm_map(myCorpuss, content_transformer(removeURLs))
## Warning in tm_map.SimpleCorpus(myCorpuss, content_transformer(removeURLs)):
## transformation drops documents
removeNumPuncts <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
myCorpuss <- tm_map(myCorpuss, content_transformer(removeNumPuncts))
## Warning in tm_map.SimpleCorpus(myCorpuss,
## content_transformer(removeNumPuncts)): transformation drops documents
myStopwordss <- c(setdiff(stopwords('english'), c("r", "big")), "use", "see", "used", "via", "amp", "indihome")
stopwords_ids <- read.table('H:/stopwords-id.txt', header = FALSE)
myStopwordss <- c(myStopwordss, as.matrix(stopwords_ids$V1), "hi", "yg")
myCorpuss <- tm_map(myCorpuss, removeWords, myStopwordss)
## Warning in tm_map.SimpleCorpus(myCorpuss, removeWords, myStopwordss):
## transformation drops documents
myCorpuss <- tm_map(myCorpuss, stripWhitespace)
## Warning in tm_map.SimpleCorpus(myCorpuss, stripWhitespace): transformation
## drops documents
myCorpusCopys <- myCorpuss
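The two cleaning passes above repeat the same steps for each corpus. One way to avoid the duplication is to put the steps in a single function and call it twice. The following is only a base R sketch that works on a character vector rather than a `tm` corpus; `clean_tweet_text` and the sample tweet are hypothetical, and the stopword handling assumes plain words without regex metacharacters:

```r
# Sketch of a reusable cleaner that mirrors the steps above in base R.
# `clean_tweet_text` is a hypothetical helper, not part of tm or rtweet.
clean_tweet_text <- function(x, stopwords = character(0)) {
  x <- tolower(x)                                # convert to lower case
  x <- gsub("http[^[:space:]]*", "", x)          # remove URLs
  x <- gsub("[^[:alpha:][:space:]]+", "", x)     # keep letters and whitespace only
  if (length(stopwords) > 0) {
    # remove whole-word matches of any stopword
    pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  x <- gsub("[[:space:]]+", " ", x)              # collapse repeated whitespace
  trimws(x)                                      # drop leading/trailing spaces
}

# Made-up example tweet
example <- "Hasil Liga Inggris: Arsenal vs Wolves 1-1 https://t.co/abc123 via @detiksport"
cleaned <- clean_tweet_text(example, stopwords = c("via", "vs"))
print(cleaned)
```

The cleaned vectors could then be passed to `Corpus(VectorSource(...))` once per league, so that both corpora are guaranteed to go through identical preprocessing.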
tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
tdms <- TermDocumentMatrix(myCorpuss, control = list(wordLengths = c(1, Inf)))
tdm
## <<TermDocumentMatrix (terms: 4368, documents: 3972)>>
## Non-/sparse entries: 34601/17315095
## Sparsity : 100%
## Maximal term length: 32
## Weighting : term frequency (tf)
tdms
## <<TermDocumentMatrix (terms: 1566, documents: 1145)>>
## Non-/sparse entries: 9902/1783168
## Sparsity : 99%
## Maximal term length: 27
## Weighting : term frequency (tf)
freq.terms <- findFreqTerms(tdm, lowfreq = 20)
freq.terms[1:50]
## [1] "inggris" "klub" "liga" "masuk" "malam"
## [6] "minggu" "sepakbola" "siaran" "bermain" "hasil"
## [11] "klasemen" "pekan" "primer" "arsenal" "liverpool"
## [16] "null" "city" "derby" "laga" "manchester"
## [21] "martial" "mourinho" "rashford" "chelsea" "rekor"
## [26] "terulang" "juara" "laju" "menjaga" "sarri"
## [31] "hotspur" "palace" "poin" "tottenham" "ars"
## [36] "gagal" "gunners" "kalahkan" "menundukkan" "reds"
## [41] "sulit" "gol" "guardiola" "menang" "pesta"
## [46] "puas" "babak" "southampton" "unggul" "aguero"
freq.termss <- findFreqTerms(tdms, lowfreq = 20)
freq.termss[1:50]
## [1] "camp" "nou" "lionel" "messi" "pemain"
## [6] "piala" "comeback" "lawan" "solari" "madrid"
## [11] "real" "blancos" "los" "liga" "menang"
## [16] "santiago" "barcelona" "barca" "rayo" "detiksport"
## [21] "betis" "main" "pulih" "puas" "benzema"
## [26] "gol" "atletico" "bilbao" "dramatis" "hasil"
## [31] "spanyol" "celta" "vigo" "vs" "pekan"
## [36] "wenger" "pelatih" "kalah" "laga" "chelsea"
## [41] "bale" "vinicius" "kemenangan" "bermain" "ramos"
## [46] "vallecano" "tim" "leganes" "valladolid" "puasa"
term.freq <- rowSums(as.matrix(tdm))
term.freq <- subset(term.freq, term.freq >= 200)
df <- data.frame(term = names(term.freq), freq = term.freq)
term.freqs <- rowSums(as.matrix(tdms))
term.freqs <- subset(term.freqs, term.freqs >= 200)
dfs <- data.frame(term = names(term.freqs), freq = term.freqs)
ggplot(df, aes(x=reorder(term,freq), y=freq)) + geom_bar(stat="identity") +
xlab("Terms") + ylab("Count") + coord_flip() +
theme(axis.text=element_text(size=7))
ggplot(dfs, aes(x=reorder(term,freq), y=freq)) + geom_bar(stat="identity") +
xlab("Terms") + ylab("Count") + coord_flip() +
theme(axis.text=element_text(size=7))
The bar charts above show the words that most often appear together with “Liga Inggris” and “Liga Spanyol” (only terms that occur at least 200 times are plotted). Apart from the search terms themselves, the most frequent words are club names. For Liga Inggris they include Manchester City and United, Liverpool, Chelsea, and Arsenal. It is no surprise that these are the league's giant clubs: most of them sit in the top five of the table so far this season, which makes them exciting to talk about. The same holds for Liga Spanyol. There is, however, a striking difference between the two sets of tweets: tweets about Liga Inggris show a greater variety of words than tweets about Liga Spanyol.
We conclude that Twitter users are more interested in Liga Inggris than in Liga Spanyol.
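The comparison of vocabulary variety depends on how many tweets each corpus contains, so it may help to put the two matrices on a common scale. The rough sketch below uses only the term and document counts printed for `tdm` and `tdms` earlier. The variable names are illustrative, and "distinct terms per document" is a crude measure assumed here for illustration, not part of the original analysis:

```r
# Counts taken from the printed TermDocumentMatrix summaries above
terms_inggris <- 4368; docs_inggris <- 3972
terms_spanyol <- 1566; docs_spanyol <- 1145

# Distinct terms per document in each corpus
ratio_inggris <- terms_inggris / docs_inggris
ratio_spanyol <- terms_spanyol / docs_spanyol

print(round(c(inggris = ratio_inggris, spanyol = ratio_spanyol), 2))
```

On these counts the Liga Inggris corpus has more distinct terms in total, while the per-document ratio is about 1.10 against 1.37. Because vocabulary tends to grow more slowly than corpus size, neither figure settles the question on its own. The larger number of Liga Inggris tweets (3972 against 1145 after de-duplication) is arguably the more direct sign of interest.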
library(wordcloud)
## Loading required package: RColorBrewer
library(RColorBrewer)
m <- as.matrix(tdm)
# calculate the frequency of words and sort it by frequency
word.freq <- sort(rowSums(m), decreasing = T)
# colors
pal <- brewer.pal(9, "BuGn")[-(1:4)]
ms <- as.matrix(tdms)
# calculate the frequency of words and sort it by frequency
word.freqs <- sort(rowSums(ms), decreasing = T)
# colors
pals <- brewer.pal(9, "BuGn")[-(1:4)]
wordcloud(words = names(word.freq), freq = word.freq, min.freq = 50,
random.order = F, colors = 'red')
wordcloud(words = names(word.freqs), freq = word.freqs, min.freq = 50,
random.order = F, colors = 'blue')
The word clouds above visualize the words that most often appear together with “Liga Inggris” and “Liga Spanyol”. Every word shown has a frequency of at least 50. Besides the names of Liga Inggris clubs, coaches' names also appear: Manchester United's coach Mourinho and Manchester City's coach Guardiola are both in the word cloud. These two clubs attract particular attention because they compete for the Liga Inggris trophy year after year. In the Liga Spanyol word cloud, the names that appear include the coach Solari and Messi, one of Barcelona's best strikers.
One thing we can take away is that Liga Inggris and Liga Spanyol have shown their class in the eyes of the world, so they always leave an impression on football fans.