Using r/LivestreamFail to understand Twitch streamers, chat, and emotes

1 Getting Started

Recently, Twitch literature has begun characterizing Twitch communities through chat, viewership trends, and content, but no known projects have used resources outside of Twitch to understand how Twitch communities manifest and interact with one another.

LivestreamFail (LSF) is a subreddit dedicated to sharing Twitch clips, general Twitch news, and Twitch drama. It is one way smaller streamers get noticed, and it gives me a platform for comparing big and small communities. I’m interested in how emote use differs between smaller and larger communities because I believe emote meanings and sentiments are being actively redefined. This analysis is split into an LSF part and a Twitch emote-sentiment part.

The analysis first investigates users, posts, and comments (sentiment and topic) on r/LivestreamFail, and then investigates the chat data and emote use from Twitch clips that were featured in LSF posts.

Libraries used

tidyverse
tidytext
tm
lubridate
stringr
text2vec
jsonlite
widyr

quanteda
visNetwork
igraph
ggraph
DT
ggthemes
extrafont
BTM

2 r/LivestreamFail

2.1 LSF Data Collection

Using Python, R, and BigQuery

Python
- Used with PRAW (the Python Reddit API Wrapper) to grab ‘current’ posts.
- This data is used in the emote (companion) analysis.
R
- Used with bigQueryR to pull data from the Pushshift Reddit dataset, which contains historical data going back to 2016.
- Collected Reddit posts and Reddit comments.

This resulted in over 39k Reddit posts and 1.4M Reddit comments. 900 of the posts were then used in R to scrape links and document whether the clips had chat available for download.

## Rows: 39,696
## Columns: 10
## $ title     <chr> "PoE streamer gets scared b...
## $ score     <dbl> 49, 62, 74, 402, 43, 631, 4...
## $ url       <chr> "https://clips.twitch.tv/zi...
## $ date      <date> 2016-10-23, 2016-10-01, 20...
## $ domain    <chr> "clips.twitch.tv", "oddshot...
## $ author    <chr> "spoonducky", "Chandman68",...
## $ permalink <chr> "/r/LivestreamFail/comments...
## $ titlev1   <chr> "PoE streamer  scared    no...
## $ titlev2   <chr> "PoE streamer  scared    no...
## $ titlev3   <chr> "PoE streamer scared notifi...

lsf_data_comments = read_csv("https://media.githubusercontent.com/media/mowgl-i/Reddit-and-Twitch/master/Data%20Collection/lsf_data_comments.csv")


lsf_data_comments$date<-as.POSIXct(lsf_data_comments$date,origin = "1970-01-01")

glimpse(lsf_data_comments[2:1416070,])

## Rows: 1,416,069
## Columns: 8
## $ body      <chr> "He probably will be once his mom sees her charge history...
## $ name      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ link_id   <chr> "t3_4jdlmz", "t3_4ltvda", "t3_4jv5uz", "t3_4keb97", "t3_4...
## $ date      <dttm> 2016-05-14 22:51:40, 2016-05-31 04:09:14, 2016-05-18 10:...
## $ author    <chr> "aleco247", "4LTRU15T1CD3M1G0D", "PM_ME_UR_WAIFU_NUDES", ...
## $ parent_id <chr> "t1_d35wc5v", "t1_d3q4mot", "t1_d3a173k", "t3_4keb97", "t...
## $ score     <dbl> 111, 122, 71, 67, 110, 93, 51, 64, 71, 107, 112, 93, 220,...
## $ id        <chr> "d35x91t", "d3q5t7v", "d3a8hjs", "d3edmln", "d2qkd00", "d...

# I think we can link the posts to the comments using the posts perma link and the link id for the comments
# by removing the t*_ before the numbers on link_id

lsf_data$permalink[1]

## [1] "/r/LivestreamFail/comments/58y5c3/poe_streamer_gets_scared_by_his_own_notification/"

lsf_data_comments$link_id[1]

## [1] "t3_4hha9f"

rm(lsf_data_comments)
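
The linking idea in the code comments above can be sketched as follows. This is a minimal base R illustration on toy rows, not the code used for the report: the second permalink's title slug and the comment bodies are made up, and `post_id` is a helper column introduced here. It assumes, as on Reddit, that `link_id` is the post's id with a type prefix (`t3_` for posts) and that the same id is the path segment after `/comments/` in the permalink.

```r
# Toy rows shaped like the glimpse() output above; values are illustrative only
posts <- data.frame(
  permalink = c(
    "/r/LivestreamFail/comments/58y5c3/poe_streamer_gets_scared_by_his_own_notification/",
    "/r/LivestreamFail/comments/4hha9f/some_other_post/"  # hypothetical title slug
  ),
  stringsAsFactors = FALSE
)
comments <- data.frame(
  link_id = c("t3_4hha9f", "t3_58y5c3", "t3_58y5c3"),
  body    = c("first", "second", "third"),               # made-up comment text
  stringsAsFactors = FALSE
)

# The post id is the path segment right after "/comments/"
posts$post_id <- sub("^/r/[^/]+/comments/([^/]+)/.*$", "\\1", posts$permalink)

# Strip the type prefix (t3_ for posts) from link_id
comments$post_id <- sub("^t[0-9]+_", "", comments$link_id)

# Inner join comments to their post on the shared id
linked <- merge(comments, posts, by = "post_id")
```

With the real data, the same two `sub()` calls would be applied to `lsf_data$permalink` and `lsf_data_comments$link_id` before joining.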

2.2 Descriptions of lsf posts

This is a plot of “karma”, which can be thought of as likes, over 3 years on the subreddit. Interest in the subreddit is growing: it received more than 4 times as many upvotes in 2019 as in 2016.

lsf_data %>%
  mutate(year = as.factor(format(date, "%Y"))) %>%
  group_by(year) %>%
  summarize(sum(score)) %>%
  arrange(year) %>%
  ungroup() %>%
  ggplot(aes(x = year, y = `sum(score)`, group = 1)) +
  geom_line() +
  labs(x = "Year", y = "Sum Score", title = "Sum of Karma by Year",
       subtitle = "looks like we have increasing interest in LSF over the years") +
  theme_wsj(base_size = 9, base_family = "Segoe Print", color = "green") +
  theme(plot.subtitle = element_text(size = 10))

But does this pattern hold true for every year?

lsf_data %>% mutate(month =  as.factor(format(date, "%m")),
                    year = as.factor(year(date))) %>%
  group_by(year,month) %>% summarize(sum(score)) %>% arrange(month)%>%ungroup()%>% ggplot(aes(x=month, y = `sum(score)`, group = 1))+geom_line()+
  labs(x = "Month" , y = "Sum of Score" , title = "Sum of Karma/Points by month")+ theme_wsj(base_size = 9,base_family = "Segoe Print",color = "green")+
  facet_wrap(vars(year))

Yes, there is a pattern at the monthly level! There are 2 clear peaks and dips. This information may be useful for streamers or community members who want to ‘farm’ clips or time their posts on the subreddit to maximize karma or views.

The plot below shows the number of duplicate links posted on LSF.

lsf_data %>% group_by(url) %>% count(sort = T)%>% filter(n > 2) %>% ggplot(aes(x = url, y =n, fill = url)) + geom_col()+labs(x = "Link",y ="Number of Posts/Reposts",title ="Number of URL Reposts",subtitle = "Top 10 Links")+ theme_wsj(base_size = 9,base_family = "Segoe Print",color = "green")+theme(axis.text.x = element_blank(), legend.direction = "vertical", legend.title = element_blank(),axis.line.x = element_blank(),axis.ticks.x = element_blank()  ) + guides(fill = guide_legend(ncol = 3))

Which domains are being linked on LSF overall?

library(lubridate)
lsf_data %>% group_by(domain, year(date)) %>% count() %>% filter(n>50)  %>% ggplot(aes(x = `year(date)`, y = n , color = domain ))+
  geom_line(size = 3, alpha = 0.5)+ theme_wsj(base_size = 9,base_family = "Segoe Print",color = "green")+ theme(legend.title = element_blank(),plot.subtitle = element_text(size = 10))+ labs(title = "Post Links over time.", subtitle = "Filtered for links occurring more than 50 times")

This next visualization helps me understand which types of media are featured on the subreddit. Most of these top posts are Twitch related; the exceptions are the post about Etika, a famous and beloved YouTuber, and the Keanu Reeves moment during his presentation at a gaming convention.

library(ggrepel)
lsf_data %>% top_n(10,wt=score) %>% filter(score < 60000) %>% ggplot(aes(x = date, y = score, label = title, color = as.factor(score)))+
  geom_point(show.legend = F)+
  geom_label_repel(size = 3, vjust ="inward", hjust ='inward',show.legend = F, nudge_y = 10)+
  theme_wsj(base_size = 10,color = "green",base_family = "Segoe Print")+
  labs(title  = "Top 10 upvoted posts by date",caption = "Twitch, youtube, games and memes")+ theme(plot.caption = element_text(size = 10))

2.2.1 User posts

2.2.1.1 Number of posts

Normiesreeee69 has a suspicious number of posts! Could they be reposting? Are these legitimate posts? Perhaps they are the oldest surviving member of the “top posters” on LSF, since many users were [deleted].

lsf_data %>% group_by(author) %>% filter(author != "[deleted]") %>% count(sort =T) %>% ungroup() %>% top_n(10) %>% ggplot(aes(x = reorder(author,n), y =n)) + geom_col()+
coord_flip()+labs(x = "Author",y ="Number of Posts",title ="Which users post the most?",subtitle = "Top 10 users")+ theme_wsj(base_size = 9,base_family = "Segoe Print",color = "green")

2.2.1.2 Top Karma’d users

lsf_data %>% group_by(author) %>% filter(author != "[deleted]") %>% summarize(sum(score)) %>%top_n(10) %>% ggplot(aes(x = reorder(author,`sum(score)`), y =`sum(score)`))+
  geom_col()+
  coord_flip()+
  labs(y ="Total Score",x = "User",title = "Top upvoted redditors",subtitle = "Top 10 users")+
  theme_wsj(base_size = 9,base_family = "Segoe Print",color = "green")

Comparing reddit karma for popular users over time.

lsf_data %>%mutate(month =  as.factor(format(date, "%m")),
                    year = as.factor(year(date))) %>% group_by(author,month, year) %>% 
  filter(author != "[deleted]") %>% 
  summarize("total score" = sum(score)) %>%
  ungroup() %>%
  filter(author %in% c("-eDgAR-","Normiesreeee69","bbranddon","moody420","TTVRaptor")) %>%
  # `!=` against a vector recycles element-wise; use %in% to drop both years
  filter(!(year %in% c("2016","2017"))) %>% 
  ggplot(aes(x = month, y =`total score`, color = author, group = author))+
  geom_line(size = 3, alpha = 0.6)+
  #labs(y ="Total Score",x = "User",title = "Top upvoted redditors",subtitle = "Top 10 users")+
  theme_wsj(base_size = 9,base_family = "Segoe Print",color = "green")+
  facet_wrap(vars(year), strip.position = "bottom")+
   theme(legend.title = element_blank(),legend.position = "bottom")+
  labs(title = "Reddit karma over time for popular users")

# This creates a !%in% kind of deal
`%notin%` <- Negate(`%in%`)


lsf_data %>% 
  filter(str_detect(titlev3, regex("xqc|xqcow|train|trainwrecks|trainwrecks|trainwreckstv|forsen|esfand|esfandtv|ludwig|mizkif", ignore_case = T))) %>% 
  mutate("streamer" = case_when(
    str_detect(titlev3, regex("xqc|xqcow",ignore_case = T)) ~ "xQcOw",
    str_detect(titlev3, regex("train|trainwrecks|trainwrecks|trainwreckstv",ignore_case = T)) ~ "TrainwrecksTv",
    str_detect(titlev3, regex("forsen",ignore_case = T)) ~ "forsen",
    str_detect(titlev3, regex("esfand|esfandtv",ignore_case = T)) ~ "EsfandTv",
    str_detect(titlev3, regex("ludwig",ignore_case = T)) ~ "Ludwig",
    str_detect(titlev3, regex("mizkif",ignore_case = T)) ~ "Mizkif")) %>% 
  group_by(streamer) %>% mutate(month =  as.factor(format(date, "%m")),
                    year = as.factor(year(date))) %>% 
   group_by(year,month,streamer) %>% summarize("total score" = sum(score)) %>%  ungroup() %>%
   filter(year %notin% c("2016","2017")) %>% 
  ggplot(aes(x=month, y = `total score`, group = streamer,color = streamer))+geom_line(size = 2,alpha = 0.7)+
  facet_wrap(vars(year),nrow = 1, ncol = 4,strip.position = "bottom") + theme_wsj(base_size = 9,base_family = "Segoe Print",color = "green")+ theme(legend.title = element_blank(),legend.position = "bottom")+
  labs(title = "Reddit karma over time", subtitle = "for posts containing 'relevant' streamers")
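
One thing to watch in the title matching above is that a bare `train` also matches unrelated words such as "training". The sketch below, using base R `grepl()` on made-up titles, compares the loose pattern with a stricter whole-word alternative. The stricter pattern is a suggestion, not what the report used, and it would still need checking against real titles.

```r
# Made-up post titles for illustration
titles <- c(
  "Train gets baited",              # should count as TrainwrecksTv
  "Streamer training a new puppy",  # should not
  "trainwreckstv reacts",           # should
  "xQc fails a jump"                # a different streamer
)

# Loose pattern, as in the code above: any title containing "train"
loose <- grepl("train", titles, ignore.case = TRUE, perl = TRUE)

# Stricter: "train", optionally followed by "wrecks" and "tv", as a whole word
strict <- grepl("\\btrain(wrecks(tv)?)?\\b", titles, ignore.case = TRUE, perl = TRUE)

data.frame(title = titles, loose = loose, strict = strict)
```

The same pattern string should also work inside `str_detect(titlev3, regex(..., ignore_case = T))`.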

2.3 LSF Comments: Biterm Topic modeling for short texts.

not currently working :( Leaving the code here for review :)

rm(lsf_data,post_tokens)

lsf_data_comments = read_csv("https://media.githubusercontent.com/media/mowgl-i/Reddit-and-Twitch/master/Data%20Collection/lsf_data_comments.csv")


lsf_data_comments$date<-as.POSIXct(lsf_data_comments$date,origin = "1970-01-01")

lsf_data_comments %>%  glimpse()

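# Lower-case first so that capitalised stop words are also removed
lsf_data_comments$body <- lsf_data_comments$body %>%
  tolower() %>%
  removeWords(stopwords("smart")) %>% 
  removePunctuation() %>%
  removeNumbers() %>% 
  stripWhitespace()

# BTM() expects a tokenised data frame: one row per token, with a document id
# in the first column and the token in the second. Passing whole comment bodies
# is a likely reason the original call did not work.
lsf_data_comments <- lsf_data_comments %>%
  mutate(id = row_number()) %>% 
  select(id, body) %>%
  unnest_tokens(word, body)

set.seed(187187)

model <- BTM(lsf_data_comments, k = 10, window = 10, trace = T, iter = 20)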

terms(model, top_n = 10)

textplot::textplot_bitermclusters(model)

3 Twitch

dr.k clip

A glimpse at the data.

data <- read_csv("https://media.githubusercontent.com/media/mowgl-i/Reddit-and-Twitch/master/Data%20Collection/chats.csv",col_types = cols(X1 = col_skip()))

head(data, n = 10)

## # A tibble: 10 x 4
##    body          user           date                streamer
##    <chr>         <chr>          <dttm>              <chr>   
##  1 +2            blakeleigh321  2017-01-04 23:24:33 Jerma985
##  2 OMEGALUL      Humorous_Chimp 2013-03-29 16:35:36 Jerma985
##  3 OUT OF MOJO   mayodongs      2019-10-02 15:48:12 Jerma985
##  4 DOOR OMEGALUL mprQQ          2016-01-05 14:07:25 Jerma985
##  5 OMEGALUL Clap zRaayan        2014-04-05 09:57:48 Jerma985
##  6 SPIDERMAN     gum100         2011-03-11 18:28:18 Jerma985
##  7 +2            craigwhoisme   2019-12-17 21:08:26 Jerma985
##  8 OMEGALUL      feralbutch     2019-01-30 19:27:07 Jerma985
##  9 LUL           Angle_Dez      2014-07-23 23:43:32 Jerma985
## 10 OMEGALUL      Pretendeer     2012-11-24 17:01:36 Jerma985

glimpse(data)

## Rows: 46,319
## Columns: 4
## $ body     <chr> "+2", "OMEGALUL", "OUT OF MOJO", "DOOR OMEGALUL", "OMEGALU...
## $ user     <chr> "blakeleigh321", "Humorous_Chimp", "mayodongs", "mprQQ", "...
## $ date     <dttm> 2017-01-04 23:24:33, 2013-03-29 16:35:36, 2019-10-02 15:4...
## $ streamer <chr> "Jerma985", "Jerma985", "Jerma985", "Jerma985", "Jerma985"...

3.1 Twitch: Data collection

tl;dr - Twitch DMCA issues, streamer bans, and the app I used prevented me from downloading a lot more data. Next time, I’ll work with the Twitch API directly.

To actually download the Twitch chat, I used links from the LSF analysis and the application by lay295/zigagrcar on GitHub found here.

The Digital Millennium Copyright Act is affecting Twitch in a big way.

Twitch is currently in hot water over DMCA claims and is banning streamers for repeatedly streaming “copyrighted” songs. One way streamers combat this is by deleting their content shortly after it is broadcast. This affected data collection, since the collected Twitch clips were being actively taken down.

This led to the collection of Twitch chat from 227 links present in the Reddit posts.

3.2 Emote Data

R and RSelenium were used to scrape the emote data from FrankerFaceZ (FFZ) and BetterTTV (BTTV). Roughly the top 300 emotes from each site were collected (emote name and link to image).

This is what a “busy” chat may look like.

3.2.1 Emote data table

This is a data table with the names and images of the top 600 emotes from both sites (BTTV & FFZ) combined.

test <- emote_data %>%
  # fall back to emote_image when emote_link is missing
  mutate(emote_image_link = case_when(
    is.na(emote_link) ~ emote_image,
    TRUE ~ emote_link)) %>%
  # wrap each image URL in an <img> tag so that DT renders the picture
  mutate(emote_image_link = paste0('<img src="', emote_image_link, '" height="52"></img>')) %>%
  select(emote_name, emote_image_link)

datatable(test, escape = FALSE,caption = "Emote names and Photos")

3.3 Descriptions of twitch chat

This chart shows how many chat lines there are per streamer. This metric is useful for understanding which streamers may be getting the most attention on LSF during a given period (October in this case). It should later be controlled for clip length, since longer clips offer more opportunity for chat engagement.

data %>% group_by(streamer) %>%  count(sort = T)%>%
  head(n=10) %>% 
  ggplot(aes(x = reorder(streamer,-n), y = n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(size = 12, angle = 15,vjust = .55))+
  labs(title = "Which streamer has the most chats?")
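
Controlling for clip length could look something like the sketch below. Note that `clip_seconds` is hypothetical, since clip duration was not part of the collected data, and the streamers and numbers are made up. The idea is simply to compare chat lines per second instead of raw counts.

```r
# Hypothetical per-clip summary; `clip_seconds` is NOT in the collected data
clips <- data.frame(
  streamer     = c("A", "A", "B"),
  chat_lines   = c(120, 60, 90),
  clip_seconds = c(60, 30, 30),
  stringsAsFactors = FALSE
)

# Total chat lines and total clip time per streamer
per_streamer <- aggregate(cbind(chat_lines, clip_seconds) ~ streamer,
                          data = clips, FUN = sum)

# Chat rate: lines per second of clip
per_streamer$lines_per_second <- per_streamer$chat_lines / per_streamer$clip_seconds
per_streamer
```

In this toy example streamer A has more chat lines overall, but streamer B has the busier chat once clip length is taken into account.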

This visualization shows the most active Twitch chatters in our dataset. In a larger dataset, finding these high-interaction chatters may be useful for drawing links between communities, or even for creating contributor badges on Twitch (like the founders badge).

data %>% group_by(user) %>% filter(user %notin% c("StreamElements","Streamlabs","Nightbot")) %>%  count(sort = T) %>%
  head(n=10)%>% 
  ggplot(aes(x = reorder(user,-n), y = n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(size = 8, angle = 15,vjust = .55),
        plot.title = element_text(size = 20))+
  labs(title = "Which user has the most chats?")

# Streamelements and streamlabs are bots.

This plot gives further insight into the demographics of the communities of the top 5 streamers. It shows, for each streamer, the number of chat members’ accounts created in each year. As an example, one conclusion that may be drawn is that forsen and Mizkif are not attracting new accounts (new users or ban evaders) to their channels. Another is that TrainwrecksTv attracted a lot of new users in 2018 and perhaps played a significant role in bringing new users to Twitch. I should investigate further to understand what happened with Train in 2018. This was perhaps the year of his drama with MitchJones (a popular WoW streamer) or The Speech.

top_5_streamers <- data %>% group_by(streamer) %>% count(sort = T) %>% head(n=5) %>% distinct(streamer)


data %>% filter(streamer %in% top_5_streamers$streamer) %>% mutate(date_year = year(as.Date.character(date))) %>% group_by(date_year,streamer) %>% count(sort = T)%>%
  ggplot(aes(x = date_year, y = n, color = streamer)) + 
  geom_line(size = 2)+
  theme_wsj(base_size = 12, color = "green")+
  labs(title = "Streamer Communities: Account Creation Dates", subtitle = "Top 5 Streamers")+
  theme(plot.title = element_text(size = 15),plot.subtitle = element_text(size= 8),legend.title = element_blank(),legend.position = "bottom")

This next visualization shows total user chat, broken down by streamer, for users who were present in more than 4 different chats. I’ve already filtered out the popular chat bots, but this may also be a way to discover other chat bots or to uncover community moderators (who can also be identified through badges).

data %>% 
  add_count(user,streamer) %>% 
  filter(n>1,user %notin% c("StreamElements","Streamlabs","Nightbot","Fossabot")) %>% 
  group_by(user) %>%
  add_count(n_distinct(streamer)) %>% 
  ungroup() %>% 
  top_n(`n_distinct(streamer)`, n = 2) %>% 
  arrange(desc(n)) %>%
  filter(n >=20) %>% 
  add_count(user,streamer,name = "#_chat_per_clip") %>% 
  ggplot(aes(x = user, y = `#_chat_per_clip`, fill = streamer))+
  geom_bar(position = "stack",stat = "identity") + 
  theme_wsj(base_size = 12, color = "green")+
  theme(legend.title = element_blank(),legend.position = "bottom")+
  labs(title = "multi-stream chatters")
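
The “present in more than N different chats” idea can be hard to read off the pipeline above, so here is a simplified base R sketch on toy data. It is an illustration of the filter described in the text, not a drop-in replacement for the plot code. The user and streamer names are made up, and the threshold `min_streams` is set to 2 only because the toy data is tiny.

```r
# Toy chat log: one row per chat message
chat <- data.frame(
  user     = c("u1", "u1", "u1", "u2", "u2", "u3"),
  streamer = c("a",  "b",  "c",  "a",  "a",  "b"),
  stringsAsFactors = FALSE
)

# Number of distinct streamers each user chatted for
n_streams <- tapply(chat$streamer, chat$user, function(s) length(unique(s)))

# Users seen in more than `min_streams` different chats
# (4 in the text above; 2 here for the toy data)
min_streams <- 2
multi <- names(n_streams)[n_streams > min_streams]

# Messages per user and streamer, for those users only
keep <- chat$user %in% multi
table(user = chat$user[keep], streamer = chat$streamer[keep])
```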

3.3.1 Bag of words

Tokens, bigrams, and trigrams can give us insight into the popular emote/word combinations and spam that occur in these chats.
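
The `tokens` object used in the next section is not constructed anywhere in this report. With tidytext loaded, it was presumably built with something like `data %>% unnest_tokens(word, body)`. As an assumption-labelled illustration of the shape it needs (one row per word, with `word` and `streamer` columns), here is a base R sketch on two chat lines from the data above. Unlike `unnest_tokens()`, this version does not strip punctuation.

```r
# Two chat lines taken from the head() output above
chat <- data.frame(
  body     = c("DOOR OMEGALUL", "OMEGALUL Clap"),
  streamer = c("Jerma985", "Jerma985"),
  stringsAsFactors = FALSE
)

# Split each message on whitespace and lower-case it
words <- strsplit(tolower(chat$body), "\\s+")

# One row per word, keeping the streamer alongside
tokens <- data.frame(
  word     = unlist(words),
  streamer = rep(chat$streamer, lengths(words)),
  stringsAsFactors = FALSE
)
tokens
```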

3.3.1.1 Tokens

tokens %>% group_by(word) %>% count(sort = T)%>%
  head(n=10) %>% 
  ggplot(aes(x= reorder(word,-n),y=n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(angle = 25))+
  labs(title ="Token Counts")

3.3.1.2 Bi-grams

data %>%
  unnest_tokens(bigram,body,token = 'ngrams',n = 2)%>% 
  filter(str_detect(bigram,"^[:alpha:]")) %>% 
  group_by(bigram) %>% count(sort = T)%>%
  head(n=10) %>% 
  ggplot(aes(x= reorder(bigram,-n),y=n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(angle = 25,size = 9))+
  labs(title ="Bigram Counts")

3.3.1.3 Tri-grams

data %>%
  unnest_tokens(trigram,body,token = 'ngrams',n = 3)%>% 
  filter(str_detect(trigram,"^[:alpha:]")) %>% 
  group_by(trigram) %>% count(sort = T)%>%
  head(n=10) %>% 
  ggplot(aes(x= reorder(trigram,-n),y=n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(angle = 25,size = 9))+
  labs(title ="Trigram Counts")

As you can see, there are a lot of duplicated words/phrases. I like to think of these as spam.

Some popular ones are “I was here Pogu I was here Pogu…” or “OMEGALUL OMEGALUL OMEGALUL OMEGALUL..”. One way to combat this is by stripping each chat message down to only its unique words.
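
A minimal sketch of that de-duplication step, in base R. `dedupe_words` is a helper name introduced here for illustration; it was not part of the analysis. It keeps the first occurrence of each word within a message and preserves word order.

```r
# Example messages, including the two spam patterns mentioned above
msgs <- c("OMEGALUL OMEGALUL OMEGALUL OMEGALUL",
          "I was here Pogu I was here Pogu",
          "LUL")

# Keep the first occurrence of each word within a message, preserving order
dedupe_words <- function(x) {
  vapply(strsplit(x, "\\s+"),
         function(w) paste(unique(w), collapse = " "),
         character(1))
}

dedupe_words(msgs)
# "OMEGALUL"  "I was here Pogu"  "LUL"
```

The comparison is case-sensitive, so `Pogu` and `pogu` would be kept as separate words unless the text is lower-cased first.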

3.4 Comparisons of twitch chat

3.4.1 Word/Emote correlations

corpus_data <- readtext("https://media.githubusercontent.com/media/mowgl-i/Reddit-and-Twitch/master/Data%20Collection/corpus_data.csv", text_field = 'text')
glimpse(corpus_data)

## Rows: 35,303
## Columns: 3
## $ doc_id   <chr> "corpus_data.csv.1", "corpus_data.csv.2", "corpus_data.csv...
## $ text     <chr> "+2", "OMEGALUL", "OUT OF MOJO", "DOOR OMEGALUL", "OMEGALU...
## $ streamer <chr> "Jerma985", "Jerma985", "Jerma985", "Jerma985", "Jerma985"...

corpus <- corpus(corpus_data)


dfm <- dfm(corpus, remove_punct=T)

# Select emotes
emotes = emote_data$emote_name

tags = dfm_select(dfm,pattern = emotes)

#tags

toptag = names(topfeatures(tags,30))

# These are the top emotes mentioned in the dataset from the list of popular bttv/ffz emotes
head(toptag)

## [1] "omegalul"  "pogu"      "lulw"      "kekw"      "pepelaugh" "clap"

3.4.2 Text Plot Networks

3.4.2.1 Emote Combinations

tag_fcm <- fcm(tags)

toptags_fcm <- fcm_select(tag_fcm, pattern = toptag)
textplot_network(toptags_fcm,min_freq = 0.1, edge_alpha = 0.5, edge_size = 5)

3.4.2.2 Emote/word combinations

tags = dfm_select(dfm, pattern = c("£","â","<","ó","ðÿ"),selection = "remove")

#tags

toptag = names(topfeatures(tags,20))


tag_fcm <- fcm(tags)

toptags_fcm <- fcm_select(tag_fcm, pattern = toptag)
textplot_network(toptags_fcm,min_freq = 0.1, edge_alpha = 0.5, edge_size = 5)

3.4.3 Directed Bi-gram network

The previous plots don’t show the direction of the combinations. Direction is useful for understanding the order of emotes/spam. Otherwise, one may think that “Clap EZ” is an acceptable spam, when the usual order is “EZ Clap”.

count_bigrams <- function(data) {
  data %>%
    unnest_tokens(bigram,"body", token = "ngrams", n = 2) %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    count(word1, word2, sort = TRUE)
}

visualize_bigrams <- function(bigrams) {
  set.seed(2020)
  a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

  bigrams %>%
    graph_from_data_frame() %>%
    ggraph(layout = "fr") +
    geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = a) +
    geom_node_point(color = "lightblue", size = 5) +
    geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
    theme_void()
}

viz.bigrams <- data %>%
  count_bigrams()

# filter out rare combinations and produce the graph
viz.bigrams %>%
  filter(n >70) %>%
  visualize_bigrams()

3.4.4 Streamer emote comparisons

With this visualization, I aimed to uncover the “shared” emotes between streamers/communities. For example, we can see that the streamer EsfandTv shares a lot of emotes with different streamers and may share many community members with TrainwrecksTv.

word_cors <- tokens %>%
  group_by(word) %>%
  filter(n() >= 10 ) %>%
  pairwise_cor(word, streamer, sort = TRUE)
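
To make the correlation step more concrete, here is a simplified base R analogue on toy data. It treats each streamer as an observation, records whether each word appears in that streamer's chat, and correlates words across streamers. This illustrates the idea only; `widyr::pairwise_cor()` may treat repeated rows differently, and the words and streamer labels below are made up.

```r
# Toy tokens: one row per word occurrence
tokens_toy <- data.frame(
  word     = c("kekw", "omegalul", "kekw", "omegalul", "pogu", "pogu", "kekw"),
  streamer = c("s1",   "s1",       "s2",   "s2",       "s3",   "s4",   "s4"),
  stringsAsFactors = FALSE
)

# Word-by-streamer presence matrix: 1 if the word appears in that streamer's chat
presence <- 1 * (unclass(table(tokens_toy$word, tokens_toy$streamer)) > 0)

# Pearson correlation between words, computed across streamers
word_cor <- cor(t(presence))
round(word_cor, 2)
```

Here `omegalul` and `pogu` never appear in the same streamer's chat, so their correlation is -1, while `kekw` and `omegalul` co-occur in two of four chats and are positively correlated.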

top_10 <-word_cors %>% mutate("streamer" = case_when(
  item2 == "trainwreckstv" ~ "trainwreckstv", # ahh, 
  item2 == "esfandtv" ~ "esfandtv",
  item2 == "forsen" ~ "forsen",
  item2 == "mizkif" ~ "mizkif",
  item2 == "ludwig" ~ "ludwig",
  item2 == "moonmoon" ~ "moonmoon",
  item2 == "xqcow" ~ "xqcow",
  item2 == "sykkuno" ~ "sykkuno",
  item2 == "vadikus007" ~ "vadikus007",
  item2 == "loltyler1" ~ "loltyler1",
  TRUE ~ "WHO?"
)) # there are 83 unique streamers in the dataset; we should filter this somehow. Either top 20, or maybe with chats > 100.
# Build a scraper that grabs the names of emotes for each of the streamers?
#top_10 %>% group_by(streamer) %>% count() # strange numbers here, each streamer has the same number? Because I filter for top 10 above?
 # from 2 mil rows
 # to about 2k rows
test<-top_10 %>%
  mutate(contains_emote = case_when(item1 %in% emote_data$emote_name ~ 1, TRUE ~ 0)) %>% 
  filter(contains_emote == 1) %>% # filtering for only emotes!
  filter(streamer != "WHO?")%>%
  group_by(streamer) %>% top_n(10,wt = correlation)

streamers = c("trainwreckstv","esfandtv","forsen","mizkif","ludwig","moonmoon","xqcow","sykkuno","vadikus007","loltyler1")

emote_data_1 <- emote_data %>% mutate("emote_image_link" = emote_link) %>% 
  mutate(emote_image_link = case_when(
  is.na(emote_image_link) ~ emote_image,
  TRUE ~ emote_image_link)) %>%  select(emote_name,emote_image_link)


# TEST 2
test_2 <- test %>% left_join(emote_data_1, by = c("item1" = "emote_name"))
#-------


test_2 <- test_2 %>% graph_from_data_frame()
test_viz_2 <- toVisNetworkData(test_2)
test_viz_2$nodes <- test_viz_2$nodes %>% mutate("group" = case_when(label %in% streamers ~ "Streamer",TRUE ~ "Emote"),
                                                "shape" = "image")

test_viz_2$nodes<-test_viz_2$nodes %>% left_join(test_viz_2$edges, by = c('id' = 'from')) %>% select(id,label,group,shape,emote_image_link) %>% rename(image = emote_image_link) %>% distinct(id, .keep_all = T) 


test_viz_2$nodes<-test_viz_2$nodes %>% mutate(image = case_when(id == "xqcow" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/xqcow-profile_image-9298dca608632101-70x70.jpeg",
                                              id == "loltyler1" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/f3591dbe4ee3d94b-profile_image-70x70.png",
                                              id == "moonmoon" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/3973e918fe7cc8c8-profile_image-70x70.png",
                                              id == "trainwreckstv" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/1f47965f-7961-4b64-ad6f-71808d7d7fe9-profile_image-70x70.png",
                                              id == "forsen" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/forsen-profile_image-48b43e1e4f54b5c8-300x300.png",
                                              id == "esfandtv" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/476ee93d-66a6-4e57-b3a9-db1ceb168ad8-profile_image-70x70.png",
                                              id == "mizkif" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/ddd88d33-6c4f-424f-9246-5f4978c93148-profile_image-70x70.png",
                                              id == "ludwig" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/bde8aaf5-35d4-4503-9797-842401da900f-profile_image-70x70.png",
                                              id == "sykkuno" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/sykkuno-profile_image-6ab1e70e07e29e9b-70x70.jpeg",
                                              TRUE ~ as.character(image))) %>%  na.omit()


#test_viz_2 checking the dataframe
visNetwork(nodes = test_viz_2$nodes, edges = test_viz_2$edges, main = "Emote correlation to Streamer")%>%
  visGroups(groupname = "Streamer", color = "green", shape = "square", size = 50) %>%
  visGroups(groupname = "Emote", color = "blue")%>%
  visOptions(highlightNearest = list(enabled = T, hover = T))%>%
  visLegend()

#test_viz_2$nodes

4 Limitations

I am aware that chat will spam repeated words and emotes, and I considered reducing the chat messages to their unique words in a given chat. This method was used in another paper about Twitch chat, but I didn’t implement it here. Also, when tracking streamer-related posts, further examination of the post titles is needed. For example, a post title may be flagged as “XQC” related when the clip really features another streamer reacting to something XQC did or said, for example: Sweet_Anita reacts to XQC’s fail when playing Fortnite. Finally, this project was intended to highlight small and large communities alike. While working on the project, it became clear that I didn’t know quite enough about Twitch chat and the nuance of emote meanings to uncover the subtle differences between small and large communities.

So what’s next?

hmm

I intended to do an emote sentiment analysis for these unique emotes, but I didn’t make it that far. I also believe that this dataset is not big enough to derive sentiment for the emotes. Moving forward, I will aim to “automate” the data collection, data cleaning, data validation, and chat download from LSF and Twitch. Automation should allow more data to be collected with minimal risk of VOD/clip deletions. I would also like to explore the community badges that exist in the dataset (mods, subscribers, and VIPs) for interesting patterns. Finally, a large goal of mine is to incorporate the sentiment analysis of chat (emotes) and the other visualizations from this report into a dashboard that can be updated live or after every broadcast.

clap

Final Report

Michael Puerto

11/27/2020