For this assignment, I will be doing sentiment analysis on tweets about two complementary brands: the NFL and Buffalo Wild Wings.
Here is the data that I will be using.
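The tweets come from two CSV files loaded with readr. The code below is only a sketch of that step; the file names nfl_tweets.csv and bww_tweets.csv are placeholders, since the real paths are not shown in the output.

```r
library(readr)

# Placeholder file names -- the real paths are not shown in the output below
nfl_tweets <- read_csv("nfl_tweets.csv")
bww_tweets <- read_csv("bww_tweets.csv")
```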
## Rows: 200 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): full_text, source, in_reply_to_status_id, in_reply_to_user_id, in_...
## dbl (7): id, id_str, display_text_range, in_reply_to_status_id_str, in_repl...
## lgl (5): truncated, is_quote_status, favorited, retweeted, possibly_sensitive
## dttm (1): created_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 200 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): full_text, source, in_reply_to_status_id, in_reply_to_user_id, in_...
## dbl (7): id, id_str, display_text_range, in_reply_to_status_id_str, in_repl...
## lgl (5): truncated, is_quote_status, favorited, retweeted, possibly_sensitive
## dttm (1): created_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Before I do any analysis on these tweets, I have to tokenize them into individual words and remove stop words.
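The tokenizing step looks something like the sketch below: unnest_tokens() from tidytext splits each tweet into one word per row, and anti_join() with the stop_words data set is what prints the `Joining, by = "word"` messages.

```r
library(dplyr)
library(tidytext)

# One word per row, then drop common stop words.
# anti_join(stop_words) is what prints 'Joining, by = "word"'.
nfl_words <- nfl_tweets %>%
  unnest_tokens(word, full_text) %>%
  anti_join(stop_words)

bww_words <- bww_tweets %>%
  unnest_tokens(word, full_text) %>%
  anti_join(stop_words)
```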
## Joining, by = "word"
## Joining, by = "word"
For my first question, I want to see which words are used the most within these tweets. To do this, I will count how often each word appears and keep the most frequent ones.
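The counting step is sketched below; the n > 10 cutoff is an assumption inferred from the smallest counts that appear in the tables.

```r
# Count each word; the n > 10 cutoff is inferred from the tables below.
nfl_words %>%
  count(word) %>%
  filter(n > 10)

bww_words %>%
  count(word) %>%
  filter(n > 10)
```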
## # A tibble: 14 × 2
## word n
## <chr> <int>
## 1 10 11
## 2 2 11
## 3 3 12
## 4 catch 11
## 5 de 11
## 6 games 12
## 7 gt 14
## 8 https 115
## 9 justin 12
## 10 nfl 139
## 11 rt 97
## 12 t.co 109
## 13 team 16
## 14 week 14
## # A tibble: 28 × 2
## word n
## <chr> <int>
## 1 apply 28
## 2 bar 11
## 3 bio 11
## 4 buffalo 205
## 5 chain 19
## 6 day 14
## 7 debate 18
## 8 deposit 11
## 9 direct 11
## 10 fun 11
## # … with 18 more rows
As these tables show, many people were tweeting the scores of NFL games while talking about the NFL. They were also talking about someone named Justin. Based on the tokenized words, we can assume these people were talking about Justin Jefferson, since many others were tweeting about the Vikings.
My second question is how the sentiment in these two data sets compares. To answer it, I will run sentiment analysis on each data set individually and then compare the results in my analysis.
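The sentiment step looks something like the sketch below: each word count is matched against the Bing lexicon with inner_join(), which is what prints the joining messages.

```r
# Tag each counted word with its Bing sentiment (positive/negative).
# Words not in the lexicon are dropped by the inner_join().
nfl_sentiment <- nfl_words %>%
  count(word) %>%
  inner_join(get_sentiments("bing"))

bww_sentiment <- bww_words %>%
  count(word) %>%
  inner_join(get_sentiments("bing"))
```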
## Joining, by = "word"
## Joining, by = "word"
## # A tibble: 96 × 3
## word n sentiment
## <chr> <int> <chr>
## 1 audacity 1 negative
## 2 authentic 1 positive
## 3 awesome 1 positive
## 4 catastrophe 1 negative
## 5 cheated 1 negative
## 6 consistent 1 positive
## 7 correct 2 positive
## 8 crazy 1 negative
## 9 crush 1 negative
## 10 damn 1 negative
## # … with 86 more rows
## # A tibble: 73 × 3
## word n sentiment
## <chr> <int> <chr>
## 1 attraction 1 positive
## 2 avid 2 positive
## 3 bad 2 negative
## 4 belligerent 1 negative
## 5 bitch 1 negative
## 6 broke 2 negative
## 7 clean 1 positive
## 8 correctly 1 positive
## 9 dead 1 negative
## 10 disgusting 1 negative
## # … with 63 more rows
These two tables show us the words that appear in the Bing lexicon, along with the sentiment behind each one. Looking at the NFL table, we can see that most people used negative sentiment while tweeting about the NFL, with words like “cheated,” “penalty,” and more.
Looking at the BDUBS (Buffalo Wild Wings) table, we can see that most people also used negative sentiment while tweeting about the restaurant, with words like “belligerent,” “loss,” and “nervous.”
My third question is how the sentiment changed over the course of these tweets. As my previous analysis shows, most people were tweeting very negatively about both the NFL and BDUBS that night. I want to see whether these tweets can capture people’s feelings toward Buffalo Wild Wings or toward the NFL over time.
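The exact code behind the graph is not shown, so the sketch below is just one plausible way to compare positive versus negative word counts for the two brands.

```r
library(ggplot2)

# One plausible comparison: total positive vs. negative words per brand.
# The brand labels and this grouping are assumptions; the original code
# grouped differently (its messages mention 'word' and 'n').
bind_rows(
  nfl_sentiment %>% mutate(brand = "NFL"),
  bww_sentiment %>% mutate(brand = "BDUBS")
) %>%
  count(brand, sentiment, wt = n) %>%
  ggplot(aes(x = sentiment, y = n, fill = brand)) +
  geom_col(position = "dodge") +
  labs(x = "Sentiment", y = "Word count")
```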
## Joining, by = c("word", "sentiment")
## `summarise()` has grouped output by 'word', 'n'. You can override using the
## `.groups` argument.
As you can see from this graph, both of these data sets contain more negative sentiment than positive. I am not quite sure how to plot the sentiment over the course of the night, but I would assume that as the night went on, the sentiment became more negative, especially for certain groups of people: those whose NFL team was losing, and those who regret going to BDUBS because of how their stomach feels.
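For the over-the-night version I was unsure how to build, one possible sketch is below. It assumes created_at survives tokenization (unnest_tokens() keeps the other columns by default) and buckets the NFL tweets by hour with floor_date() from lubridate.

```r
library(lubridate)

# Bucket the NFL tweets by hour and count positive vs. negative words,
# so the sentiment trend over the night becomes visible.
nfl_words %>%
  inner_join(get_sentiments("bing")) %>%
  mutate(hour = floor_date(created_at, "hour")) %>%
  count(hour, sentiment) %>%
  ggplot(aes(x = hour, y = n, color = sentiment)) +
  geom_line() +
  labs(x = "Time", y = "Word count")
```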