For this assignment I will be doing sentiment analysis on tweets about two complementary brands: the NFL and Buffalo Wild Wings.

Here is the data that I will be using.

## Rows: 200 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): full_text, source, in_reply_to_status_id, in_reply_to_user_id, in_...
## dbl  (7): id, id_str, display_text_range, in_reply_to_status_id_str, in_repl...
## lgl  (5): truncated, is_quote_status, favorited, retweeted, possibly_sensitive
## dttm (1): created_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 200 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): full_text, source, in_reply_to_status_id, in_reply_to_user_id, in_...
## dbl  (7): id, id_str, display_text_range, in_reply_to_status_id_str, in_repl...
## lgl  (5): truncated, is_quote_status, favorited, retweeted, possibly_sensitive
## dttm (1): created_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
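The output above comes from reading the two tweet exports with readr. A minimal sketch of that loading step is below; the file names `nfl_tweets.csv` and `bww_tweets.csv` are assumptions, not the actual file names used.

```r
library(readr)

# Read the two tweet exports. Setting show_col_types = FALSE would silence
# the column-specification messages shown above. (File names are assumed.)
nfl_raw <- read_csv("nfl_tweets.csv")
bww_raw <- read_csv("bww_tweets.csv")
```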

Before I do any analysis on these tweets, I have to tokenize them so that each row holds a single word.

## Joining, by = "word"
## Joining, by = "word"
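The two "Joining, by = "word"" messages above are consistent with dropping common stop words via an `anti_join()` against tidytext's `stop_words`. A rough sketch of this tokenizing step, assuming the `nfl_raw` and `bww_raw` objects from the loading sketch:

```r
library(dplyr)
library(tidytext)

# Break each tweet's full_text into one word per row, then drop common
# stop words. The anti_join() against stop_words is what prints the
# "Joining, by = 'word'" messages shown above.
nfl_words <- nfl_raw %>%
  unnest_tokens(word, full_text) %>%
  anti_join(stop_words)

bww_words <- bww_raw %>%
  unnest_tokens(word, full_text) %>%
  anti_join(stop_words)
```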

Question One

For my first question, I want to see which words are used the most within these tweets. To do this, I will count each word and keep the ones that appear most often.

## # A tibble: 14 × 2
##    word       n
##    <chr>  <int>
##  1 10        11
##  2 2         11
##  3 3         12
##  4 catch     11
##  5 de        11
##  6 games     12
##  7 gt        14
##  8 https    115
##  9 justin    12
## 10 nfl      139
## 11 rt        97
## 12 t.co     109
## 13 team      16
## 14 week      14
## # A tibble: 28 × 2
##    word        n
##    <chr>   <int>
##  1 apply      28
##  2 bar        11
##  3 bio        11
##  4 buffalo   205
##  5 chain      19
##  6 day        14
##  7 debate     18
##  8 deposit    11
##  9 direct     11
## 10 fun        11
## # … with 18 more rows
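A sketch of how these tables could be produced, assuming the tokenized data frames from the earlier sketch. The cutoff of more than ten uses is an assumption based on the counts shown above.

```r
# Count how often each word appears and keep only the most frequent ones.
# The n > 10 cutoff is an assumption inferred from the tables above.
nfl_words %>%
  count(word) %>%
  filter(n > 10)

bww_words %>%
  count(word) %>%
  filter(n > 10)
```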

As these tables show, most people tweeting about the NFL were sharing game scores. They were also talking about someone named Justin. Based on the tokenized words, we can assume they were talking about Justin Jefferson, because many other people were tweeting about the Vikings.

Question Two

My second question is: how does the sentiment in these two data sets compare? To answer it, I will run sentiment analysis on each data set individually and then compare the results.

## Joining, by = "word"
## Joining, by = "word"
## # A tibble: 96 × 3
##    word            n sentiment
##    <chr>       <int> <chr>    
##  1 audacity        1 negative 
##  2 authentic       1 positive 
##  3 awesome         1 positive 
##  4 catastrophe     1 negative 
##  5 cheated         1 negative 
##  6 consistent      1 positive 
##  7 correct         2 positive 
##  8 crazy           1 negative 
##  9 crush           1 negative 
## 10 damn            1 negative 
## # … with 86 more rows
## # A tibble: 73 × 3
##    word            n sentiment
##    <chr>       <int> <chr>    
##  1 attraction      1 positive 
##  2 avid            2 positive 
##  3 bad             2 negative 
##  4 belligerent     1 negative 
##  5 bitch           1 negative 
##  6 broke           2 negative 
##  7 clean           1 positive 
##  8 correctly       1 positive 
##  9 dead            1 negative 
## 10 disgusting      1 negative 
## # … with 63 more rows
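A sketch of how these sentiment tables could be built with tidytext's bing lexicon, again assuming the `nfl_words` and `bww_words` objects from the earlier sketches:

```r
# Keep only the tokens that appear in the bing sentiment lexicon.
# inner_join() drops everything else and prints the "Joining, by = 'word'"
# messages shown above; counting first gives the word, n, sentiment layout.
nfl_sentiment <- nfl_words %>%
  count(word) %>%
  inner_join(get_sentiments("bing"))

bww_sentiment <- bww_words %>%
  count(word) %>%
  inner_join(get_sentiments("bing"))

nfl_sentiment
bww_sentiment
```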

These two tables show us the words that appear in the bing lexicon and the sentiment behind each one. Looking at the NFL table, we can see that most people were using negative words while tweeting about the NFL, such as "cheated", "penalty", and more.

Looking at the BDUBS (Buffalo Wild Wings) table, we can see that most people were also using negative words while tweeting about the restaurant, such as "belligerent", "loss", "nervous", and more.

Question Three

My third question is: how has the sentiment changed over the course of these tweets? As my previous analysis shows, most people were tweeting very negatively about both the NFL and BDUBS that night. I want to see whether these tweets can capture people's feelings toward Buffalo Wild Wings or toward the NFL over time.

## Joining, by = c("word", "sentiment")
## `summarise()` has grouped output by 'word', 'n'. You can override using the
## `.groups` argument.

As you can see from this graph, both data sets contain more negative sentiment than positive. I am not quite sure how to graph the sentiment over the course of the night, but I would assume that as the night went on the sentiment would become more negative, especially for certain groups of people: those whose NFL team was losing, and those who regretted going to BDUBS because of how their stomachs felt.
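One possible way to graph sentiment over the night would be to bucket tweets by the hour they were posted (using the `created_at` column) and plot the net count of positive minus negative bing words. The sketch below shows this idea for the NFL tweets; it is not the code used for the graph above, and the object name `nfl_raw` is carried over from the earlier loading sketch.

```r
library(ggplot2)
library(lubridate)

# Bucket NFL tweets by the hour they were posted, score each hour as
# positive minus negative bing words, and plot the result.
nfl_raw %>%
  unnest_tokens(word, full_text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  mutate(hour = floor_date(created_at, "hour")) %>%
  count(hour, sentiment) %>%
  tidyr::pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net_sentiment = positive - negative) %>%
  ggplot(aes(hour, net_sentiment)) +
  geom_col() +
  labs(title = "Net bing sentiment of NFL tweets by hour",
       x = "Hour tweeted", y = "Positive minus negative words")
```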