library(tidyverse)
library(tidytext)
library(gutenbergr)
library(ggwordcloud)
library(textdata)
##Load in the Brewery ratings File##
brews<-read_csv("https://asayanalytics.com/brews-csv")
##Remove all of the Stop Words## This removes the words that you would not want to see in a sentiment analysis: filler words such as "the", "and", and "at" that have no standalone meaning.
brews_tidy<-
brews%>%
unnest_tokens(word,review_content) %>%
anti_join(stop_words, by = "word")
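To make the stop-word step concrete, here is a minimal sketch on a single made-up sentence. The `toy` and `toy_tidy` names and the sentence itself are hypothetical and are not part of the brewery data; only `unnest_tokens()`, `anti_join()`, and `stop_words` come from the code above.

```r
library(tidyverse)
library(tidytext)

# Hypothetical one-row data set standing in for `brews`
toy <- tibble(review_content = "The beer and the pizza at this brewery")

toy_tidy <- toy %>%
  unnest_tokens(word, review_content) %>%  # one lowercase word per row
  anti_join(stop_words, by = "word")       # drop rows whose word is a stop word

toy_tidy
```

Only "beer", "pizza", and "brewery" should remain, which matches the fact that all three also survive in the real data.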
##Summary of Data##
This data set includes Yelp reviews of two large Cincinnati breweries, MadTree and Rhinegeist. I will use this data to analyze various aspects of the breweries.
##Top 10 words used in reviews##
brews_tidy %>%
group_by(word) %>%
summarize(n = n()) %>%
arrange(-n)
## # A tibble: 5,697 × 2
## word n
## <chr> <int>
## 1 beer 1018
## 2 brewery 414
## 3 space 363
## 4 beers 334
## 5 bar 313
## 6 pizza 296
## 7 love 291
## 8 food 266
## 9 time 259
## 10 rhinegeist 255
## # … with 5,687 more rows
###Graph Summary### This command shows the 10 most-used words in the reviews. Not surprisingly, most of them have something to do with beer.
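The table above has 5,697 rows and shows ten only because tibbles truncate their printed output. A possible way to keep just the top rows is sketched below with `count()` and `slice_head()` from dplyr. The `toy_tidy` data and the `top_words` name are made up for illustration; on the real data you would start from `brews_tidy` and use `n = 10`.

```r
library(tidyverse)

# Hypothetical stand-in for `brews_tidy`: one word per row
toy_tidy <- tibble(word = c("beer", "pizza", "beer", "bar", "beer", "pizza"))

top_words <- toy_tidy %>%
  count(word, sort = TRUE) %>%  # equivalent to group_by() + summarize(n = n()) + arrange(-n)
  slice_head(n = 2)             # keep only the first 2 rows (n = 10 on the real data)

top_words
```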
brews_tidy %>%
filter(word=="beer") %>%
group_by(word, review_rating) %>%
summarise(n=n()) %>%
arrange(-n)
## `summarise()` has grouped output by 'word'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 3
## # Groups: word [1]
## word review_rating n
## <chr> <dbl> <int>
## 1 beer 5 555
## 2 beer 4 303
## 3 beer 3 107
## 4 beer 2 28
## 5 beer 1 25
###Graph Summary### It appears that the breweries have good beer overall. Of the 1,018 mentions of "beer", only 53 (about 5%) come from reviews rated below 3 stars.
##Hypothesis Testing## Do the bartenders at one brewery receive better ratings than those at the other?
brews_tidy %>%
filter(word=="bartenders") %>%
group_by(word, review_rating, brewery) %>%
summarise(n=n())%>%
arrange(-n)%>%
ggplot(aes(review_rating, n))+
geom_col()+
facet_wrap(~brewery)+
xlab("Review Rating (Out of 5)")+
ylab("Number of Reviews")+
ggtitle("Bartender Ratings")
###Graph Summary### These graphs show the raw counts of bartender mentions by rating at the two breweries. It appears that Rhinegeist has many one-star reviews that use the word "bartenders".
Next, I will normalize the counts by each brewery's total so that the comparison is fair when the breweries have different numbers of reviews.
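As a minimal sketch of what normalizing means here, the example below divides each count by its brewery's total. The breweries "A" and "B" and all of the counts are made up for illustration and are not taken from the Yelp data.

```r
library(tidyverse)

# Made-up counts for two hypothetical breweries, "A" and "B"
toy_counts <- tribble(
  ~brewery, ~review_rating, ~n,
  "A",      1,              2,
  "A",      5,              8,
  "B",      1,              10,
  "B",      5,              30
)

toy_norm <- toy_counts %>%
  group_by(brewery) %>%
  mutate(nnorm = n / sum(n)) %>%  # share of each brewery's own mentions
  ungroup()

toy_norm
```

In this toy data, brewery B has five times as many one-star mentions as A (10 versus 2), but its one-star share is only slightly higher (25% versus 20%). That is the kind of difference raw counts can hide.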
##Normalized Graph##
brews_tidy %>%
filter(word=="bartenders") %>%
group_by(word, review_rating, brewery) %>%
summarise(n=n())%>%
ungroup() %>%
group_by(brewery) %>%
# mutate() keeps one row per rating; summarize() is meant to return one row per group
mutate(nnorm = n/sum(n)) %>%
ggplot(aes(review_rating, nnorm,fill=brewery))+
geom_col(position="dodge")+
xlab("Review Rating (Out of 5)")+
ylab("Normalized Number of Reviews")+
ggtitle("Bartender Ratings")
###Graph Summary###
After normalizing, MadTree's bartenders appear to be rated much better than Rhinegeist's. I was shocked to see the number of 1-star reviews that mentioned bartenders at Rhinegeist. I hope that the person who caused this was fired.