KLove’s Unstructured Project

Hey there! Welcome to my project for my unstructured data class. The focus of the course was working with text and pictures to do some cool analysis. The tool I use is R, a programming language for data analysis.

My project scrapes Yelp reviews for some of the “best” bars on the Jersey Shore. I hope you watched the TV show, because that’s the vibe I’m going for.

My first step, which you can’t see, is loading some R packages that help me import, clean, and analyze text.
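
The loading chunk is hidden, but judging by the functions called later in the post, it probably looked something like the sketch below. The package list is an assumption inferred from the code that follows, not the original chunk.

```r
# Assumed package list, inferred from the functions used later in this post
pkgs <- c("rvest",      # read_html(), html_nodes(), html_attr(), html_text()
          "dplyr",      # %>%, mutate(), group_by(), summarize(), joins
          "stringr",    # str_extract(), str_count()
          "tidytext",   # unnest_tokens(), stop_words
          "wordcloud2", # wordcloud2()
          "sentimentr", # sentiment(), get_sentences()
          "lexicon",    # hash_sentiment_jockers
          "DT",         # datatable(), formatStyle(), styleInterval()
          "ggplot2")    # ggplot(), geom_point(), geom_smooth()

# Attach whichever of these are installed, and report any that are missing
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
for (p in pkgs[installed]) library(p, character.only = TRUE)
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}
```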

And with that…

Mike “The Situation” Sorrentino, what a guy

Let’s start with Bamboo, from the actual TV show

Home of the “Jersey Turnpike”

Bamboo = "https://www.yelp.com/biz/the-bamboo-bar-seaside-heights"

BambooHTML = read_html(Bamboo)

bbRatings = BambooHTML %>% 
  html_nodes(".review-wrapper .review-content .i-stars") %>% 
  html_attr("title") %>% 
  stringr::str_extract("[0-5]")

bbReviews = BambooHTML %>% 
  html_nodes(".review-wrapper .review-content p") %>% 
  html_text()

BambooData = data.frame(ratings = bbRatings, 
                           reviews = bbReviews,
                           restaurant = "The Bamboo Bar", 
                           stringsAsFactors = FALSE)

DJais, one of our classmates’ favorite spots in Belmar, NJ

The most annoying people I went to middle school with love this place too

DJais = "https://www.yelp.com/biz/d-jais-oceanview-bar-and-cafe-belmar"

DJaisHTML = read_html(DJais)

djRatings = DJaisHTML %>% 
  html_nodes(".review-wrapper .review-content .i-stars") %>% 
  html_attr("title") %>% 
  stringr::str_extract("[0-5]")

djReviews = DJaisHTML %>% 
  html_nodes(".review-wrapper .review-content p") %>% 
  html_text()

DJaisData = data.frame(ratings = djRatings, 
                    reviews = djReviews, 
                    restaurant = "DJais", 
                    stringsAsFactors = FALSE)

Bar Anticipation, home of the legendary Beat-the-Clock special that turns Tuesdays into unwanted college reunions

BarA = "https://www.yelp.com/biz/bar-anticipation-lake-como?osq=bar+anticipation"

BarAHTML = read_html(BarA)

BarARatings = BarAHTML %>% 
  html_nodes(".review-wrapper .review-content .i-stars") %>% 
  html_attr("title") %>% 
  stringr::str_extract("[0-5]")

BarAReviews = BarAHTML %>% 
  html_nodes(".review-wrapper .review-content p") %>% 
  html_text()

BarAData = data.frame(ratings = BarARatings, 
                    reviews = BarAReviews, 
                    restaurant = "Bar A", 
                    stringsAsFactors = FALSE)

Johnny Mac’s, the best bar I’ve ever been to, because you get free pizza with every drink and the pizza is better than anything in the Midwest

JohnnyMacs = "https://www.yelp.com/biz/johnny-mac-house-of-spirits-asbury-park"

JMacsHTML = read_html(JohnnyMacs)

jmRatings = JMacsHTML %>% 
  html_nodes(".review-wrapper .review-content .i-stars") %>% 
  html_attr("title") %>% 
  stringr::str_extract("[0-5]")

jmReviews = JMacsHTML %>% 
  html_nodes(".review-wrapper .review-content p") %>% 
  html_text()

jmData = data.frame(ratings = jmRatings, 
                    reviews = jmReviews, 
                    restaurant = "Johnny Mac's", 
                    stringsAsFactors = FALSE)
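
The four scraping blocks above repeat the same steps, so they could be wrapped in one function. This is a minimal sketch, not the code the post actually ran: `scrape_bar()`, `demo_page`, and `demoData` are hypothetical names, and the tiny HTML snippet is made up to mimic the selectors so the example runs offline. The selectors themselves are the ones used above and only match Yelp's page layout as it was when the post was written.

```r
library(rvest)

# Hypothetical helper (not in the original code): pull ratings and reviews
# out of an already-parsed Yelp page and label them with the bar's name
scrape_bar <- function(page, name) {
  ratings <- page %>%
    html_nodes(".review-wrapper .review-content .i-stars") %>%
    html_attr("title") %>%
    stringr::str_extract("[0-5]")

  reviews <- page %>%
    html_nodes(".review-wrapper .review-content p") %>%
    html_text()

  # data.frame() needs one rating per review; stop here if the page layout
  # produced a mismatch rather than building a misaligned table
  stopifnot(length(ratings) == length(reviews))

  data.frame(ratings = ratings, reviews = reviews,
             restaurant = name, stringsAsFactors = FALSE)
}

# Made-up page that mimics the selectors above
demo_page <- read_html('<div class="review-wrapper"><div class="review-content">
    <div class="i-stars" title="4.0 star rating"></div>
    <p>Free pizza with every drink.</p>
  </div></div>')

demoData <- scrape_bar(demo_page, "Demo Bar")
demoData
```

With a helper like this, each bar becomes a one-liner, for example `BambooData = scrape_bar(read_html(Bamboo), "The Bamboo Bar")`.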

Combining all the separate reviews and ratings for each bar into one data frame

allReviews = dplyr::bind_rows(BambooData, DJaisData, BarAData, jmData ) %>% 
  dplyr::mutate(ratings = as.numeric(ratings), 
                wordCount = stringr::str_count(reviews, pattern = "\\S+"))
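
The `wordCount` column counts runs of non-whitespace characters, which is what the pattern `\\S+` matches. Here is a quick check on a made-up sentence (the text and the variable names are just for illustration):

```r
# Made-up example text, not one of the scraped reviews
example <- "Horrible place, DO NOT GO!!!!"

# "\\S+" matches each run of non-whitespace characters,
# so punctuation stays glued to the word it touches
nWords <- stringr::str_count(example, pattern = "\\S+")
nWords  # 5
```

This is a rough count: anything separated by spaces counts as a word, including stray symbols or emoticons.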

Let’s do a word cloud to see some of the most frequent terms

allReviews %>%
  unnest_tokens(output = word, input = reviews) %>% 
  anti_join(stop_words) %>% 
  count(word, sort = TRUE) %>% 
  filter(n > 5) %>% 
  na.omit() %>% 
  wordcloud2(shape = "cardioid")

## Joining, by = "word"

I also tried this for DJais, but it wasn’t family friendly

Let’s move on to more analytical sentiment analysis

allReviews = allReviews %>% 
  mutate(reviewID = 1:nrow(.)) # Add a review ID (row number) so the sentiment scores can be joined back on

reviewSentiment = sentiment(get_sentences(allReviews$reviews), 
                            polarity_dt = hash_sentiment_jockers)

reviewSentiment = reviewSentiment %>% 
  group_by(element_id) %>% 
  summarize(meanSentiment = mean(sentiment))

allReviews = left_join(allReviews, reviewSentiment, by = c("reviewID" = "element_id"))

Everything above this was just cleaning and prepping. Below is the actually useful stuff.

To explain what happened above: I split each review into sentences and compared the words to a preset sentiment dictionary (the Jockers lexicon) to get a feel for whether each word was positive or negative, then averaged the sentence scores for each review.
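
To make the scoring step concrete, here is a small sketch of the same pipeline on two made-up reviews. The `toy` text and `toySentiment` name are hypothetical. Note that `sentimentr` does a bit more than a straight word lookup (it also adjusts for things like negators), so the scores are not a simple average of dictionary values.

```r
library(sentimentr)
library(dplyr)

# Two made-up reviews, not from the scraped data
toy <- c("The pizza was great. The bartenders were friendly.",
         "Terrible service. The bouncers were rude.")

# Score every sentence against the Jockers lexicon, then average per review;
# element_id is the position of the review in the input vector
toySentiment <- sentiment(get_sentences(toy),
                          polarity_dt = lexicon::hash_sentiment_jockers) %>%
  group_by(element_id) %>%
  summarize(meanSentiment = mean(sentiment))

toySentiment
```

The first review should come out with a higher mean than the second, which is the same per-review number that gets joined back onto `allReviews` as `meanSentiment`.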

allReviews %>% 
  group_by(restaurant) %>% 
  summarize(meanRating = mean(ratings), 
            meanSentiment = mean(meanSentiment))

## # A tibble: 4 x 3
##   restaurant     meanRating meanSentiment
##   <chr>               <dbl>         <dbl>
## 1 Bar A                1.85       -0.0107
## 2 DJais                2.5         0.0431
## 3 Johnny Mac's         2.95        0.116 
## 4 The Bamboo Bar       2.35        0.0152

Looking at the ratings and sentiment, there is a pattern: the bars rank in the same order on both, with Johnny Mac’s highest and Bar A lowest. Obviously we want higher values for both.
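
One rough way to check that pattern is to rank the bars on each measure and compare. The sketch below uses only the four bar-level means copied from the table above; `barSummary` and `rho` are names made up for this example. With just four bars this says nothing about individual reviews, so treat it as a sanity check rather than a result.

```r
# Bar-level means copied from the summary table above
barSummary <- data.frame(
  restaurant    = c("Bar A", "DJais", "Johnny Mac's", "The Bamboo Bar"),
  meanRating    = c(1.85, 2.5, 2.95, 2.35),
  meanSentiment = c(-0.0107, 0.0431, 0.116, 0.0152)
)

# Rank the bars on each measure (1 = lowest) to compare the orderings
barSummary$ratingRank    <- rank(barSummary$meanRating)
barSummary$sentimentRank <- rank(barSummary$meanSentiment)
barSummary

# Spearman correlation is the correlation of those ranks
rho <- cor(barSummary$meanRating, barSummary$meanSentiment, method = "spearman")
rho  # 1, because the two rankings are identical
```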

I thought there would be more of a gap between Johnny Mac’s and everyone else but I guess my taste sucks too

Anyhow, let’s make things visual. The first trick is a nice table of the reviews with word counts, ratings, and sentiment, where each review is shaded by its sentiment score.

If the table doesn’t load don’t worry about it, it wasn’t that cool anyway

sentimentBreaks = c(-.5, 0, .5)

breakColors = c('rgb(178,24,43)', 'rgb(239,138,98)', 'rgb(103,169,207)', 'rgb(33,102,172)')

datatable(allReviews, rownames = FALSE) %>% 
  formatStyle("reviews", "meanSentiment", backgroundColor = styleInterval(sentimentBreaks, breakColors))
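
Three break points split the sentiment scale into four buckets, which is why there are four colors. The sketch below mimics that mapping in base R with `findInterval()`. It is an illustration of the idea, not how `DT` implements `styleInterval()`, and the `scores` values are made up. It assumes each bucket includes its upper break, which I believe matches `styleInterval()`; the example avoids scores that sit exactly on a break.

```r
sentimentBreaks <- c(-.5, 0, .5)
breakColors <- c('rgb(178,24,43)', 'rgb(239,138,98)',
                 'rgb(103,169,207)', 'rgb(33,102,172)')

# Made-up sentiment scores, one per bucket
scores <- c(-0.7, -0.1, 0.2, 0.8)

# findInterval() counts how many breaks each score is above; adding 1 gives
# the bucket number (left.open = TRUE puts a score equal to a break in the
# lower bucket)
bucket <- findInterval(scores, sentimentBreaks, left.open = TRUE) + 1

data.frame(score = scores, bucket = bucket, color = breakColors[bucket])
```

So the most negative reviews get the dark red, mildly negative ones the lighter red, and positive ones the two blues.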

This next graph just plots sentiment vs ratings, which we expect to be related as we saw before

ggplot(allReviews, aes(ratings, meanSentiment, color = restaurant)) +
  geom_point() +
  theme_minimal()

This second one looks at sentiment vs word count

The lines here look at the trend (so the more people write the less they liked JMac’s)

If you look really closely, you can see that someone wrote the Bible for The Bamboo Bar in a review

ggplot(allReviews, aes(wordCount, meanSentiment, color = restaurant)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()
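
Because `color = restaurant` splits the data into groups, `geom_smooth(method = "lm")` fits a separate straight line for each bar. To see the slopes as numbers instead of eyeballing them, something like the sketch below would work. The `toy` data is made up (built so the points fall exactly on a line), and `slopes` is a hypothetical name. Swapping in `allReviews` should give the slopes of the lines in the plot.

```r
# Made-up data, built so each bar's points fall exactly on a line
toy <- data.frame(
  restaurant    = rep(c("Bar X", "Bar Y"), each = 3),
  wordCount     = c(10, 20, 30, 10, 20, 30),
  meanSentiment = c(0.3, 0.2, 0.1, 0.0, 0.1, 0.2)
)

# Fit meanSentiment ~ wordCount separately for each bar and keep the slope
slopes <- sapply(split(toy, toy$restaurant), function(d) {
  coef(lm(meanSentiment ~ wordCount, data = d))[["wordCount"]]
})

slopes  # Bar X: -0.01 (line goes down), Bar Y: 0.01 (line goes up)
```

A negative slope means longer reviews tend to be more negative for that bar, which is what the downward line for JMac’s is showing.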

For shiggles let’s see what was going on in that long review

I apologize for the massive amount of whitespace

allReviews[which.max(allReviews$wordCount),]

##   ratings
## 9       1
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     reviews
## 9 had the worst experience ever here. Such terrible service and it's not even a great place. First of all, I am 21 years of age and they wouldn't let me in cause I didn't have a drivers license (I don't like to drive it gives me anxiety and I have recurring seizures). My workers ID and credit cards WITH my pictures and name on them apparently were not good enough either. Second of all my friend who went inside who is 20 got kicked out. She paid $15 cover charge to get in (which doesn't make sense at all) got kicked out because they stuck her wrist band on to her skin instead of the actual paper and when she washed her hands and when it got wet it was sticking to her and irritating her skin (plus she was also sweating so it was giving her a rash) so she took it off and put it in her pocket. Because of this she got FORCIBLY removed from the club as if she was drinking or doing something illegal and terrible. The bouncers and the manager were UNFRIENDLY and do not have manners with people at all. I will not be attending again and I'll make sure no one I know does either. The only positive experience was that Gary the bouncer on the other side of bamboo (the bar side) was extremely nice and helped us get an uber because oh yea they kicked us out IN THE RAIN and didn't give us an option to get an uber and then leave. Horrible place DO NOT GO!!!! Also, they had so many people in there using fake IDs and drinking alcohol yet somehow we were attacked. My friends witnessed the manager send the bouncer over to specifically target us. AND the woman who was holding the IDS (which is a stupid policy) told us "he wanted us out from the start and didn't like us because we questioned why he kept the IDs". seems like that's a little childish for a manager. Again will not be returning, except to spit in the managers face.
##       restaurant wordCount reviewID meanSentiment
## 9 The Bamboo Bar       355        9    -0.1158715

This person needs a hobby besides yelping

Thanks for reading!

KLove

Kevin KLove DeMaio

February 25, 2019