The purpose of this document is to analyze Dana Garden’s Yelp webpage. The data was collected through HTML web scraping. This self-guided analysis is motivated by my interest in seeing how others have rated Dana’s and in learning more about Dana’s reviews.
The document is structured with the following sections:
The packages required for this markdown are:
| Package | Summary |
|---|---|
| tidyverse | The tidyverse collection of packages |
| dplyr | Used for data cleaning & manipulation |
| ggplot2 | Necessary for creating visuals used throughout this document |
| scales | Necessary for editing labels on visuals |
| knitr | Used for RMarkdown documents |
| rmdformats | Used for RMarkdown themes |
| pander | Creates summary tables for Markdown |
| DT | JavaScript-enabled data tables |
| stargazer | Creating nice regression tables |
| PerformanceAnalytics | Detailed plots and tables for analytics |
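A setup chunk along the following lines could check for and attach the packages in the table. This is only a sketch: the variable names `required_packages`, `installed`, and `missing_packages` are my own, and the chunk assumes nothing about which packages are actually installed on a given machine.

```r
# Packages listed in the table above
required_packages <- c(
  "tidyverse", "dplyr", "ggplot2", "scales", "knitr",
  "rmdformats", "pander", "DT", "stargazer", "PerformanceAnalytics"
)

# Check which packages are installed, without attaching anything yet
installed <- vapply(required_packages, requireNamespace,
                    logical(1), quietly = TRUE)

# Report anything that still needs install.packages()
missing_packages <- required_packages[!installed]
if (length(missing_packages) > 0) {
  message("Not installed: ", paste(missing_packages, collapse = ", "))
}

# Attach the packages that are available
for (pkg in required_packages[installed]) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}
```

Checking before attaching means the document reports a missing package by name instead of stopping at the first failed `library()` call.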
While we might each have our own way of ranking a restaurant or bar from 1 to 5 stars, I learned through my web scraping that Yelp has its own definition for each star rating. Check out what I found:
| STAR RATING | YELP DEFINITION |
|---|---|
| 1 | “Eek! Me Thinks Not.” |
| 2 | “Meh. I’ve Experienced Better.” |
| 3 | “A-Ok” |
| 4 | “Yay! I’m a Fan.” |
| 5 | “Woohoo! As good as it gets!” |
It is a silly little finding, but it is interesting to see the phrases Yelp attaches to each star rating. And while this may be how Yelp defines its star ratings, text reviews can demonstrate that the rating and the written review do not always align.
Now that we know what the star ratings mean, I scraped data from the webpage and found that Dana’s scores 4 out of 5 stars. It seems that some people are indeed fans!
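As a rough illustration of how an overall rating can be pulled from a page, here is a minimal sketch using the rvest package. rvest is an assumption on my part (it is not in the package table), and both the inline HTML and the `aria-label` format are hypothetical stand-ins for the real page markup, which may differ and can change over time.

```r
library(rvest)

# Hypothetical stand-in for the business page; with a live page this
# would come from read_html() on the page URL instead.
page <- minimal_html('
  <div role="img" aria-label="4 star rating"></div>
')

# Assumed markup: the rating is stored as text in an aria-label
rating_label <- page %>%
  html_element("div[aria-label$='star rating']") %>%
  html_attr("aria-label")

# Strip the trailing words and convert what is left to a number
overall_rating <- as.numeric(sub(" star rating$", "", rating_label))
overall_rating
```

Reading an attribute rather than visible text is the key step here, since a rating drawn as stars has no text node to extract.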
Certain reviews are highlighted and separated from the rest when they include words that others have also used in reviews for the same establishment. I wanted to scrape this page to find what the keywords were and how often they were used.
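Once the keywords are scraped into a vector, counting them is a short dplyr step. The sketch below uses made-up placeholder keywords, not the ones actually found on Dana’s page, and the names `highlight_keywords` and `keyword_counts` are hypothetical.

```r
library(dplyr)

# Placeholder keywords, one entry per highlighted review (invented data)
highlight_keywords <- c("burgers", "patio", "burgers",
                        "dive bar", "patio", "burgers")

# Count how often each keyword appears, most frequent first
keyword_counts <- data.frame(keyword = highlight_keywords) %>%
  count(keyword, sort = TRUE, name = "times_used")

keyword_counts
```

The resulting table has one row per keyword, which makes it easy to pass to `ggplot2` for a bar chart or to `DT` for an interactive table.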
I was also interested in the distribution of locations that Dana’s reviewers were from. To no one’s surprise, the majority of reviewers were from Cincinnati, but it was interesting to see the other cities and states that were represented.
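A location breakdown like this can be sketched by splitting each reviewer location into city and state and then counting. The example assumes locations arrive as "City, ST" strings; the values below are invented for illustration and are not the scraped results.

```r
library(dplyr)

# Invented reviewer locations in an assumed "City, ST" format
reviewer_locations <- c("Cincinnati, OH", "Cincinnati, OH", "Chicago, IL",
                        "Cincinnati, OH", "Columbus, OH")

location_counts <- data.frame(location = reviewer_locations) %>%
  mutate(
    city  = sub(",\\s*[^,]*$", "", location),  # text before the last comma
    state = sub("^.*,\\s*", "", location)      # text after the last comma
  ) %>%
  count(city, state, sort = TRUE, name = "reviewers") %>%
  mutate(share = reviewers / sum(reviewers))   # proportion of all reviewers

location_counts
```

Keeping city and state in separate columns allows the same data to be summarized at either level.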
Yelp utilizes this section of the webpage to point out the important aspects about a restaurant or bar that customers would like to know. For Dana’s, Yelp picked out:

* No Reservations
* Accepts Credit Cards
* Outdoor Seating
* Divey
Through this web scraping, I was able to find the definitions of Yelp’s star ratings and Dana’s star rating, analyze review keyword usage and reviewers’ locations, and see the amenities Yelp deemed most important for customers to know.
While this was a fun exercise, there is more that I would have liked to do with this HTML scrape. It was very difficult to pull out just the text of the reviews, but I would have liked to do some text analysis on them in an attempt to quantify how frequently words were used and in what tone. I also ran into difficulty pulling the star rating for each review, as it is rendered as an image. Ideally, I would like to overcome both of these barriers and analyze the relationship between the tone of the language used and the star rating associated with each review.
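One possible way around both barriers is to walk through each review block, read the rating from the image’s accessibility label, and pull the review text from the same block. The sketch below is untested against the live page: rvest is an assumed package, and the class names `review` and `comment`, the `aria-label` format, and the sample reviews are all hypothetical.

```r
library(rvest)

# Hypothetical review markup standing in for the real page
page <- minimal_html('
  <ul>
    <li class="review">
      <div role="img" aria-label="5 star rating"></div>
      <p class="comment">Great patio and friendly staff.</p>
    </li>
    <li class="review">
      <div role="img" aria-label="2 star rating"></div>
      <p class="comment">Slow service on a busy night.</p>
    </li>
  </ul>
')

# One node per review, so rating and text stay paired
review_nodes <- page %>% html_elements("li.review")

# The star image carries its value as text in the aria-label attribute
star_labels <- review_nodes %>%
  html_element("[aria-label$='star rating']") %>%
  html_attr("aria-label")

reviews <- data.frame(
  stars = as.numeric(sub(" star rating$", "", star_labels)),
  text  = review_nodes %>% html_element("p.comment") %>% html_text2()
)

reviews
```

With ratings and text in one data frame, the review text could then be tokenized and compared against the star rating in a later analysis.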