On this page, I conduct sentiment analysis on Yelp reviews from three Skyline locations in the Cincinnati area. I compare the locations to see whether the reviews point to one of them being a better place to visit.
I chose the Clifton, Downtown, and Oakley locations because they are all in my area and each had over 100 reviews. I scraped my data from the following Yelp pages:
Skyline Clifton: https://www.yelp.com/biz/skyline-chili-cincinnati-8
Skyline Downtown: https://www.yelp.com/biz/skyline-chili-cincinnati-34
Skyline Oakley: https://www.yelp.com/biz/skyline-chili-cincinnati-12
Below are the packages I will use to manipulate the data.
Data Preparation and Review
For each set of data, I will create a column identifying where the review came from so I can combine the data frames into one large data frame.
Next, I will create a usable date column for later analysis. I will also create a review ID so each review is uniquely identified.
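As a minimal sketch of these preparation steps, the example below combines two small, made-up data frames instead of the scraped files. The column names `review_date`, `review_rating`, and `review_content` follow the source code of this page; the toy rows and the month/day/year date strings are assumptions for illustration only.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for the scraped Yelp files
clifton <- tibble(
  review_date    = c("3/14/2024", "1/2/2023"),
  review_rating  = c(5, 3),
  review_content = c("Great chili", "Slow service")
)
oakley <- tibble(
  review_date    = "11/30/2023",
  review_rating  = 4,
  review_content = "Friendly staff"
)

reviews <- bind_rows(clifton = clifton, oakley = oakley, .id = "location") %>%
  mutate(date = mdy(review_date)) %>%   # parse month/day/year strings into Date values
  arrange(date) %>%                     # oldest review first
  mutate(review_id = row_number())      # one unique ID per review

reviews
```

Using `.id = "location"` in `bind_rows()` labels each row with the name of the data frame it came from, which is one alternative to adding the location column by hand before combining.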
Now I will separate out the words in the review and remove stop words like “the” so I can analyze the meaningful words of the reviews.
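The sketch below shows this tokenizing step on two made-up reviews. It assumes the `stop_words` data set that ships with tidytext; the review text itself is hypothetical.

```r
library(dplyr)
library(tidytext)

# Hypothetical reviews for illustration only
reviews <- tibble(
  review_id      = 1:2,
  review_content = c("The chili is the best", "I love the cheese coneys")
)

tidy_reviews <- reviews %>%
  unnest_tokens(word, review_content) %>%   # one lowercase word per row
  anti_join(stop_words, by = "word")        # drop common words such as "the"

tidy_reviews
```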
Before I move on to the main analysis, I want to review some word pairs so I know what to expect and can ask better questions of the data.
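To illustrate what a word pair count looks like, here is a small sketch using `pairwise_count()` from widyr on made-up tokens. The tokens and review IDs are hypothetical; the idea is that two words form a pair each time they appear in the same review.

```r
library(dplyr)
library(widyr)

# Hypothetical tokens: one row per word per review
tokens <- tibble(
  review_id = c(1, 1, 1, 2, 2, 3, 3),
  word      = c("chili", "cheese", "coney", "chili", "cheese", "chili", "spaghetti")
)

pairs <- tokens %>%
  pairwise_count(item = word, feature = review_id,
                 sort = TRUE, upper = FALSE)  # count each unordered pair once

pairs
```

In this toy data, "chili" and "cheese" appear together in two reviews, so that pair sorts to the top.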
Trends and Analysis
Does one Skyline provide a better experience than others?
Using the NRC lexicon, I will create a bar chart of the different sentiment groupings for each location.
Based on the bar chart, all locations have roughly similar sentiment profiles; however, Clifton has more positive words than the other locations. Additionally, the sentiments I view as more negative (fear, disgust, anger, and sadness) have far fewer words than the more positive sentiments (joy, positive, and trust).
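The counting behind such a chart can be sketched as a join between the review words and a sentiment lexicon. Because the NRC lexicon has to be downloaded through textdata, the example below uses a tiny made-up lexicon instead; its word and sentiment pairs are assumptions for illustration and are not taken from NRC.

```r
library(dplyr)

# Hypothetical mini lexicon (NOT the real NRC entries)
toy_lexicon <- tibble(
  word      = c("love", "love", "awful"),
  sentiment = c("joy", "positive", "disgust")
)

# Hypothetical tokens: one row per word per location
tokens <- tibble(
  location = c("clifton", "clifton", "oakley", "oakley"),
  word     = c("love", "chili", "awful", "love")
)

sentiment_counts <- tokens %>%
  # a word can carry several sentiments, so the join is many-to-many
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many") %>%
  count(location, sentiment, name = "n")

sentiment_counts
```

Words that are not in the lexicon, such as "chili" here, drop out of the join and are not counted.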
Has the average star rating decreased over the past year?
I will filter the data to only include reviews from the past year. I will then summarize this as an average rating per month, which I will plot on a line chart.
Based on this line chart, no location has a steady average rating over the past year. Oakley does show a generally increasing trend, which could suggest that this location has been improving. However, these averages could be heavily skewed by months with few or no reviews. For example, there are no reviews for the Downtown location in November 2024.
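A minimal sketch of the monthly summary is shown below on made-up ratings. It uses a fixed reference date rather than `Sys.Date()` so the result is reproducible, and it adds a hypothetical `n_reviews` column (not part of the original analysis) so that months with very few reviews are easy to spot.

```r
library(dplyr)
library(lubridate)

# Hypothetical ratings for illustration only
reviews <- tibble(
  location      = c("oakley", "oakley", "oakley", "downtown"),
  date          = as.Date(c("2025-01-05", "2025-01-20", "2025-02-10", "2025-01-15")),
  review_rating = c(4, 2, 5, 3)
)

# Fixed cutoff for the example; the page itself uses Sys.Date() - years(1)
cutoff <- as.Date("2025-03-01") - years(1)

monthly <- reviews %>%
  filter(date >= cutoff) %>%
  mutate(year_month = floor_date(date, "month")) %>%   # first day of each month
  group_by(year_month, location) %>%
  summarize(
    avg_rating = mean(review_rating, na.rm = TRUE),
    n_reviews  = n(),                                   # how many reviews back each average
    .groups    = "drop"
  )

monthly
```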
Which part of a 3-way is mentioned most often?
A 3-way has three elements: chili, cheese, and spaghetti. I want to know which part of this dish is talked about the most.
I suspected that chili would appear most often because it is Skyline's most popular (or most controversial) item and it goes on every dish, not just the 3-way. Also, given the overall word counts, it makes sense that Clifton has more occurrences.
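The count behind this comparison can be sketched as a filter followed by a tally per word and location. The tokens below are made up for illustration.

```r
library(dplyr)

# Hypothetical tokens: one row per word per location
tokens <- tibble(
  location = c("clifton", "clifton", "clifton", "oakley", "oakley", "downtown"),
  word     = c("chili", "chili", "cheese", "chili", "coney", "spaghetti")
)

three_way_counts <- tokens %>%
  filter(word %in% c("chili", "cheese", "spaghetti")) %>%  # keep only the 3-way elements
  count(word, location, name = "count")

three_way_counts
```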
Source Code
---
title: "Skyline Reviews Comparison"
subtitle: "Assignment 7"
author: "Meredith Briggs"
format:
  html: # Options related to HTML output.
    code-tools: TRUE # Allow the code tools option showing in the output.
    embed-resources: TRUE # Embeds all components into a single HTML file.
execute: # Options related to the execution of code chunks.
  warning: FALSE # FALSE: Code chunk warnings are hidden by default.
  message: FALSE # FALSE: Code chunk messages are hidden by default.
  echo: FALSE # FALSE: Code is hidden in the output.
editor: visual
---

## Intro to Skyline Reviews

On this page, I conduct sentiment analysis on Yelp reviews from three Skyline locations in the Cincinnati area. I compare the locations to see whether the reviews point to one of them being a better place to visit.

I chose the Clifton, Downtown, and Oakley locations because they are all in my area and each had over 100 reviews. I scraped my data from the following Yelp pages:

Skyline Clifton: <https://www.yelp.com/biz/skyline-chili-cincinnati-8>

Skyline Downtown: <https://www.yelp.com/biz/skyline-chili-cincinnati-34>

Skyline Oakley: <https://www.yelp.com/biz/skyline-chili-cincinnati-12>

Below are the packages I will use to manipulate the data.

```{r}
#| label: setup
#| include: TRUE
#| echo: FALSE

library(tidyverse) # All the tidy things
library(lubridate) # Easily fixing pesky dates
library(tidytext)  # Tidy text mining
library(textdata)  # Lexicons of sentiment data
library(widyr)     # Easily calculating pairwise counts
library(igraph)    # Special graphs for network analysis
library(ggraph)    # An extension of ggplot for relational data used in networks
library(knitr)     # Useful tools when 'knitting' (rendering) Quarto documents.
```

```{r}
#| label: loading skyline data
#| include: FALSE
#| echo: FALSE

clifton_skyline <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/briggsm4_xavier_edu/Ea8Bd5361fxBpDtSi6TB1dMBT57ForAzYW3K6gsLCAnjGQ?download=1")

downtown_skyline <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/briggsm4_xavier_edu/EddijPrAFbFLhzCEhlcUWRUBKl3FKERmKLjdAutTS4HbNw?download=1")

oakley_skyline <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/briggsm4_xavier_edu/ESzF3aOixcNCsVKGDZL4WT0BtGexOMD85nt66yEpAtGQbA?download=1")
```

## Data Preparation and Review

For each set of data, I will create a column identifying where the review came from so I can combine the data frames into one large data frame.

```{r}
#| label: creating location variable
#| include: FALSE
#| echo: FALSE

clifton_skyline$location <- "clifton"
downtown_skyline$location <- "downtown"
oakley_skyline$location <- "oakley"
```

```{r}
#| label: creating large data frame
#| include: FALSE
#| echo: FALSE

skyline_reviews <- bind_rows(clifton_skyline, downtown_skyline, oakley_skyline)
```

Next, I will create a usable date column for later analysis. I will also create a review ID so each review is uniquely identified.

```{r}
#| label: creating dates
#| include: FALSE
#| echo: FALSE

skyline_reviews$date <- mdy(skyline_reviews$review_date)

skyline_reviews <- skyline_reviews %>%
  arrange(date) %>%
  mutate(review_id = row_number())
```

Now I will separate out the words in the review and remove stop words like "the" so I can analyze the meaningful words of the reviews.

```{r}
#| label: tokenize reviews
#| include: FALSE
#| echo: FALSE

tidy_skyline_reviews <- skyline_reviews %>%
  unnest_tokens(word, review_content) %>%
  anti_join(stop_words, by = "word") # Join explicitly on the word column
```

Before I move on to the main analysis, I want to review some word pairs so I know what to expect and can ask better questions of the data.

```{r}
#| label: create word pairs
#| include: FALSE
#| echo: FALSE

review_word_pairs <- tidy_skyline_reviews %>%
  group_by(location) %>%
  pairwise_count(item = word, feature = review_id, upper = FALSE) %>%
  arrange(-n)

view(review_word_pairs)
```

## Trends and Analysis

### Does one Skyline provide a better experience than others?

Using the NRC lexicon, I will create a bar chart of the different sentiment groupings for each location.

```{r}
#| label: positivity
#| include: TRUE
#| echo: FALSE

nrc <- get_sentiments("nrc")

tidy_skyline_reviews %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  group_by(sentiment, location) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = sentiment, y = n, fill = location)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("blue", "yellow", "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Skyline Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment",
       fill = "Location")
```

Based on the bar chart, all locations have roughly similar sentiment profiles; however, Clifton has more positive words than the other locations. Additionally, the sentiments I view as more negative (fear, disgust, anger, and sadness) have far fewer words than the more positive sentiments (joy, positive, and trust).

### Has the average star rating decreased over the past year?

I will filter the data to only include reviews from the past year. I will then summarize this as an average rating per month, which I will plot on a line chart.

```{r}
#| label: ratings over the past year
#| include: TRUE
#| echo: FALSE

skyline_reviews %>%
  filter(date >= Sys.Date() - years(1)) %>%
  mutate(year_month = floor_date(date, "month")) %>%
  group_by(year_month, location) %>%
  summarize(avg_rating = mean(review_rating, na.rm = TRUE)) %>%
  ggplot(aes(x = year_month, y = avg_rating, color = location)) +
  geom_line(linewidth = 1) + # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_point(size = 2) +
  scale_color_manual(values = c("blue", "yellow", "red")) +
  labs(title = "Average 5-Star Rating by Location Over Past Year",
       x = "Date", y = "Average Rating", color = "Location") +
  theme_minimal()
```

Based on this line chart, no location has a steady average rating over the past year. Oakley does show a generally increasing trend, which could suggest that this location has been improving. However, these averages could be heavily skewed by months with few or no reviews. For example, there are no reviews for the Downtown location in November 2024.

### Which part of a 3-way is mentioned most often?

A 3-way has three elements: chili, cheese, and spaghetti. I want to know which part of this dish is talked about the most.

```{r}
#| label: elements of 3-way
#| include: TRUE
#| echo: FALSE

tidy_skyline_reviews %>%
  filter(word %in% c("chili", "cheese", "spaghetti")) %>%
  group_by(word, location) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = word, y = count, fill = location)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("blue", "yellow", "red")) +
  labs(title = "Elements of a 3-Way per location",
       y = "Number of times word appeared",
       x = "Elements of a 3-Way",
       fill = "Location")
```

I suspected that chili would appear most often because it is Skyline's most popular (or most controversial) item and it goes on every dish, not just the 3-way. Also, given the overall word counts, it makes sense that Clifton has more occurrences.