Skyline Reviews Comparison

Assignment 7

Author

Meredith Briggs

Intro to Skyline Reviews

On this page, I will conduct sentiment analysis on Yelp reviews from three Skyline locations in the Cincinnati area. I will compare the locations to see whether the reviews point to one being a better place to visit.

I chose the Clifton, Downtown, and Oakley locations because they are all in my area and each had over 100 reviews. I used the following Yelp links to scrape my data:

Skyline Clifton: https://www.yelp.com/biz/skyline-chili-cincinnati-8

Skyline Downtown: https://www.yelp.com/biz/skyline-chili-cincinnati-34

Skyline Oakley: https://www.yelp.com/biz/skyline-chili-cincinnati-12

Below are the packages I will use to manipulate the data.

Data Preparation and Review

For each set of data, I will create a column identifying where the review came from so I can combine the data frames into one large data frame.
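As a minimal sketch of this step, the example below tags two made-up data frames with their source and stacks them. The values are placeholders, not the scraped reviews, and the column names are assumptions that mirror those used later on this page.

```r
library(dplyr)

# Made-up stand-ins for two of the scraped data sets (placeholder values).
clifton  <- tibble(review_content = c("Great chili", "Slow service"),
                   review_rating  = c(5, 2))
downtown <- tibble(review_content = "Quick lunch spot",
                   review_rating  = 4)

# Tag each data frame with where it came from, then stack them.
clifton$location  <- "clifton"
downtown$location <- "downtown"
combined <- bind_rows(clifton, downtown)

# Quick check that every review kept its location label.
count(combined, location)
```

Adding the label before combining matters because the individual files carry no location information once their rows are stacked.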

Next I will create a usable date column for later analysis. I will also create a review id so each review is uniquely identified.
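A small sketch of the date and id step, under the assumption that the scraped dates arrive as month/day/year text (the exact format in the real data may differ):

```r
library(dplyr)
library(lubridate)

# Placeholder rows; review_date is assumed to be month/day/year text.
reviews <- tibble(
  review_date = c("03/03/2025", "11/12/2024", "01/20/2025"),
  location    = c("oakley", "clifton", "downtown")
)

reviews <- reviews %>%
  mutate(date = mdy(review_date)) %>%   # parse the text into a Date column
  arrange(date) %>%                     # oldest review first
  mutate(review_id = row_number())      # 1, 2, 3, ... in date order

reviews
```

Because the id is assigned after sorting, lower ids correspond to older reviews across all locations.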

Now I will separate out the words in the review and remove stop words like “the” so I can analyze the meaningful words of the reviews.
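To illustrate what tokenizing and stop-word removal do, here is a sketch on two invented reviews. It uses the `stop_words` data set that ships with tidytext; the review text is made up.

```r
library(dplyr)
library(tidytext)

# Two invented reviews standing in for the scraped text.
reviews <- tibble(
  review_id      = 1:2,
  review_content = c("The chili was served hot", "The service is slow")
)

tidy_words <- reviews %>%
  unnest_tokens(word, review_content) %>%  # one lowercased word per row
  anti_join(stop_words, by = "word")       # drop common words such as "the"

tidy_words
```

Each remaining row is a single word tied to its `review_id`, which is the shape the later counts and joins rely on.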

Before I move on to the real analysis, I want to review some word pairs so I know what to expect and can ask better questions of the data.
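The word-pair idea can be sketched with `pairwise_count()` from widyr, which counts how many reviews contain both words of a pair. The words and ids below are invented for illustration.

```r
library(dplyr)
library(widyr)

# Invented tokens: three short reviews after stop-word removal.
tidy_words <- tibble(
  review_id = c(1, 1, 1, 2, 2, 3, 3),
  word      = c("chili", "cheese", "coney", "chili", "cheese", "chili", "coney")
)

pairs <- tidy_words %>%
  pairwise_count(item = word, feature = review_id,
                 upper = FALSE,  # keep each pair once instead of both orderings
                 sort  = TRUE)   # most frequent pairs first

pairs
```

In this toy data, "chili" and "cheese" share two reviews, "chili" and "coney" share two, and "cheese" and "coney" share one.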

Trends and Analysis

Does one Skyline provide a better experience than others?

Using the NRC lexicon, I will create a bar chart of the different sentiment groupings for each location.

Based on the bar chart, all locations have roughly similar sentiment profiles; however, Clifton has more positive words than the other locations. Because these are raw counts, part of that gap may simply reflect Clifton having more review text overall. Additionally, the sentiments I view as more negative (fear, disgust, anger, and sadness) have far fewer words than the more positive ones (joy, positive, and trust).
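The counting behind such a chart can be sketched as follows. To keep the example runnable without downloading anything, it uses a tiny hypothetical lexicon in place of `get_sentiments("nrc")`; the words, labels, and locations are all made up. The `share` column is an optional extra, one possible way to compare locations that have different amounts of review text.

```r
library(dplyr)

# Invented tokens for two locations.
tidy_words <- tibble(
  location = c("clifton", "clifton", "oakley", "oakley"),
  word     = c("delicious", "dirty", "delicious", "friendly")
)

# Hypothetical mini-lexicon standing in for the NRC lexicon.
# As in NRC, one word may carry more than one sentiment label.
toy_lexicon <- tibble(
  word      = c("delicious", "delicious", "dirty", "friendly"),
  sentiment = c("joy", "positive", "disgust", "positive")
)

sentiment_counts <- tidy_words %>%
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many") %>%
  count(location, sentiment, name = "n") %>%
  group_by(location) %>%
  mutate(share = n / sum(n)) %>%  # proportion within each location
  ungroup()

sentiment_counts
```

A word with several labels is counted once per label, which is why the join is declared many-to-many.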

Has the average star rating decreased over the past year?

I will filter the data to include only reviews from the past year, then compute the average rating per month for each location and plot it on a line chart.
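A sketch of the monthly summary on invented ratings is shown below. The fixed `as_of` date is an assumption made so the example is reproducible, and the `n_reviews` column is an addition that shows how many reviews sit behind each monthly average.

```r
library(dplyr)
library(lubridate)

# Invented ratings; column names mirror those used on this page.
reviews <- tibble(
  location      = c("oakley", "oakley", "oakley", "downtown"),
  date          = as.Date(c("2025-01-05", "2025-01-20", "2025-02-10", "2025-01-15")),
  review_rating = c(3, 5, 4, 2)
)

as_of <- as.Date("2025-03-01")  # fixed reference date (assumption for the sketch)

monthly <- reviews %>%
  filter(date >= as_of - years(1)) %>%             # keep the past year only
  mutate(year_month = floor_date(date, "month")) %>%
  group_by(location, year_month) %>%
  summarize(avg_rating = mean(review_rating, na.rm = TRUE),
            n_reviews  = n(),                       # reviews behind each average
            .groups    = "drop")

monthly
```

A month with a single review produces an "average" that is just that one rating, so reading `n_reviews` alongside the line chart helps judge how much weight each point deserves.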

Based on this line chart, we can see that no location has a steady average rating over the past year. Oakley does have a generally increasing trend, which could suggest that this location has been improving. However, these averages could be heavily skewed by months with few or no reviews. For example, there are no reviews for the Downtown location in November 2024.

Which part of a 3-way is mentioned most often?

There are three elements of a 3-way: chili, cheese, and spaghetti. I want to know which part of this dish is talked about most.

I suspected that chili would appear most often because it is Skyline’s most popular (or most controversial) item and goes on every dish, not just the 3-way. Clifton's higher counts also make sense given its larger overall word count.
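The count itself can be sketched on invented tokens as below. Note that this matches the three words exactly, so variants such as "cheesy" would not be counted.

```r
library(dplyr)

# Invented tokens for two locations.
tidy_words <- tibble(
  location = c("clifton", "clifton", "clifton", "oakley", "oakley"),
  word     = c("chili", "cheese", "chili", "spaghetti", "fries")
)

three_way_counts <- tidy_words %>%
  filter(word %in% c("chili", "cheese", "spaghetti")) %>%  # exact matches only
  count(location, word, name = "count")

three_way_counts
```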

---
title: "Skyline Reviews Comparison"
subtitle: "Assignment 7"
author: "Meredith Briggs"
format:
  html: # Options related to HTML output.
    code-tools: TRUE # Allow the code tools option showing in the output.
    embed-resources: TRUE # Embeds all components into a single HTML file.
execute: # Options related to the execution of code chunks.
  warning: FALSE # FALSE: Code chunk warnings are hidden by default.
  message: FALSE # FALSE: Code chunk messages are hidden by default.
  echo: FALSE # FALSE: Code is hidden in the output by default.
editor: visual
---

## Intro to Skyline Reviews

On this page, I will conduct sentiment analysis on Yelp reviews from three Skyline locations in the Cincinnati area. I will compare the locations to see whether the reviews point to one being a better place to visit.

I chose the Clifton, Downtown, and Oakley locations because they are all in my area and each had over 100 reviews. I used the following Yelp links to scrape my data:

Skyline Clifton: <https://www.yelp.com/biz/skyline-chili-cincinnati-8>

Skyline Downtown: <https://www.yelp.com/biz/skyline-chili-cincinnati-34>

Skyline Oakley: <https://www.yelp.com/biz/skyline-chili-cincinnati-12>

Below are the packages I will use to manipulate the data.

```{r}
#| label: setup
#| include: TRUE
#| echo: FALSE

library(tidyverse) # All the tidy things
library(lubridate) # Easily fixing pesky dates
library(tidytext)  # Tidy text mining
library(textdata)  # Lexicons of sentiment data
library(widyr)     # Easily calculating pairwise counts
library(igraph)    # Special graphs for network analysis
library(ggraph)    # An extension of ggplot for relational data used in networks
library(knitr)     # Useful tools when 'knitting' (rendering) Quarto documents.
```

```{r}
#| label: loading skyline data
#| include: FALSE
#| echo: FALSE

clifton_skyline <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/briggsm4_xavier_edu/Ea8Bd5361fxBpDtSi6TB1dMBT57ForAzYW3K6gsLCAnjGQ?download=1")
downtown_skyline <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/briggsm4_xavier_edu/EddijPrAFbFLhzCEhlcUWRUBKl3FKERmKLjdAutTS4HbNw?download=1")
oakley_skyline <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/briggsm4_xavier_edu/ESzF3aOixcNCsVKGDZL4WT0BtGexOMD85nt66yEpAtGQbA?download=1")
```

## Data Preparation and Review

For each set of data, I will create a column identifying where the review came from so I can combine the data frames into one large data frame.

```{r}
#| label: creating location variable
#| include: FALSE
#| echo: FALSE

clifton_skyline$location <- "clifton"
downtown_skyline$location <- "downtown"
oakley_skyline$location <- "oakley"
```

```{r}
#| label: creating large data frame
#| include: FALSE
#| echo: FALSE

skyline_reviews <- bind_rows(clifton_skyline, downtown_skyline, oakley_skyline)
```

Next I will create a usable date column for later analysis. I will also create a review id so each review is uniquely identified.

```{r}
#| label: creating dates
#| include: FALSE
#| echo: FALSE

# Parse the month-day-year text into a Date column.
skyline_reviews$date <- mdy(skyline_reviews$review_date)

# Number the reviews in date order so each has a unique id.
skyline_reviews <- skyline_reviews %>%
  arrange(date) %>%
  mutate(review_id = row_number())
```

Now I will separate out the words in the review and remove stop words like "the" so I can analyze the meaningful words of the reviews.

```{r}
#| label: tokenizing reviews
#| include: FALSE
#| echo: FALSE

# One word per row, with common stop words removed.
tidy_skyline_reviews <- skyline_reviews %>%
  unnest_tokens(word, review_content) %>%
  anti_join(stop_words, by = "word")
```

Before I move on to the real analysis, I want to review some word pairs so I know what to expect and can ask better questions of the data.

```{r}
#| label: create word pairs
#| include: FALSE
#| echo: FALSE

# Count how often two words appear in the same review, per location.
review_word_pairs <- tidy_skyline_reviews %>%
  group_by(location) %>%
  pairwise_count(item = word, feature = review_id, upper = FALSE) %>%
  arrange(-n)

view(review_word_pairs)
```

## Trends and Analysis

### Does one Skyline provide a better experience than others?

Using the NRC lexicon, I will create a bar chart of the different sentiment groupings for each location.

```{r}
#| label: positivity
#| include: TRUE
#| echo: FALSE

nrc <- get_sentiments("nrc")

tidy_skyline_reviews %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  group_by(sentiment, location) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = sentiment, y = n, fill = location)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("blue", "yellow", "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Skyline Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment",
       fill = "Location")
```

Based on the bar chart, all locations have roughly similar sentiment profiles; however, Clifton has more positive words than the other locations. Because these are raw counts, part of that gap may simply reflect Clifton having more review text overall. Additionally, the sentiments I view as more negative (fear, disgust, anger, and sadness) have far fewer words than the more positive ones (joy, positive, and trust).

### Has the average star rating decreased over the past year?

I will filter the data to include only reviews from the past year, then compute the average rating per month for each location and plot it on a line chart.

```{r}
#| label: ratings over the past year
#| include: TRUE
#| echo: FALSE

skyline_reviews %>%
  filter(date >= Sys.Date() - years(1)) %>%
  mutate(year_month = floor_date(date, "month")) %>%
  group_by(year_month, location) %>%
  summarize(avg_rating = mean(review_rating, na.rm = TRUE)) %>%
  ggplot(aes(x = year_month, y = avg_rating, color = location)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  scale_color_manual(values = c("blue", "yellow", "red")) +
  labs(title = "Average Star Rating by Location Over the Past Year",
       x = "Date",
       y = "Average Rating",
       color = "Location") +
  theme_minimal()
```

Based on this line chart, we can see that no location has a steady average rating over the past year. Oakley does have a generally increasing trend, which could suggest that this location has been improving. However, these averages could be heavily skewed by months with few or no reviews. For example, there are no reviews for the Downtown location in November 2024.

### Which part of a 3-way is mentioned most often?

There are three elements of a 3-way: chili, cheese, and spaghetti. I want to know which part of this dish is talked about most.

```{r}
#| label: elements of 3-way
#| include: TRUE
#| echo: FALSE

tidy_skyline_reviews %>%
  filter(word %in% c("chili", "cheese", "spaghetti")) %>%
  group_by(word, location) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = word, y = count, fill = location)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("blue", "yellow", "red")) +
  labs(title = "Elements of a 3-Way per Location",
       y = "Number of times word appeared",
       x = "Elements of a 3-Way",
       fill = "Location")
```

I suspected that chili would appear most often because it is Skyline's most popular (or most controversial) item and goes on every dish, not just the 3-way. Clifton's higher counts also make sense given its larger overall word count.