This final independent analysis represented a culmination of my curiosity about how businesses on Yelp responded to the initial COVID-19 pandemic lockdown. Join me as I examine, one last time, a curious slice of time when the world at large was just coming to terms with grinding to a halt in order to “flatten the curve,” as it was fashionable to say at the time.
My enduring dataset for this semester has been a collection of public Yelp data that included the platform’s accommodations to business owners during this initial lockdown. Like Yelp’s annually published data, these business listings were drawn from “the metropolitan areas centered on Montreal, Calgary, Toronto, Pittsburgh, Charlotte, Urbana-Champaign, Phoenix, Las Vegas, Madison, and Cleveland” (Yelp.com, 2022).
There were many variables in this dataset, but I became most interested in their “covid banner”, a text box at the top of a listing that businesses could use for sharing information regarding the pandemic and how they were responding to it. Unlike many of the other covid features (e.g. a badge that indicated if they were offering curbside service), this was a completely open format, leading to a variety of uses. My previous analyses explored various aspects of this covid banner and its utility.
Initially, my final project was to be a synthesis of all the text mining techniques we’ve done over the semester, but I decided to focus on sentiment analysis. I have been fascinated by the affordances and limitations of codifying text into basic emotions and affects, and enjoyed applying this methodology to such an extraordinary time in human history. How can you essentialize mass communications about a pandemic, after all? Perhaps through Yelp’s generous (and plentiful, with thousands of entries) contribution I would be able to take a glimpse. To this end, my research question for this final analysis was:
What are the qualities and influencing factors to a covid banner’s positive or negative sentiment?
I believe these findings would be of interest to Yelp and business owners alike because they might provide valuable advertising and marketing insight: assuming businesses generally strive to appear more positive (and, therefore, more inviting) to customers, they would be motivated to steer the sentiment of their communications in that direction.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.8
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidytext)
library(textdata)
library(readxl)
library(wordcloud)
## Loading required package: RColorBrewer
library(wordcloud2)
remotes::install_github("lchiffon/wordcloud2")
## Skipping install of 'wordcloud2' from a github remote, the SHA1 (8a12a3b6) has not changed since last install.
## Use `force = TRUE` to force installation
I then imported my Yelp banner data from the original .xlsx file. When sampling, I noticed duplicate instances of banner text under different business IDs. I eventually realized that these were franchised accounts that used canned responses across all of their listings. I used distinct() to screen these out, as they might throw off my token counts.
set.seed(588)
banners_raw <- read_xlsx("data/yelp-covid-dataset-banners.xlsx") |>
  select(business_id, covid_banner) |>
  #keep one row per unique banner text to drop franchise duplicates
  distinct(covid_banner, .keep_all = TRUE) |>
  sample_n(200)
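To illustrate what the de-duplication step does, here is a minimal sketch on made-up data. The `toy_banners` tibble and its IDs are hypothetical and are not drawn from the Yelp file; the point is only that distinct() with `.keep_all = TRUE` keeps the first row for each unique banner text along with its business ID.

```r
library(dplyr)

# Hypothetical toy listings: two franchise locations share one canned banner.
toy_banners <- tibble(
  business_id  = c("a1", "a2", "b1"),
  covid_banner = c("We are open for curbside pickup.",
                   "We are open for curbside pickup.",
                   "Closed until further notice.")
)

# Keep the first row for each unique banner text, retaining the other columns.
toy_unique <- toy_banners |>
  distinct(covid_banner, .keep_all = TRUE)

toy_unique
```

Only one of the two franchise rows survives, so a canned response is counted once rather than once per location.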
I created a small enough sample that I chose not to stem the text in order to see a wider variety of entries. I wasn’t sure how some words might lose their fidelity during sentiment analysis due to the stemming process.
banners_tidy <- banners_raw |>
unnest_tokens(output = word,
input = covid_banner) |>
anti_join(stop_words, by = "word")
I looked at the most frequent tokens to see if I needed to filter any specific terms. I omitted terms related to COVID-19, as they probably wouldn’t show up after sentiment analysis anyway and could dominate the pre-sentiment word cloud. I also screened for a few artifacts from the importing process.
count(banners_tidy, word, sort = TRUE) |>
view()
banners_tidy <- banners_tidy |>
  filter(!word %in% c("covid", "19", "covid-19"))
I started with a normal word cloud to make sure my sample was on par with my previous samples:
banners_top_n <- banners_tidy |>
count(word, sort = TRUE) |>
slice(1:50)
wordcloud2(banners_top_n)
As before, this sample appeared to be an assortment of safety announcements and logistical changes across the Yelp businesses. But I wanted to go further to see how this would stack up to sentiment analysis.
I began with the afinn lexicon because it gives each token a value between -5 and 5 to indicate positive or negative sentiment. I originally thought this would be useful for vectored visualizations and ended up using the ordinal data to make rankings. I paired all relevant tokens with afinn, grouped those terms by each covid banner, and summed the token values to give an overview of each banner’s net sentiment:
afinn <- get_sentiments("afinn")
sentiment_afinn <- inner_join(banners_tidy, afinn, by = "word")
summary_afinn <- sentiment_afinn |>
group_by(business_id) |>
summarise(sentiment = sum(value))
summary_afinn
## # A tibble: 133 × 2
## business_id sentiment
## <chr> <dbl>
## 1 _3PAk1If6bfnqG9W1W52TA 1
## 2 _jhWVxhU0WfJl_Wb1OdcUg 7
## 3 _NbvuglnRM0ifUHFDvIlbw 2
## 4 -1VaIJza42Hjev6ukacCNg 3
## 5 01VEnC_OYatfDgTdgpJ_LA 1
## 6 1KSqLKRQNMONRN6NNzwR2w -1
## 7 1L8QhmM-0dhUHw7sWntVaA 7
## 8 1RNYRozXKZqKtR9iezmCog 4
## 9 1xB0Qf2-TN6CQ5c4Fh7vqg 7
## 10 2MOaiezPXzeyd6e3WXk8xw -2
## # … with 123 more rows
It looked good, but I was most interested in the most positive and negative banners to look for any meaningful qualities. I did this by merging the original banner text to this sentiment summary, then arranging banners from positive to negative:
summary_afinn <- summary_afinn |>
merge(banners_raw, by = "business_id") |>
arrange(desc(sentiment))
head(summary_afinn)
## business_id sentiment
## 1 k8pmQPO1laT2GNhJhpD5OQ 14
## 2 2V-TzWecvR7OxP6paCWhfA 13
## 3 C1M2P8UDv1fJZ9Q8COp9Ew 11
## 4 w0Tg08rY5jl6rycI0qT1NA 9
## 5 EU_AkRZa27aR_b2r1WfCfQ 8
## 6 JTVrdJRkr2ZILuFVY8fMkQ 8
## covid_banner
## 1 Here at Zippy Entertainment, our commitment with our customers is to always offer the best quality services and give you an experience that will invite you to keep coming back to us and become a loyal customer. It is our commitment as a business, to keep providing you with ton of joy and happiness with our balloon services, interactive costume character appearances, magic shows and face painting services.\r\r\n\r\r\nWe're still doing our virtual balloon classes for a small fee. If you're needing balloon supplies, we can assist you for a small fee getting these supplies to you. It can take up to 2 weeks or more to get the supplies to you.\r\r\n \r\r\nIn addition to our virtual services, we do offer special appearances with our costumes such as the Easter Bunny, Santa Claus and other special characters.\r\r\n \r\r\nDue to the ongoing precautionary measures, please contact the business directly by filling out our form located at www.zippyisfunny/contact-us to book us today!
## 2 We are OPEN! As we navigate this unprecedented time, in addition to free in-home consultations, we offer free virtual consultations as well! Thanks to our incredible CAD system and virtual technology, Space Solutions can work with you in the same manner as we would if we came to your home.Then we will create a beautiful 3D CAD design that will come to life online so you can see exactly what your new custom design will look like! We will schedule your installation at a date that's convenient for you and we'll double-check all the measurements and details before we proceed just to make sure we've got it right. Upon installation, we will continue to take the necessary precautions for you and our team to ensure a safe and clean outcome. We'd love to work with you on your home organization projects including garage cabinetry, custom closets, home offices, pantries, laundry rooms, entertainment centers, craft rooms, lockers, kids spaces and more! 602-298-6956 or www.spacesolutionsaz.com
## 3 It has been a challenging time for the entire community as we navigate the new normal. Although in-person patient care remains paused for massage, chiro and physio, over the coming week we will be introducing Telerehab as an option for existing chiro and physio patients to ensure continuity in care. \r\r\n\r\r\nThese sessions will be held via email, phone or virtually. As we adapt ways we can provide chiro and physio services, we anticipate a few hiccups. Thank you for your continued trust and patience as we work to better meet your needs.\r\r\n\r\r\nAdditionally, our Registered Dietician, Rory Hornstein continues to consult remotely through the clinic. \r\r\n\r\r\nOur hope is that we can help patients and work together providing home therapies and exercises to assist you over the coming months until our professional associations along with the province develop protocols that best protect our patients, staff and therapists to treat in-person again. \r\r\n\r\r\nPlease don't hesitate to email us or leave a voice message at the clinic as our clinic admin team continues to work remotely. Thank you to everyone who has supported the clinic since we opened our doors in 2002, we look forward to seeing you in person soon. \r\r\n\r\r\nStay safe and stay well,\r\r\n\r\r\nYour health care team at Back & Body Health
## 4 WE'RE OPEN, with expanded health and safety guidelines to protect our members and staff. These changes include even more stringent policies on cleaning and disinfecting, required use of protective equipment for staff and members, and limited class capacity to ensure social distancing. Please visit the Orangetheory website for a comprehensive list of the new health and safety precautions that have been implemented in our studio. Orangetheory is a science-backed, technology-tracked, coach-inspired group workout designed to produce results from the inside out. The hardest part of our workouts is showing up -- we make it simple for you to push yourself to be your personal best, and we give you more. MORE results. MORE confidence. MORE Life.
## 5 WDMC is continuing to operate during the Covid-19 pandemic. Our team is working remotely; our offices are currently closed. We are focused on health, safety & providing solutions while following CDC recommendations to reduce the spread of the virus. Mail is being processed, payments are being applied & your vendors are being paid.\r\r\n\r\r\nTo contact us:\r\r\n\r\r\nEmail: HOA@wmdouglas.com - Please include HOA name & address.\r\r\n\r\r\nChat: Go to our website www.wmdouglas.com - chat box will open automatically.\r\r\n\r\r\nPhone: Please leave a voicemail, it will be returned as quickly as possible.\r\r\n\r\r\nWe ask for your patience. Please understand that response times may be longer than normal.\r\r\n\r\r\nPayments can be made online at www.wmdouglas.com. ACH drafts will continue as scheduled. You may mail your dues coupon slip & payment to: POBox 1208, Commerce, GA 30529.\r\r\n\r\r\nMaintenance requests can be submitted on the website & will be routed to your Manager.\r\r\n\r\r\nThank you & we hope everyone stays healthy & safe during this time.
## 6 The Spay Neuter Clinic is still open during our regular business hours to care for the health of your pets. Caring for your pets has always been our top priority, but your health is important to us as well. Our team is taking additional steps to keep both our pets and people safe in our clinics. We have always prided ourselves on cleanliness and use medical disinfectant throughout the clinic. Considering the pandemic, we have increased the disinfection frequency, especially in areas clients frequent. We have also taken steps to decrease the interaction between clients. We have temporarily changed the process of our check-ins effective immediately. After signing in you will be asked to wait outside or in your car. A team member will call you when it is your turn to be seen. We truly thank you for entrusting Spay Neuter Clinic to serve you and your pets. We are all in this together.
tail(summary_afinn)
## business_id sentiment
## 130 MtDvOGVPQy67u7kkbuK-aw -2
## 131 OyKv_5-0jo2D2tGIiB90HA -2
## 132 3jKUbhGSjFTv5jZ0wnW0xA -4
## 133 wr8_zkY2XzHKVs1SeFqypA -4
## 134 ndIfNpamzCR8lxOe6jOyPA -6
## 135 PY_M-nphtb58F25v1R8cnQ -6
## covid_banner
## 130 Although our Florida locations are temporarily closed, all other stores are open (some with modified hours of operation) for curbside takeout and delivery during the Covid-19 situation. We are taking all measures and precautions during this difficult time to ensure the highest form of safety for our customers and staff. We can't thank you enough for your continued support in our community. Stay safe and thank you for supporting Bad Ass Coffee of Hawaii.
## 131 Our business hours will change from 4pm to 12:30 am \r\r\nOn 03/30/2020 until further notice. \r\r\nSorry for the inconvenient
## 132 L’Équipe "Les Deux Gamins" est de tout coeur avec notre communauté locale, voilà pourquoi nous avons élaboré un menu composé de délicieux plats à commander à prix très compétitifs que vous pouvez préparer chez vous po\r\r\nur vous et votre famille.\r\r\n \r\r\nNous avons réduis notre personel à 2 personnes, et sommes en mesure de répondre aux commandes à emporter ou en livraison locale. \r\r\nNous, personnel de la restauration sommes déjà soumis aux règles strictes de la MAPAQ, et doublons de vigilance pour appliquer des règles d'hygiène supplémentaires en ces temps particuliers (les poignées, machine interac, et toute surface partagée sont désinfectées entre chaque client).\r\r\nPour éviter les pertes, nous prenons les commandes en avance. Le jour de livraison, nous demandons aux gens de pré-payer par carte de crédit pour éviter le contact.\r\r\nTous nos aliments seront offerts sous vide, pour faciliter le transport et la conservation.\r\r\n Nous sommes fiers de pouvoir desservir notre quartier pendant ce temps difficile et tout comme vous, nous avons hâte de pouvoir retrouver nos amis, nos clients, nos employés, nos voisins et bien sur la ré-ouverture de notre terrasse!!\r\r\nD’ici là , n’hésitez pas à nous faire parvenir vos commandes et visitez notre Facebook pour connaitre nos plats offerts, et pour nous donner vos suggestions et commentaires.\r\r\n \r\r\nSalutations de la part de L’equipe Les Deux Gamins.
## 133 We will be open but with only two of us working at a time. You can bring your dog into the lobby if you are wearing a mask, if not We will take your dog from your car or outside and return him to you there. Please observe social distancing ! Wait outside if another patron is in my small lobby .
## 134 We are not taking on new clients at this time that are not M-F. Also, don't use YELP. They do not protect their small, family-run business from fake reviews that are politically motivated attacks against private individuals that have never serviced the fake poster. Yelp does not abide by it's policies or guidelines, putting small businesses' at risk.
## 135 We are available during these times of the Covid-19. We will show up wearing booties, masks and rubber gloves. However, if there is ANYONE in your home sick with a cold, flu symptoms or any contagious illnesses. We regret to say that we will not enter the home to perform services. Thank you for your understanding. Stay Safe and Healthy.
The most positive banners had more florid language about safety, security, quality of care and conscientiousness. By contrast, it appeared that the most negative banners generally used apologetic terminology (e.g. “sorry,” “unfortunately”) to seem deferential to the audience, but this came across as negative to afinn. There was also that French listing, which probably came from a Montreal business. I found it interesting that French tokens would match afinn at all, let alone net out negative, since the afinn lexicon available through tidytext is an English word list; the matches presumably came from spellings shared between the two languages.
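One way to check which tokens drove a particular banner’s score is to list its matched tokens with their contributions. The sketch below uses a hypothetical `toy_scored` tibble shaped like sentiment_afinn (business_id, word, value); the IDs and values are illustrative stand-ins rather than results from my sample.

```r
library(dplyr)

# Hypothetical scored tokens, shaped like sentiment_afinn.
toy_scored <- tibble(
  business_id = c("x1", "x1", "x1", "y2", "y2"),
  word        = c("sorry", "regret", "safe", "safe", "thank"),
  value       = c(-1, -2, 1, 1, 2)
)

# For one banner, tally each matched token and its total contribution,
# with the most negative contributors listed first.
toy_drivers <- toy_scored |>
  filter(business_id == "x1") |>
  count(word, value, name = "n") |>
  mutate(contribution = n * value) |>
  arrange(contribution)

toy_drivers
```

Running the same filter against sentiment_afinn with a real business_id (for example, the French listing’s) would show exactly which tokens the lexicon matched.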
It also looked like the more positive banners tended to be longer than the more negative banners. It was at this point that I began to notice a potential correlation between banner length and sentiment and decided to follow up with a regression analysis. But first, I wanted to see how the original word cloud would look with the additional sentiment data (for each token, rather than by banner).
Although some tokens could change due to not having an afinn listing, I thought a sentiment word cloud would provide an interesting look at the relative proportions of positive and negative language across my entire sample.
To do this, I first needed to consolidate identical tokens from different listings into a single count so it wouldn’t show up multiple times in the same cloud. I split the original afinn output into positive and negative dataframes, re-counted the tokens to consolidate them, then ran afinn again:
#positive selection & consolidation
banners_positive <- sentiment_afinn |>
select(word, value) |>
filter(value > 0) |>
count(word, sort = TRUE)
#re-sentiment positive
banners_positive <- banners_positive |>
inner_join(afinn, by = "word") |>
view()
#negative selection & consolidation
banners_negative <- sentiment_afinn |>
select(word, value) |>
filter(value < 0) |>
count(word, sort = TRUE)
#re-sentiment negative
banners_negative <- banners_negative |>
inner_join(afinn, by = "word") |>
view()
#unite frames
banners_compare <- rbind(banners_negative, banners_positive)
view(banners_compare)
The resulting data was ready for the new word cloud, this time with color that corresponded to sentiment (red for negative, green for positive):
n = nrow(banners_compare)
colors = rep("grey", n)
colors[banners_compare$value <= 0] = "Red"
colors[banners_compare$value > 0] = "Green"
wordcloud2(banners_compare, color = colors)
Safe and safety were the clearly predominant terms, and there were more positive than negative terms overall. I noticed that some negative terms were unusual, such as “lobby.” It’s possible this lexicon considered lobby as in “formally pushing a political interest” rather than “waiting area just inside the entrance of a business.” What did the bing lexicon have to say?
One other term that was coded as negative was “mandatory” which got me thinking about how assertiveness or firmness in language might be construed as “negative” but that this might be culturally informed. Would the same thing come across negatively in Sweden or Taiwan? This, again, goes back to the essentialization of mass communication that got me interested in this research in the first place.
I also wanted to make a comparison word cloud that directly visualized the relative frequencies of positive versus negative sentiment. This time I used the bing lexicon because it offers only two categories, positive or negative, rather than the ordinal scores from afinn.
#re-tidy data by tokenizing and filtering
sentiment_cloud <- banners_raw |>
  select(business_id, covid_banner) |>
  unnest_tokens(output = word, input = covid_banner) |>
  anti_join(stop_words) |>
  #bing sentiment attribution
  inner_join(get_sentiments("bing")) |>
  count(sentiment, word, sort = TRUE)
## Joining, by = "word"
## Joining, by = "word"
#create term document matrix
sentiment_matrix <- sentiment_cloud |>
spread(key=sentiment,value = n,fill=0) |>
data.frame()
rownames(sentiment_matrix) = sentiment_matrix[,'word']
sentiment_matrix = sentiment_matrix[,c('positive','negative')]
#generate comparison cloud
set.seed(588)
comparison.cloud(term.matrix = sentiment_matrix ,scale = c(2,0.5), max.words = 200, rot.per=0)
You can see how some terms that were arguably just candid about the nature of the pandemic at the time (e.g. “crisis”, “inconvenience”, “concern”, etc.) were nonetheless coded as negative. To me, this highlights a key tension for businesses as they addressed their respective audiences: honesty versus approachability.
Also, a quick count of distinct terms per category corroborated the afinn result that there were indeed more positive than negative terms overall:
ggplot(sentiment_cloud, aes(x = sentiment)) +
geom_bar() +
theme_minimal()
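Because sentiment_cloud holds one row per distinct term, the bar chart above counts distinct terms rather than total uses. The following is a minimal sketch of both tallies on a hypothetical `toy_cloud` tibble with the same columns (sentiment, word, n); the counts are invented for illustration, and the `weight` aesthetic is what makes geom_bar() sum frequencies instead of rows.

```r
library(dplyr)
library(ggplot2)

# Hypothetical term counts, shaped like sentiment_cloud.
toy_cloud <- tibble(
  sentiment = c("positive", "positive", "negative", "negative", "negative"),
  word      = c("safe", "thank", "crisis", "concern", "sorry"),
  n         = c(40, 12, 3, 2, 1)
)

# Distinct terms per sentiment (what an unweighted bar chart counts).
distinct_terms <- count(toy_cloud, sentiment, name = "terms")

# Total occurrences per sentiment (each term weighted by its frequency).
total_uses <- count(toy_cloud, sentiment, wt = n, name = "uses")

# Same bar chart, but weighted so bar height reflects total occurrences.
weighted_plot <- ggplot(toy_cloud, aes(x = sentiment, weight = n)) +
  geom_bar() +
  theme_minimal()
```

In this toy case there are more distinct negative terms but far more positive uses, which shows how the two views can disagree.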
Inspired by my earlier exploration, I finally examined the relationship between the length of a covid banner and its net sentiment. In this case I defined post length by character count rather than word or token count. This was a relatively simple matter of using mutate() and nchar() to give each banner a length variable, in addition to the afinn net sentiment score. I then regressed net sentiment on post length:
#length variable
banners_plot <- summary_afinn |>
mutate(length = nchar(covid_banner))
#regression analysis
reg <- lm(sentiment ~ length, data = banners_plot)
summary(reg)
##
## Call:
## lm(formula = sentiment ~ length, data = banners_plot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.707 -1.984 0.022 1.676 8.665
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5531412 0.4326337 1.279 0.203
## length 0.0049405 0.0009249 5.342 3.88e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.978 on 133 degrees of freedom
## Multiple R-squared: 0.1766, Adjusted R-squared: 0.1704
## F-statistic: 28.53 on 1 and 133 DF, p-value: 3.878e-07
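It appeared that post length had a statistically significant positive association with net sentiment (p < .001), although it explained only about 18% of the variance in sentiment (R-squared = 0.18). With a slope of about 0.0049 per character, roughly every additional 200 characters corresponded to a net sentiment about 1 unit more positive according to afinn. Below is a scatter plot with the fitted regression line: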
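ggplot(data = banners_plot, mapping = aes(x = length, y = sentiment)) +
  geom_point() +
  #draw the fitted line from the model itself rather than hard-coded values
  geom_abline(intercept = coef(reg)[["(Intercept)"]], slope = coef(reg)[["length"]], color = "red")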
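As a quick check on how to read the slope, the arithmetic can be worked directly from the `length` estimate printed in the regression summary. This is only a unit conversion of that one coefficient, not a new model.

```r
# "length" estimate copied from the regression summary output.
slope <- 0.0049405

# Expected change in net afinn sentiment per 1,000 additional characters.
per_1000_chars <- slope * 1000

# Approximate number of additional characters per 1-unit gain in net sentiment.
chars_per_point <- 1 / slope

round(per_1000_chars, 2)  # 4.94
round(chars_per_point)    # 202
```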
The original research question for this analysis was:
What are the qualities and influencing factors to a covid banner’s sentiment?
It appears that post length has a significant association with positivity, but it’s still unclear as to why. Considering that there were more positive terms than negative overall, could it be that the longer a business spends talking to its audience, the more likely it is to include positively associated terms that push its net sentiment over into the positive “zone”? Or were businesses that took the time to write a longer post in the first place also more likely to speak in positive language to their clientele? Perhaps with a larger sample I could visualize the full distributions of these token scores, to see how their means, medians and modes produced the net scores that I used for my afinn analysis.
If I were to have done this over, I would have also attempted to visualize and analyze other kinds of sentiment using lexicons such as nrc for a higher-resolution look at Yelp during a time of covid.
This data is not without its limitations. As I’ve mentioned in previous analyses, it could have been much more with the businesses’ identities intact. The covid banners often included business names (which does raise some questions about the ethics of this published data) but there was no means of completely reclaiming the identities of a random sample. Access to these listings could have opened the door to more longitudinal inquiries (e.g. Were businesses with negative covid banners more likely to close within x amount of time?) or a means of concretely assigning business types (e.g. restaurant, law firm, etc.) to look for covariations and confounds.
In the meantime, I have one simple recommendation to businesses interested in coming across as more positive during the pandemic: Use as many words as you are allowed! Also, consider the balance of addressing the obvious challenges and tribulations that come with a global crisis (including using the word “crisis”) with how it might affect the tone of your overall message. The truth can be a bitter pill to swallow, and it’s up to businesses to watch their diction to help it go down easy enough that their customers still want to come back.
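One way to begin separating those two explanations would be to compare a banner’s summed score with its average score per matched token, since a sum can grow with length even when the tone stays the same. The sketch below is an assumption about how that might look, using a hypothetical `toy_scored` tibble rather than my actual afinn output.

```r
library(dplyr)

# Hypothetical scored tokens: both banners have the same mix of tone,
# but the "long" banner has twice as many matched tokens.
toy_scored <- tibble(
  business_id = c("long", "long", "long", "long", "short", "short"),
  value       = c(2, -1, 2, -1, 2, -1)
)

toy_summary <- toy_scored |>
  group_by(business_id) |>
  summarise(
    net_sentiment  = sum(value),   # grows with the number of matched tokens
    scored_tokens  = n(),
    mean_sentiment = mean(value)   # average tone per matched token
  )

toy_summary
```

Here the longer banner has double the net sentiment but an identical mean, which is the pattern the first explanation would predict.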
References:
Yelp. (2022, May 1). Yelp Dataset. Yelp.com. https://www.yelp.com/dataset/documentation/faq