Objective

Below, we will do some very simple strategies in analyzing consumer reviews for roughly 80 Walmart products with the ultimate goal of measuring product level pricing sentiment scores on a daily basis.

Written customer comments and reviews on company websites can be a genuine expression of their sentiment on various product attributes. Retailer websites provide a wealth of information related to this, and with some simple text analytics, we can evaluate:
- How customers feel about a product’s price over time?
- What customer think about product quality, functionality, etc.?
- How recent shifts in pricing strategy influenced consumer price perception?
- How recent shifts in advertising strategy influenced consumer perception of quality?
- How to segment products based on text mining outcomes?
- What pricing strategies to further pursue based on our text analysis (elasticity and revenue or margin based strategies no longer suffice)?

I believe that NLP is a much more powerful way of gauging what customers think and feel about pricing as opposed to consumer satisfaction scores (or NPS scores for that matter), which are inherently biased for purchasers vs. non-purchasers.

If you have the time, I encourage you to read the code, and make suggestions on how to better improve the methodology of evaluating pricing sentiment (which, admittedly, is quite basic below).

Our data

Obtained product reviews and ratings for appr. 100 items. I used Python’s Beautiful Soup package that works quite elegantly. The attributes are:
1. date: date of the review
2. rating: the rating the customer gave to the particular item on a particular day
3. comment: the actual text of the written customer comment
4. wmt_product_name: the name of the product displayed on the Walmart website
5. wmt_product_id: the Walmart product id associated with the product

Data and code can be found on https://github.com/KakasA09/TCRUGOct2015

Keywords

These are pricing keyword (see below). We will parse the Walmart customer comments to see if they include one of these keywords. It is certainly not a bullet-proof way of delineating pricing-related comments, and robust NLP methods are needed - pained to admit, but Python is best for that.

load('comments.Rda')
load('pricing_sentences.Rda')
keywords <- read.csv('pricing_keywords.csv')
keywords <- as.character(keywords$keyword)

keywords[1:5]

[1] “bargain” “bill” “bounty” “charge” “cheap”

Creating customized dictionaries for negative, positive, negation words, amplifiers and de-amplifiers

Since our approach for measuring customers’ pricing sentiment score is a slightly enhanced version of lexicon-based scoring, we will make it a bit more robust by adding industry-specific terms.

Sentiment scoring with parallelization

In the below, we will use the versatile qdap package to formulate sentiment polarity scores (constrained between +1 and -1). Qdap enables us to score our data by grouping factors: in our case by product and by date.

wmt_product_list = unique(pricing_sentences$wmt_product_id)

library(doMC)
registerDoMC(cores = detectCores()-2)

pricing_polarity_scores <- foreach(a = 1:length(wmt_product_list),.combine = 'rbind', 
                                   .packages = c('dplyr', 'qdap', 'tidyr')) %dopar% {
    
    pricing_sentences_small <- pricing_sentences %>% 
                    filter(wmt_product_id == wmt_product_list[a])
    
    pricing_polarity_small = with(pricing_sentences_small, 
              polarity(comment, polarity.frame = pos_negative_words, 
                       negators = negation_words, amplifiers = amplification_words,
                          deamplifiers = deamplification_words,
                          list(wmt_product_id, date), constrain = TRUE))
    
    colsplit2df(scores(pricing_polarity_small))
    
}

pricing_polarity_scores <- tbl_df(pricing_polarity_scores) %>% 
    select(wmt_product_id, date, ave.polarity)
#save(pricing_polarity_scores, file = 'pricing_polarity_scores.Rda')

Now, let’s do some analysis (see comments in the code below)

Ngrams for best pricing sentiment score products

Ngrams for worst pricing sentiment score products

Words for best pricing sentiment score products

Words for worst pricing sentiment score products

Correlations between product ratings and pricing sentiment

wmt_product_name wmt_product_id correlation
VIZIO E400i-B2 40“” 1080p 120Hz Full-Array LED Smart HDTV 33054851 0.4603
HP Consumables CN684WN#140 564XL Black Ink 20657638 0.4518
ARRIS/Motorola SBG6580 SURFboard DOCSIS 3.0 Cable Modem and WiFi-N Router 15567546 0.3955
NETGEAR WNDR3400-100NAS N600 Wireless Dual Band Router 15539745 0.3751
Samsung 32“” 1080p 60Hz LED Smart HDTV, UN32H5203AFXZA 36483178 0.3599
Fitbit Flex Wireless Activity Sleep Band 26469465 0.3152
HP 61 Black/Tri-color Original Ink Cartridges, 2 pack (CR259FN) 15084439 0.1927
Samsung Galaxy Tab S 10.5“” Tablet 16GB 37065363 0.1499
Google H2G2-42 HDMI Streaming Media Player - Wi-Fi - 1080p - Netflix, YouTube, HBO GO, Hulu Plus - Black 33142918 0.148
Roku Streaming Stick, 3500R with Get 60 days FREE* of Rdio Unlimited and Try 3 months FREE of Hulu Plus 35030305 0.1414
Samsung Galaxy Tab 4 7.0“” Tablet 8GB 35822395 0.1304
ZZ Motorola 575319-019-00 SURFboard Docsis 3.0 Cable Mod 20742485 0.07075
Samsung 40“” 1080p 60Hz LED HDTV, UN40H5003AFXZA 39405700 0.05769
Samsung 40“” 1080p 60Hz LED Smart HDTV, UN40H5203AFXZA 36483179 -0.06063

Example of relatively strong correlation between rating and sentiment

Example of a weak correlation between rating and sentiment