ContextBase_SentimentAnalysis

The Problem
Our Solution

Section 1 - Data Import

Section 2 - Text Mining

Section 3 - Plotting Top Ten Words

Section 4 - Wordcloud

Section 5 - Word Association Table

Section 6 - Sentiment Score

Section 7 - Conclusions

Section 8 - Appendix
Section 8a - Required Packages
Section 8b - Session Information
Section 8c - Sample of Records Processed for Sentiment Analysis

Section 9 - References

The Problem

The internet has introduced complexity to Customer demand, and to the measurement of the effectiveness and scope of Marketing. The acceptance of Products and Services has become independent of the Enterprise providing the Product or Service, and the indicators of Customer intent are getting more complex.

Along with this complexity, the volume of data gathered by businesses has increased enormously and now greatly exceeds what humans can process. The vast majority of online Business data is unstructured and exists in textual form, such as emails, support tickets, chats, social media, surveys, articles, and documents. Manually sorting through this data to uncover hidden insights would be difficult, expensive, and impossibly time-consuming.

Our Solution

ContextBase provides Data Science Sentiment Analysis of Business data to allow Clients to make sense of this chaos, yielding actionable insights that are otherwise unattainable. Sentiment Analysis is the computational analysis of the public’s opinions, sentiments, attitudes, and emotions as expressed in Social Media reviews, forums, discussions, blogs, and micro-blogs. It relies on natural language processing, data mining, Web mining, and text mining.

Sentiment is a determinant of almost all human activities and behaviors. Beliefs and perceptions affect the choices individuals and organizations make. Sentiment Analysis facilitates changes in business marketing strategies, daily interactions, productivity, and long-term success.

The benefits of Data Science Sentiment Analysis include:
1) Processing vast amounts of data efficiently and at low cost.
2) Identifying and assessing potential crises in real time.
3) Applying the same criteria consistently to all data. When evaluating the sentiment of a piece of text, humans agree only 60-65% of the time.
4) Segmenting Customers for increased effectiveness, accuracy, and precision in the marketing of products and services.

In order to demonstrate ContextBase’s capabilities in Sentiment Analysis of online or archived Customer comments, this document performs a Sentiment Analysis of Customer response text data posted to https://www.yelp.com/. The analysis includes Natural Language Processing, Text Mining, Association Rules Mining, Sentiment Analysis, and Plotting.

Section 1 - Data Import

The data imported for this project is a collection of 10,000 Customer feedback comments, posted to https://www.yelp.com/.

The R programming language allows for convenient dataset access, efficient algorithmic manipulation of datasets (for example, the ability to apply functions across datasets without FOR loops), and efficient statistical processing of dataset records and observations.

# Import Data
# Read the Yelp comments and keep only the review text as a character column
import_data <- read.csv("yelp.csv", stringsAsFactors = FALSE)
project_data <- data.frame(Data = as.character(import_data$text),
                           stringsAsFactors = FALSE)
rm(import_data)
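
As a minimal sketch of the loop-free style described above, the following uses a small, made-up stand-in for yelp.csv and applies a function to every comment with `vapply` instead of a FOR loop. The file contents and the names `sample_data` and `word_counts` are illustrative assumptions and are not part of the original analysis.

```r
# Hypothetical stand-in for yelp.csv: three invented comments in a "text" column
tmp_file <- tempfile(fileext = ".csv")
write.csv(data.frame(text = c("Great food", "Slow service", "Loved the patio")),
          tmp_file, row.names = FALSE)

# Same import pattern as the report, applied to the stand-in file
sample_data <- read.csv(tmp_file, stringsAsFactors = FALSE)

# Count the words in every comment without writing a FOR loop
word_counts <- vapply(strsplit(sample_data$text, " ", fixed = TRUE),
                      length, integer(1))
print(word_counts)
```

The same pattern should scale to the full dataset, since `vapply` walks the whole column in one call and checks that every result is a single integer.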

Section 2 - Text Mining

The following functions normalize the text of the first 1000 Customer responses, which are the responses selected for analysis, by removing numbers and punctuation and collapsing extra white space. Upper case letters are converted to lower case, and irrelevant stop words (“the”, “a”, “an”, etc.) are removed. Lastly, a “Document Term Matrix” is created to record how often each term is used in each response.

# Text Mining Functions
library(tm)

# Build a normalized corpus from a character vector of comments
termCorpus <- function(df) {
  df_corpus <- Corpus(VectorSource(df))
  # Convert to ASCII; comments that cannot be converted become NA
  df_corpus <- tm_map(df_corpus, content_transformer(function(x) iconv(x, to='ASCII')))
  df_corpus <- tm_map(df_corpus, removeNumbers)
  df_corpus <- tm_map(df_corpus, removePunctuation)
  df_corpus <- tm_map(df_corpus, stripWhitespace)
  df_corpus <- tm_map(df_corpus, content_transformer(tolower))
  df_corpus <- tm_map(df_corpus, removeWords, stopwords('english'))
  df_corpus
}

# Build the Document Term Matrix from the normalized corpus
dtmCorpus <- function(df) {
  DocumentTermMatrix(termCorpus(df))
}
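
To make the effect of each cleaning step concrete, here is a minimal base R sketch that approximates the steps above on a single invented comment. The regular expressions, the `normalize_text` helper, and the five-word stop list are assumptions chosen for illustration. They are not the `tm` implementations, which may differ in detail.

```r
# Base R approximations of the cleaning steps; the patterns are assumptions
# chosen to mimic removeNumbers, removePunctuation and stripWhitespace
normalize_text <- function(x) {
  x <- gsub("[[:digit:]]+", "", x)   # drop numbers
  x <- gsub("[[:punct:]]+", "", x)   # drop punctuation
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of white space
  tolower(x)                         # convert to lower case
}

# Hypothetical comment and a deliberately tiny stop word list
comment <- "Great food...I paid 12 dollars for 2 tacos!"
stop_words <- c("i", "for", "the", "a", "an")

cleaned <- normalize_text(comment)
tokens <- strsplit(cleaned, " ", fixed = TRUE)[[1]]
tokens <- tokens[!tokens %in% stop_words]
print(tokens)
```

Because punctuation is deleted rather than replaced with a space, words joined only by punctuation run together (here "food...I" becomes "foodi"). Tokens such as "aroundi" in Table 1 may have arisen in the same way.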

Section 3 - Plotting Top Ten Words

The R programming language provides convenient graphical processing of data contained within internet datasets. This section plots the frequencies of the ten most frequent words in the normalized Customer responses relative to each other, thereby providing insight into the motivation of Customers.

# Top Ten Words Graph Function
library(ggplot2)

# x: Document Term Matrix; y: dataset label used in the plot title
toptenwordsGraph <- function(x, y) {
  tweets_matrix <- as.matrix(x)
  # Total frequency of each term across all documents
  tweetsF <- colSums(tweets_matrix)
  tweets_topten <- data.frame(sort(tweetsF, decreasing = TRUE)[1:10])
  tweets_toptenF <- data.frame(rownames(tweets_topten),
                               tweets_topten[,1])
  names(tweets_toptenF) <- c("Words", "Freq")
  
  ggplot(tweets_toptenF, aes(x=reorder(Words,-Freq), y=Freq)) +
    geom_bar(stat="identity", col="orange", fill=rainbow(10)) +
    theme(text = element_text(size=12), axis.text.x =
            element_text(angle=90, vjust=1)) +
    labs (x="Words", y="Frequency", title=paste("Figure 1: Top Ten Words -", y))
}

# Build the Document Term Matrix from the first 1000 comments
Train_dtm <- dtmCorpus(project_data$Data[1:1000])
TopTenWords <- toptenwordsGraph(Train_dtm, "Yelp Data")

# Top Ten Words Graph
TopTenWords
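
The ranking step inside `toptenwordsGraph` can be sketched without any plotting. The example below uses a small, invented count matrix in place of the real Document Term Matrix. The `toy_dtm` values and the choice to keep two terms rather than ten are assumptions made for illustration.

```r
# Hypothetical document-term counts: rows are comments, columns are terms
toy_dtm <- matrix(c(2, 1, 0,
                    0, 1, 1,
                    1, 3, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("doc", 1:3),
                                  c("good", "place", "food")))

# Total uses of each term across all comments, most frequent first
term_totals <- sort(colSums(toy_dtm), decreasing = TRUE)

# Keep the top two terms (the report keeps the top ten)
top_terms <- data.frame(Words = names(term_totals)[1:2],
                        Freq = unname(term_totals[1:2]),
                        stringsAsFactors = FALSE)
print(top_terms)
```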

Section 4 - Wordcloud

This section creates a word cloud of the Customer responses. The popular terms Customers use for product or service feedback are differentiated by size, color, and alignment to provide further insight into the priorities of Customers.

Figure 2: Wordcloud - Yelp Data
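
The code that draws the word cloud is not shown in this report. As a hedged sketch of how such a figure could be produced with the `wordcloud` package listed in Section 8a, the example below uses invented term frequencies. The `term_freq` values and the plotting options are assumptions, not the settings used for Figure 2.

```r
# Hypothetical term frequencies; in the report they would come from the corpus
term_freq <- c(good = 40, place = 35, food = 30, like = 22, great = 20)

# Draw the cloud only if the wordcloud package is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(1)  # word placement is random; fix the seed for repeatability
  wordcloud::wordcloud(words = names(term_freq), freq = term_freq,
                       min.freq = 1, random.order = FALSE,
                       colors = rainbow(length(term_freq)))
}
```

Setting `random.order = FALSE` places the most frequent words near the center, which tends to make the dominant terms easier to spot.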

Section 5 - Word Association Table

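This section reveals associations between Customer feedback terms. The technique is related to Market Basket Analysis (also referred to as “Association Rules Mining”) and allows Vendors to determine which Customer requirements tend to occur together. For each of the five most frequent words, the table lists up to five terms whose correlation with that word across comments is at least 0.2; the “Frequency” columns hold these correlation values.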

# The assocTable Function
# x: Document Term Matrix; y: the ggplot object returned by toptenwordsGraph,
# whose $data element holds the top ten words
assocTable <- function(x, y) {
  # The five most frequent words
  top_words <- as.character(y$data$Words[1:5])
  WordAssocTable <- data.frame(matrix(nrow=5, ncol=10))
  for (i in seq_along(top_words)) {
    # Terms whose correlation with the word is at least 0.2
    wordAssoc <- data.frame(findAssocs(x, top_words[i], 0.2))
    # Keep the first five associated terms and their correlation values
    WordAssocTable[, 2 * i - 1] <- rownames(wordAssoc)[1:5]
    WordAssocTable[, 2 * i] <- wordAssoc[1:5, 1]
  }
  # Column names alternate between each word and "Frequency"
  names(WordAssocTable) <- as.vector(rbind(top_words, "Frequency"))
  WordAssocTable
}

# Creates assocTables
# Words associated with the five most frequent words
TrainAssocTable <- assocTable(Train_dtm, TopTenWords)

# Print the word association table
library(knitr)
kable(TrainAssocTable, caption = "Table 1: Word Association Table - Yelp Data")

Table 1: Word Association Table - Yelp Data
good	Frequency	place	Frequency	food	Frequency	like	Frequency	great	Frequency
pretty	0.28	like	0.29	restaurant	0.30	place	0.29	ish	0.21
place	0.27	good	0.27	good	0.25	just	0.28	apparent	0.21
food	0.25	things	0.25	service	0.24	feel	0.28	cardssort	0.21
just	0.25	people	0.21	mexican	0.24	affair	0.28	deemed	0.21
liked	0.23	aroundi	0.21	dish	0.21	beyondextraordinary	0.28	formed	0.21
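
The values in Table 1 are term-to-term correlations. As a minimal base R sketch of the idea behind `findAssocs`, the example below computes the correlation between one target term and every other term in a small, invented count matrix, then keeps those at or above the 0.2 limit. The `toy_dtm` values are assumptions, and the sketch illustrates the concept rather than reproducing the `tm` implementation.

```r
# Hypothetical document-term counts: rows are comments, columns are terms
toy_dtm <- cbind(good   = c(1, 0, 2, 0),
                 pretty = c(1, 0, 1, 0),
                 food   = c(0, 1, 0, 1))

# Correlation of every other term with the target term across comments
target <- "good"
others <- setdiff(colnames(toy_dtm), target)
term_cor <- vapply(others,
                   function(term) cor(toy_dtm[, target], toy_dtm[, term]),
                   numeric(1))

# Keep terms at or above the 0.2 limit, strongest first, rounded to 2 places
assoc <- round(sort(term_cor[term_cor >= 0.2], decreasing = TRUE), 2)
print(assoc)
```

In this toy data "pretty" rises and falls with "good" and is kept, while "food" appears only in comments without "good", so its correlation is negative and it is dropped.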

Section 6 - Sentiment Score

The Sentiment Score of the Customer feedback text provides an objective analysis of positivity, negativity, and eight emotions associated with the terms used in response comments. The higher the score for an individual emotion, the more strongly that emotion is expressed in the phrases used by Customers.
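
The scoring code is not shown in this report, although the `syuzhet` package appears in the list of required packages. As a rough illustration of how lexicon-based scoring generally works, the sketch below counts the words of one comment that fall into each category of a tiny, invented lexicon. The `toy_lexicon` entries and the `score_comment` helper are hypothetical and stand in for a real emotion lexicon.

```r
# Hypothetical mini lexicon mapping words to categories (not a real lexicon)
toy_lexicon <- data.frame(
  word     = c("excellent", "amazing", "love", "bad", "wait", "awesome"),
  category = c("positive", "joy", "joy", "negative", "anticipation", "positive"),
  stringsAsFactors = FALSE
)

# Score one comment: count the lexicon words found in each category
score_comment <- function(comment, lexicon) {
  cleaned <- tolower(gsub("[[:punct:]]+", "", comment))
  tokens <- strsplit(cleaned, "[[:space:]]+")[[1]]
  hits <- lexicon$category[match(tokens, lexicon$word)]
  table(factor(hits, levels = sort(unique(lexicon$category))))
}

scores <- score_comment("Excellent food, amazing toast, can't wait to go back!",
                        toy_lexicon)
print(scores)
```

Summing such counts over all comments would give one total per category, which is the kind of figure a sentiment score plot typically displays.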

Section 7 - Conclusions

This document demonstrates ContextBase’s ability to combine Natural Language Processing, Text Mining, and Sentiment Analysis in order to help Clients gain a much greater understanding of the feedback from their Customers, and from Customers in general within relevant Market segments.

ContextBase applies scientific methods and processes in the R programming language to the analysis of proprietary Client datasets, or the datasets stored in modern voluminous data storage. ContextBase’s objective in analyzing business datasets with Data Science methods is to extract knowledge to increase the competitiveness of businesses, or to provide insights that can lead to increased efficiency in business processes.

The R programming language emerged with the advent of Data Science, and is uniquely capable of handling the processes required by Data Science. ContextBase has many years of experience in Data Science programming in R, has received many accolades from Clients, and has accomplished many unique achievements in Data Science.

To get started applying your Business’s datasets to gain an increased understanding of your Customers or Processes, contact ContextBase by emailing ContextBase’s CEO, John Akwei, at johnakwei1@gmail.com or by sending a Direct Message to https://twitter.com/johnakwei.

Section 8 - Appendix

Section 8a - Required Packages

Several R language packages of pre-programmed functions are used for the analysis. The R programming language has a vast collection of dataset processing packages that encompass a wide variety of modern statistical and scientific methods. The R packages included in this document are packages for Natural Language Processing, plotting, and wordcloud creation.

	List of Required Packages
Required Packages	‘twitteR’ ‘tm’ ‘syuzhet’ ‘plyr’ ‘dplyr’ ‘data.table’ ‘ggplot2’ ‘stringr’ ‘igraph’ ‘wordcloud’ ‘knitr’
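
Before running the analysis it can help to confirm that the packages listed above are available. The following is a minimal sketch that only reports missing packages and installs nothing. The variable names are illustrative assumptions.

```r
# Packages listed in the table above
required_packages <- c("twitteR", "tm", "syuzhet", "plyr", "dplyr",
                       "data.table", "ggplot2", "stringr", "igraph",
                       "wordcloud", "knitr")

# Report which of them are not yet installed, without installing anything
is_installed <- required_packages %in% rownames(installed.packages())
missing_packages <- required_packages[!is_installed]
print(missing_packages)
```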

Section 8b - Session Information

Session information is provided for Reproducible Research. The Session Information below is for reference when running the required packages, and R code.

	Session Information
R Version	R version 3.6.0 (2019-04-26)
Platform	x86_64-w64-mingw32/x64 (64-bit)
Running	Windows 10 x64 (build 17763)
RStudio Citation	RStudio: Integrated Development Environment for R
RStudio Version	1.0.153
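
Comparable details for another machine can be collected with base R. This short sketch prints the R version string and the session details. The RStudio version is not included, because it is only available from inside RStudio.

```r
# Print the R version, platform, and loaded packages for this session
print(R.version.string)
print(sessionInfo())
```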

Section 8c - Sample of Records Processed for Sentiment Analysis

These are the first five comments in the Yelp dataset of Customer feedback comments. The above analysis processes the first 1000 comments within the Yelp dataset.

Sample of Yelp Comment Data
Record	Yelp Comment
1	My wife took me here on my birthday for breakfast and it was excellent. The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure. Our waitress was excellent and our food arrived quickly on the semi-busy Saturday morning. It looked like the place fills up pretty quickly so the earlier you get here the better. Do yourself a favor and get their Bloody Mary. It was phenomenal and simply the best I’ve ever had. I’m pretty sure they only use ingredients from their garden and blend them fresh when you order it. It was amazing. While EVERYTHING on the menu looks excellent, I had the white truffle scrambled eggs vegetable skillet and it was tasty and delicious. It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete. It was the best “toast” I’ve ever had. Anyway, I can’t wait to go back!
2	I have no idea why some people give bad reviews about this place. It goes to show you, you can please everyone. They are probably griping about something that their own fault…there are many people like that. In any case, my friend and I arrived at about 5:50 PM this past Sunday. It was pretty crowded, more than I thought for a Sunday evening and thought we would have to wait forever to get a seat but they said we’ll be seated when the girl comes back from seating someone else. We were seated at 5:52 and the waiter came and got our drink orders. Everyone was very pleasant from the host that seated us to the waiter to the server. The prices were very good as well. We placed our orders once we decided what we wanted at 6:02. We shared the baked spaghetti calzone and the small “Here’s The Beef” pizza so we can both try them. The calzone was huge and we got the smallest one (personal) and got the small 11" pizza. Both were awesome! My friend liked the pizza better and I liked the calzone better. The calzone does have a sweetish sauce but that’s how I like my sauce! We had to box part of the pizza to take it home and we were out the door by 6:42. So, everything was great and not like these bad reviewers. That goes to show you that you have to try these things yourself because all these bad reviewers have some serious issues.
3	love the gyro plate. Rice is so good and I also dig their candy selection :)
4	Rosie, Dakota, and I LOVE Chaparral Dog Park!!! It’s very convenient and surrounded by a lot of paths, a desert xeriscape, baseball fields, ballparks, and a lake with ducks. The Scottsdale Park and Rec Dept. does a wonderful job of keeping the park clean and shaded. You can find trash cans and poopy-pick up mitts located all over the park and paths. The fenced in area is huge to let the dogs run, play, and sniff!
5	General Manager Scott Petello is a good egg!!! Not to go into detail, but let me assure you if you have any issues (albeit rare) speak with Scott and treat the guy with some respect as you state your case and I’d be surprised if you don’t walk out totally satisfied as I just did. Like I always say….. “Mistakes are inevitable, it’s how we recover from them that is important”!!! Thanks to Scott and his awesome staff. You’ve got a customer for life!! ………. :^)

Section 9 - References

https://uc-r.github.io/sentiment_analysis
https://web.stanford.edu/class/cs124/lec/sentiment.pdf
https://en.wikipedia.org/wiki/Sentiment_analysis

ContextBase - Sentiment Analysis

https://contextbase.github.io

All programming by John Akwei, ECMp ERMp Data Scientist

June 23, 2019
