Impossible Foods Inc. is a food company that develops plant-based substitutes for meat products. Its goal is to offer a healthier and more environmentally friendly alternative while retaining the taste, texture, and nutritional benefits of meat. You can check out their website here: https://impossiblefoods.com/burger/
Given the exciting market for meat alternatives, I would like to explore how people think of the Impossible Burger. I will extract 1200 tweets from Twitter that talk about the Impossible Burger and develop a network and sentiment analysis of the tweets. I will also show my analysis with visuals and networks and describe my networks' quality with centrality metrics.
In conclusion, Impossible Foods represents a frontrunner in a revolutionary industry, but there are still many who doubt, or even fear and hate, the idea of the Impossible Burger. Impossible Foods should consider focusing on its brand image and the nature of its meat in marketing campaigns to decrease skepticism toward its food and increase positive sentiment. If these meatless meat companies can manage to debunk the negative myths surrounding their products, they may soon become a regular staple in restaurants as well as consumer homes.
More interesting projects that I have done can be found here: https://github.com/XuebinZhu/Projects
####################################################
### Sentiment Analysis for the Impossible Burger ###
####################################################
rm(list=ls())
#load all the required libraries
library("twitteR")
library("ROAuth")
library(tm)
library(NLP)
library(wordcloud)
library(RColorBrewer)
library(syuzhet)
library(lubridate)
library(ggplot2)
library(scales)
library(dplyr)
library(igraph)
library(rmarkdown)
I will specify my api_key, api_secret, access_token, and access_token_secret to use the Twitter API for tweet scraping. You can also set up your own.
# Obtaining Twitter Data
api_key = "your own api key"
api_secret = "your own api secret"
access_token = "your own access token"
access_token_secret = "your own access token secret"
# setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
# Extracting Tweets
#terms = c("impossible burger", "impossibleburger")
#tweets = searchTwitter(terms, n=1200, lang = "en")
#length(tweets)
# Converting to Dataframe
#impossibleburger = twListToDF(tweets)
# Creating csv file
#write.csv(impossibleburger, file = 'YOUR FILE PATH', row.names = F)
Note that the tweets will change every time you scrape them, so you should scrape once and save the tweets to a dataset for the following analysis. Now, with the tweets obtained above, I will conduct data cleaning and data preparation.
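As a minimal, hedged sketch of that scrape-once-and-cache workflow (reusing the commented-out scraping code above; the combined query string is my assumption, not necessarily the exact query behind my saved dataset), the logic could look like this:
# Hedged sketch: scrape only if no cached file exists, otherwise reuse the saved csv
if (!file.exists("impossibleburger.csv")) {
  tweets <- searchTwitter("impossible burger OR impossibleburger", n = 1200, lang = "en")
  impossibleburger <- twListToDF(tweets)
  write.csv(impossibleburger, file = "impossibleburger.csv", row.names = FALSE)
} else {
  impossibleburger <- read.csv("impossibleburger.csv", header = TRUE)
}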
###################################
###Data Cleaning and Preparation###
###################################
#1. Reading Data File
impossible_burger <- read.csv(file = "impossibleburger.csv", header=T) #choose the impossible_burger.csv file
str(impossible_burger) #look at structure of the file (has 1200 obs and 16 var)
## 'data.frame': 1200 obs. of 16 variables:
## $ text : Factor w/ 924 levels "'It's a hard story to tell': Jonathan Safran Foer on how Greta Thunberg, the Impossible Burger, and meatless br"| __truncated__,..: 460 174 108 416 668 554 862 668 555 595 ...
## $ favorited : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ favoriteCount: int 0 0 0 0 0 0 0 0 0 0 ...
## $ replyToSN : Factor w/ 207 levels "_walkonU","2PennysOnePair",..: NA 151 87 NA NA NA NA NA NA NA ...
## $ created : Factor w/ 1190 levels "2019-10-08 14:41:55",..: 1190 1189 1188 1187 1186 1185 1184 1183 1182 1181 ...
## $ truncated : logi FALSE TRUE TRUE FALSE FALSE TRUE ...
## $ replyToSID : num NA 1.18e+18 1.18e+18 NA NA ...
## $ id : num 1.18e+18 1.18e+18 1.18e+18 1.18e+18 1.18e+18 ...
## $ replyToUID : num NA 2.78e+08 3.01e+07 NA NA ...
## $ statusSource : Factor w/ 53 levels "<a href=\"http://app.sendblur.com\" rel=\"nofollow\">Social Media Publisher App </a>",..: 13 8 13 13 13 10 12 13 10 13 ...
## $ screenName : Factor w/ 1118 levels "__Mamass__","_1FamST",..: 13 33 426 579 566 975 723 948 975 711 ...
## $ retweetCount : int 0 0 0 0 29 0 0 29 0 0 ...
## $ isRetweet : logi FALSE FALSE FALSE FALSE TRUE FALSE ...
## $ retweeted : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ longitude : num NA NA NA NA NA NA NA NA NA NA ...
## $ latitude : num NA NA NA NA NA NA NA NA NA NA ...
#clean the text of special characters such as symbols and emoticons
impossible_burger$text <- sapply(impossible_burger$text,function(row) iconv(row, "latin1", "ASCII", sub=""))
#2. Building Corpus
corpus <-iconv(impossible_burger$text, to='utf-8') #need only the first col text from file
corpus <- Corpus(VectorSource(corpus)) #corpus is a collection of texts
inspect(corpus[1:5]) #inspect the first five tweets
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive been trying to eat clean during exam week, but im curious to try that impossible burger from burger king
## [2] @pxlplz I havent tried the impossible whopper yet, but the regular impossible burger was fine. Similar to beef at https://t.co/uRD9dp7Ro8
## [3] @JoannosaurusRex also didnt know how to make this into tweet but owner sat down with me and picked my brain over B https://t.co/TL6U2DR1WB
## [4] i dont eat meat but youll never catch me getting that impossible burger from burger king. hell no
## [5] RT @fleroy1974: "[classifying food based on its degree of processing] makes no sense from a nutritional perspective - Yes it does.\n\n"Yogur
#3. Cleaning Data
#convert data to lower case for analysis
corpus <-tm_map(corpus, tolower) #convert all alphabet to lower case
## Warning in tm_map.SimpleCorpus(corpus, tolower): transformation drops documents
inspect(corpus[1:5]) #inspect the first five tweets
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive been trying to eat clean during exam week, but im curious to try that impossible burger from burger king
## [2] @pxlplz i havent tried the impossible whopper yet, but the regular impossible burger was fine. similar to beef at https://t.co/urd9dp7ro8
## [3] @joannosaurusrex also didnt know how to make this into tweet but owner sat down with me and picked my brain over b https://t.co/tl6u2dr1wb
## [4] i dont eat meat but youll never catch me getting that impossible burger from burger king. hell no
## [5] rt @fleroy1974: "[classifying food based on its degree of processing] makes no sense from a nutritional perspective - yes it does.\n\n"yogur
#remove punctuations
corpus <-tm_map(corpus, removePunctuation)
## Warning in tm_map.SimpleCorpus(corpus, removePunctuation): transformation drops
## documents
inspect(corpus[990:996]) #inspect tweets 990 through 996
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 7
##
## [1] sometimes the best date u can do is sitting on the parking garage roof listening to goblin alone with an impossible httpstco6l5eg7lino
## [2] rt bpmehlman the impossible whopper has roughly the same number of calories as a traditional whopper they both have similar amounts of f
## [3] that impossible burger really wham
## [4] rt bfsooner tom herman drives past a whataburger to get an impossible whopper at burger king
## [5] rt mercyforanimals in both revenue and pounds gelsonsmarkets has sold more impossiblefoods burger than traditional ground beef \nht
## [6] rt gothfundme miller boss all the new recruits are dipshits they wont stop asking the mess hall for the impossible burger from burg
## [7] miller boss all the new recruits are dipshits they wont stop asking the mess hall for the impossible burger f httpstcoowibwr7qjx
#remove numbers
corpus <-tm_map(corpus, removeNumbers)
## Warning in tm_map.SimpleCorpus(corpus, removeNumbers): transformation drops
## documents
inspect(corpus[1:5]) #inspect the first five tweets
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive been trying to eat clean during exam week but im curious to try that impossible burger from burger king
## [2] pxlplz i havent tried the impossible whopper yet but the regular impossible burger was fine similar to beef at httpstcourddpro
## [3] joannosaurusrex also didnt know how to make this into tweet but owner sat down with me and picked my brain over b httpstcotludrwb
## [4] i dont eat meat but youll never catch me getting that impossible burger from burger king hell no
## [5] rt fleroy classifying food based on its degree of processing makes no sense from a nutritional perspective yes it does\n\nyogur
#remove common words-they dont add any informational value
#use the stopwords function in english
#select stopwords(english) to see what words are removed
cleanset <-tm_map(corpus, removeWords, stopwords('english'))
## Warning in tm_map.SimpleCorpus(corpus, removeWords, stopwords("english")):
## transformation drops documents
inspect(cleanset[1:5])
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive trying eat clean exam week im curious try impossible burger burger king
## [2] pxlplz havent tried impossible whopper yet regular impossible burger fine similar beef httpstcourddpro
## [3] joannosaurusrex also didnt know make tweet owner sat picked brain b httpstcotludrwb
## [4] dont eat meat youll never catch getting impossible burger burger king hell
## [5] rt fleroy classifying food based degree processing makes sense nutritional perspective yes \n\nyogur
#remove URLs (https://etc.)
#make use of function http
removeURL <- function(x) gsub("http[[:alnum:]]*", '', x)
cleanset <-tm_map(cleanset, content_transformer(removeURL))
## Warning in tm_map.SimpleCorpus(cleanset, content_transformer(removeURL)):
## transformation drops documents
inspect(cleanset[1:5])
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive trying eat clean exam week im curious try impossible burger burger king
## [2] pxlplz havent tried impossible whopper yet regular impossible burger fine similar beef
## [3] joannosaurusrex also didnt know make tweet owner sat picked brain b
## [4] dont eat meat youll never catch getting impossible burger burger king hell
## [5] rt fleroy classifying food based degree processing makes sense nutritional perspective yes \n\nyogur
#tweets were pulled using "impossibleburger" or "impossible burger", so we remove those search terms from the text, along with other uninformative words
cleanset <-tm_map(cleanset, removeWords, c('impossible burger', 'impossibleburger', 'impossible_burgers','impossibleburgers', 'burger', 'impossible', 'burgers', 'food', 'can', 'via', 'other'))
## Warning in tm_map.SimpleCorpus(cleanset, removeWords, c("impossible burger", :
## transformation drops documents
inspect(cleanset[1:5])
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive trying eat clean exam week im curious try king
## [2] pxlplz havent tried whopper yet regular fine similar beef
## [3] joannosaurusrex also didnt know make tweet owner sat picked brain b
## [4] dont eat meat youll never catch getting king hell
## [5] rt fleroy classifying based degree processing makes sense nutritional perspective yes \n\nyogur
#remove white spaces
cleanset <- tm_map(cleanset, stripWhitespace)
## Warning in tm_map.SimpleCorpus(cleanset, stripWhitespace): transformation drops
## documents
inspect(cleanset[1:5])
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive trying eat clean exam week im curious try king
## [2] pxlplz havent tried whopper yet regular fine similar beef
## [3] joannosaurusrex also didnt know make tweet owner sat picked brain b
## [4] dont eat meat youll never catch getting king hell
## [5] rt fleroy classifying based degree processing makes sense nutritional perspective yes yogur
Before we can conduct sentiment analysis, we need to give the tweets some structure by creating a matrix of rows and columns. This is called a term document matrix (tdm).
##########################
###Term Document Matrix###
##########################
tdm <- TermDocumentMatrix(cleanset)
#if you would like to look at this object, you have to convert it into a plain matrix first
tdm <- as.matrix(tdm)
tdm[1:10, 1:20] #look at first 10 rows/terms and 20 tweets
## Docs
## Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## clean 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## curious 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## eat 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## exam 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## ive 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## king 1 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0
## try 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## trying 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## week 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## beef 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
In the tdm, summing a row tells you how many times a term appears across all tweets. There are many terms, so we create a subset w where the row sum is at least 30.
# Bar Plot
w <- rowSums(tdm)
w <- subset(w, w>=30) #we can run "w" to see which words appear how many times
barplot(w, las = 2, col=rainbow(40)) #words represented vertically using las=2, rainbow colors
#in the plot, you may find other words that you don't need for analysis, so you can go back
#and combine them into a clean data dictionary
#clean the dataset of these words using dictionary created and then redo term document matrix
cleanset <-tm_map(cleanset, removeWords, c('add words', 'add words'))
## Warning in tm_map.SimpleCorpus(cleanset, removeWords, c("add words", "add
## words")): transformation drops documents
inspect(cleanset[1:5])
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 5
##
## [1] ive trying eat clean exam week im curious try king
## [2] pxlplz havent tried whopper yet regular fine similar beef
## [3] joannosaurusrex also didnt know make tweet owner sat picked brain b
## [4] dont eat meat youll never catch getting king hell
## [5] rt fleroy classifying based degree processing makes sense nutritional perspective yes yogur
w <- sort(rowSums(tdm), decreasing=TRUE) #sort words in decreasing order
set.seed(9999)
wordcloud(words = names(w),
freq=w, max.words = 300,
random.order =FALSE) #words come from names(w), frequencies from w; random.order = FALSE puts the most frequent words in the center
#specifying options in word cloud
#Specify that max words be no more than say, 200
#Freq for terms to be included in wordcloud (say they have to appear 5 times to be included)
#color words, specify scale (bigger words max size =3, smaller =0.2)
#rotate some words (rotation percentage = 30%)
wordcloud(words = names(w),
freq=w,
random.order =FALSE,
max.words = 200,
min.freq = 5,
colors = brewer.pal(8, 'Dark2'),
scale = c(3, 0.2),
rot.per = .3)
The word most frequently mentioned is “meat”, which is ironic considering that this product is meant to be a replacement for traditional meat. In addition, the word “backlash” is mentioned quite a bit, which ties into the negative opinions and the many recent articles discussing the consequences of, and hostility toward, Impossible Foods. In fact, “backlash” also relates to “voxdotcom”, the Twitter account of the popular online media company Vox. Their tweet and accompanying article (the word “mainstream” is also part of the article title) have been retweeted by countless people and represent a node with a high degree in the relationship model shown below.
“King” and “whopper” are connected to Burger King’s recent release of the “Impossible Whopper”, which has garnered a lot of publicity. Beyond Meat even gets a shout-out, as “beyond” is another popular mention in tweets, most likely from people comparing the different meat-substitute brands.
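To double-check these frequency observations, the sorted frequency vector w computed just before the word clouds can be inspected directly; a minimal sketch:
#list the ten most frequent terms to confirm which words dominate the clouds
head(w, 10)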
#Reading Files
#take the first column, text and put it into tweets dataframe
tweets <- iconv(impossible_burger$text, to="utf-8")
#obtain sentiment scores for each of the 1200 tweets
#the nrc sentiment dictionary is called to calculate the presence of
#eight emotions and their corresponding positive/negative valence in each tweet
s <-get_nrc_sentiment(tweets)
head(s)
## anger anticipation disgust fear joy sadness surprise trust negative positive
## 1 0 0 0 0 1 1 0 1 1 3
## 2 0 0 0 0 0 1 0 0 1 0
## 3 0 0 0 0 0 0 0 0 0 0
## 4 1 0 1 1 0 2 1 0 2 2
## 5 0 0 0 0 1 0 0 1 0 3
## 6 0 0 0 0 0 0 0 0 0 0
#you could also look at phrases or words in these tweets to see if they lead to positive or negative sentiment
#for example, check the sentiment of impossible burger
get_nrc_sentiment('impossible burger')
## anger anticipation disgust fear joy sadness surprise trust negative positive
## 1 0 0 0 0 0 1 0 0 1 0
#note that sadness is 1 and negative is 1
#plot the sentiment scores
#lets sum the column scores across tweets for the plot
#label y axis as total count, main title of plot label
barplot(colSums(s),
las = 2,
ylab = 'Total Count',
main ='Sentiment Scores for impossible_burger Tweets')
As one can see, there is a significant count of tweets expressing negative, positive, or sad sentiments. The fact that negative sentiments make up the majority is interesting: despite Impossible Foods' image of creating a healthy, more sustainable alternative to meat that preserves the look and taste of the original, people are still suspicious of the product, and it has faced harsh criticism among the Twitter population.
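As a small, hedged check on that claim (reusing the score data frame s returned by get_nrc_sentiment above), the positive and negative column totals can be compared directly:
#sum each sentiment column and compare the overall negative vs positive word counts
sentiment_totals <- colSums(s)
sentiment_totals[c("negative", "positive")]
sentiment_totals["negative"] / sentiment_totals["positive"] #a ratio above 1 means negative terms dominate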
termM <- tdm%*%t(tdm) #multiply tdm by its transpose with %*% to create a term-term adjacency matrix
termM[1:10, 1:10] #check the term term matrix
## Terms
## Terms clean curious eat exam ive king try trying week beef
## clean 1 1 1 1 1 1 1 1 1 0
## curious 1 2 1 1 1 1 1 1 1 0
## eat 1 1 35 1 1 6 1 1 2 1
## exam 1 1 1 1 1 1 1 1 1 0
## ive 1 1 1 1 20 5 3 2 3 0
## king 1 1 6 1 5 141 10 3 3 4
## try 1 1 1 1 3 10 38 1 1 0
## trying 1 1 1 1 2 3 1 12 1 0
## week 1 1 2 1 3 3 1 1 6 0
## beef 0 0 1 0 0 4 0 0 0 61
g <- graph.adjacency(termM, weighted=T, mode ='undirected') #convert it into graph, no direction for edges
g
## IGRAPH f8310aa UNW- 2494 22085 --
## + attr: name (v/c), weight (e/n)
## + edges from f8310aa (vertex names):
## [1] clean --clean clean --curious clean --eat clean --exam
## [5] clean --ive clean --king clean --try clean --trying
## [9] clean --week curious--curious curious--eat curious--exam
## [13] curious--ive curious--king curious--try curious--trying
## [17] curious--week curious--beyond curious--much curious--just
## [21] curious--bur curious--campus curious--new curious--heard
## [25] curious--truck eat --eat eat --exam eat --ive
## [29] eat --king eat --try eat --trying eat --week
## + ... omitted several edges
#remove terms that have loops (going to self)
g <- simplify(g)
#set labels and degrees of Vertices (V), each word is a vertices
V(g)$label <- V(g)$name #label is name
V(g)$degree <- degree(g) #degree is the number of connections between terms
head(V(g)$label)
## [1] "clean" "curious" "eat" "exam" "ive" "king"
head(V(g)$degree)
## [1] 8 16 202 8 119 378
#Histogram of node degree, lets just use 100 bars (too many words), label of y and x axis
hist(V(g)$degree,
breaks=100,
col='Light Blue',
main ='Histogram of Node Degree for Words',
ylab ='Frequency',
xlab='Degree of Vertices') #right skewed
#Network diagram
plot(g)
#interpretation is difficult so recreate more meaningful visuals
#Recreate this by looking at just the top terms/nodes by degree
tdm <- tdm[rowSums(tdm)>30,] #lets reduce the size and counts of total frequency (rowSum)
#include only terms having frequency more than 30
#it will take out all very infrequent terms
#Rerun all other code
tdm[tdm>1] <-1
termM <-tdm %*% t(tdm)
g <- graph.adjacency(termM, weighted=T, mode ='undirected')
g <- simplify(g)
V(g)$label <- V(g)$name
V(g)$degree <- degree(g)
plot(g,
vertex.color='green',
vertex.size = 8,
vertex.label.dist =1.5)
#much cleaner than earlier. You can further increase the size of vertices by changing options
#there are some dense connections in the nodes (to near nodes)
#Community creation (edge betweenness)
comm <- cluster_edge_betweenness(g)
## Warning in cluster_edge_betweenness(g): At community.c:460 :Membership vector
## will be selected based on the lowest modularity score.
## Warning in cluster_edge_betweenness(g): At community.c:467 :Modularity
## calculation with weighted edge betweenness community detection might not make
## sense -- modularity treats edge weights as similarities while edge betwenness
## treats them as distances
plot(comm, g, main = 'Words Community \n created by edge betweenness', sub = '- only include terms having frequency more than 30')
#you can also do this by using propagating labels
prop <-cluster_label_prop(g)
plot(prop, g, main = 'Words Community \n created by propagating labels', sub = '- only include terms having frequency more than 30')
#groupings for community detection are different - algorithms are different
greed <-cluster_fast_greedy(as.undirected(g)) #greedy algorithm for clustering
plot(greed, as.undirected(g), main = 'Words Community \n created by greedy algorithm', sub = '- only include terms having frequency more than 30')
I applied three clustering techniques to group the words into potential communities. While the edge-betweenness approach produced a slightly different result compared with the label-propagation and greedy approaches, the overall themes of each word cluster remain similar across the three community-detection methods.
The network clustering analysis shows that people hold conflicting views toward the idea of the Impossible Burger. This result supports the insights obtained from the sentiment analysis, where we observed high levels of both positive and negative sentiment.
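Rather than only eyeballing the plots, the similarity of the three partitions can be quantified; a minimal sketch using igraph's compare() on the comm, prop, and greed objects created above:
#pairwise normalized mutual information between the three community structures
#values near 1 mean nearly identical groupings, values near 0 mean unrelated groupings
compare(comm, prop, method = "nmi")
compare(comm, greed, method = "nmi")
compare(prop, greed, method = "nmi")
#community sizes also give a quick sense of how each algorithm splits the terms
sizes(comm)
sizes(prop)
sizes(greed)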
#highlighting degrees for a different kind of plot
V(g)$label.cex <- 2.2*V(g)$degree / max(V(g)$degree) + 0.3
V(g)$label.color <- rgb(0, 0, .2, .8)
V(g)$frame.color <- NA
egam <- (log(E(g)$weight) + 0.4) / max(log(E(g)$weight) + .4)
E(g)$color <- rgb(0.5, 0.5, 0, egam)
E(g)$width <- egam
plot(g,
vertex.color ='green',
vertex.size = V(g)$degree*0.5) #vertex size vary by degree
#Network of tweets
tdm <- tdm[rowSums(tdm)>30,]
tweetM <- t(tdm)%*%tdm
g <- graph.adjacency(tweetM, weighted =T, mode = 'undirected') #store graph adjacency in g
V(g)$degree <- degree(g)
g<- simplify(g) #remove loops
#use 100 breaks for the histogram of degree, and label the y and x axes
hist(V(g)$degree,
breaks = 100,
col='Light Blue',
main='Histogram of Degree for Tweet Network',
ylab='Frequency',
xlab='Degree')
Overall, the degree distribution of the tweet network is highly right-skewed. Degree is a metric that measures how many edges connect to a node; in our case, degree indicates the number of re-tweets for a single tweet. According to the plot, the majority of these 1200 tweets are not connected to any other tweet, indicating that most of the tweets did not get re-tweeted.
We also noticed a gap between degree 0 and degree 20. This means that if a tweet got re-tweeted at all, it was very likely to be re-tweeted more than 20 times. Moreover, it is interesting that among the tweets that did get re-tweeted, the distribution is roughly uniform between about 20 and 500 degrees.
Therefore, if we want to increase the influence of our tweets, the first and most important step is to get re-tweeted by other people.
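As a rough numeric check on these observations (assuming g is the simplified tweet network built just before the histogram), one can look at the share of isolated tweets and the degree range of the connected ones:
#recompute degree after simplify() removed the self-loops
deg <- degree(g)
mean(deg == 0) #share of tweets that are not connected to any other tweet
summary(deg[deg > 0]) #degree spread among the connected tweets (the 20-to-500 range described above)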
#Set labels of vertices to tweet IDs
V(g)$label <- V(g)$name
V(g)$label.cex <-1
V(g)$label.color <- rgb(0.4, 0, 0, 0.7)
V(g)$size <- 2 #size of g
V(g)$frame.color <- NA #no frame color or lines of frame
plot(g, vertex.label =NA, vertex.size=5) #indicate size of vertex, for now, dont put labels (too much crowding)
#delete some vertices
egam <- (log(E(g)$weight) + 0.2)/ max(log(E(g)$weight) + 0.2)
E(g)$color <- rgb(0.5, 0.5, 0, egam)
E(g)$width <- egam
g2 <- delete.vertices(g, V(g)[degree(g)<400]) #keep only vertices with degree of at least 400
#if you lose too many nodes, reduce the number
plot(g2,
vertex.label.cex =0.90,
vertex.label.color ='black')
# look at clustering of the 1200 tweets; try increasing/decreasing the vertex threshold #
#Delete edges - prune weak edges to make the network more readable
#(delete edges with weight less than 2) and (delete vertices with degree less than 400)
E(g)$color <- rgb(0.5, 0.5, 0, egam)
E(g)$width <- egam
g3 <- delete.edges(g, E(g)[E(g)$weight < 2]) #keep only edges with weight of at least 2
g3 <- delete.vertices(g3, V(g3)[degree(g3) < 400])
plot(g3, main = "Network of Tweets", sub = "- only include nodes that have edges > 2 and vertices > 400")
In the graphs above, we only kept edges with a weight of at least 2 and vertices with a degree of at least 400. We noticed that tweets #1190, #1191, and #1194 are highly correlated. We retrieved these three tweets and found that all of them link to the same article, "Plant-based 'meat' is conquering fast food. Here's where you can get meat substitutes like the Beyond Burger and the Impossible Taco." The posters of these tweets have 17.6K, 899, and 19.7K followers, respectively. They are opinion leaders on the Internet, and all of them shared the same article with the same comment at the same time. It is therefore not hard to imagine that the author or publisher of this article may have hired these opinion leaders to share it, so that people would be more likely to notice, read, and share the article. Indeed, these tweets are highly connected to the other tweets and were re-tweeted multiple times. So, in order to increase the online influence of an article, cooperating with opinion leaders may be a good strategy.
#take some samples from the two or three major groups and find what the differences among the tweets are
impossible_burger$text[c(1190, 1191, 1194)] #check the three highly connected tweets
## [1] "Plant-based 'meat' is conquering fast food. Here's where you can get meat substitutes like the Beyond Burger and th https://t.co/f5SwQ33G4k"
## [2] "Plant-based 'meat' is conquering fast food. Here's where you can get meat substitutes like the Beyond Burger and th https://t.co/KlBUA1Rygw"
## [3] "Plant-based 'meat' is conquering fast food. Here's where you can get meat substitutes like the Beyond Burger and th https://t.co/7Ev9RHIkK5"