This is the second part of the project, where we pull the data from the data lake, clean it, store it in MySQL, and perform the analysis on the tweet text.
We read the tweet data from the CSV file that was created in part 1 from the data lake, which for this project is a MongoDB database. The CSV link is: https://raw.githubusercontent.com/charlsjoseph/CUNY-Data607/master/Data607-Final_Project/data/ExtractedData_iphone_xs.csv
iphone_xs_csv_link <- "https://raw.githubusercontent.com/charlsjoseph/CUNY-Data607/master/Data607-Final_Project/data/ExtractedData_iphone_xs.csv"
iphone_xs_tweets_required <- read.csv(iphone_xs_csv_link, header = TRUE)
dim(iphone_xs_tweets_required)
## [1] 16401 8
iphone_xs_tweets_required %>% head(5) %>% kable() %>% kable_styling()
| X_id | user_id | created_at | screen_name | text | source | location | followers_count |
|---|---|---|---|---|---|---|---|
| x5c0aa99c7fab604870013be0 | x2890259085 | 2018-12-05 14:03:49 | EmmiSilvas_ | i entered @grav3yardgirl & @casetify iPhone XS MAX GIVEAWAY! CHECK IT OUT HERE! https://t.co/sMHtlMvWf1 #grav3yardgirliPhonegiveaway | Twitter for iPhone | | 85 |
| x5c0aa99c7fab604870013be1 | x2890259085 | 2018-12-05 14:03:42 | EmmiSilvas_ | i entered @grav3yardgirl & @casetify iPhone XS MAX GIVEAWAY! CHECK IT OUT HERE! https://t.co/sMHtlMvWf1 #grav3yardgirliPhonegiveaway | Twitter for iPhone | | 85 |
| x5c0aa99c7fab604870013be2 | x2647272271 | 2018-12-05 14:03:37 | JumiaGhana | 5 HOURS TO GO! Get ready, the iPhone Xs 65GB will be on TREASURE HUNT @ 12 MIDNIGHT FOR MIND-BLOWING GH¢ 78 ONLY! Don't sleep! You can't afford to pass this up! NOTE: 🛑 ONLY ONE ITEM AVAILABLE TO BE WON Go Here>> https://t.co/RD085Zynb8 https://t.co/pS7VXsdLzX | Twitter Web Client | Ghana | 4115 |
| x5c0aa99c7fab604870013be3 | x22846501 | 2018-12-05 14:03:02 | GhostBishopTV | @ufallb4me iPhone XS Max | Twitter for iPhone | | 273 |
| x5c0aa99c7fab604870013be4 | x16641674 | 2018-12-05 14:02:02 | edelagrave | “After an investigation the ASA found that the statement was fair” https://t.co/aYvITtSD5g | Twitter Web Client | Montréal | 776 |
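Note that read.csv in R versions before 4.0 imports string columns as factors (the <fct> type in the grouped summary further below reflects this). If plain character columns are preferred for the later string manipulation, the import could be done as in the sketch below; the name iphone_xs_tweets_chr is only illustrative and is not used in the rest of the analysis.
# Optional sketch: read the text columns as character rather than factor
# (read.csv defaults to stringsAsFactors = TRUE in R versions before 4.0).
iphone_xs_tweets_chr <- read.csv(iphone_xs_csv_link, header = TRUE, stringsAsFactors = FALSE)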
We now clean up the tweets to keep only the data that needs to be reviewed for this analysis. Here we group the tweet messages by user ID (screen name); this grouping will be used later to remove the advertisements.
## Now, grouping the tweets based on the screen_name
iphone_xs_tweets_required_group_user <- iphone_xs_tweets_required %>% group_by(screen_name)
per_user <- summarise(iphone_xs_tweets_required_group_user, number_of_messages=n())
# Arranging the tweets count in the descending order, so that the user (screen_name) with the highest count is on the top.
per_user <- arrange(per_user, desc(number_of_messages))
head(per_user)
## # A tibble: 6 x 2
## screen_name number_of_messages
## <fct> <int>
## 1 whitestonedome 180
## 2 marciojmsilva 81
## 3 TopCellularDeal 73
## 4 techzonestation 56
## 5 allsorce_ 51
## 6 Galaxyukebay 46
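As an aside, the same per-user counts could be produced in a single step with dplyr::count, as in the sketch below (the count column is then named n rather than number_of_messages).
# Equivalent sketch: count() wraps group_by() + summarise(n()) + arrange(desc()).
per_user_alt <- iphone_xs_tweets_required %>% count(screen_name, sort = TRUE)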
As we see above, there are close to 300 users at the top with a large number of tweets on the same topic, the iPhone XS. So, to lessen the occurrence and impact of advertisements in our analysis, we remove from the analysis the users who have 5 or more tweets on this topic.
per_user <- per_user %>% filter(number_of_messages < 5)
## Joining the two data frames to keep only the messages from users (screen names) with 4 or fewer tweets
iphone_xs_tweets_required_users <- merge(iphone_xs_tweets_required, per_user, by="screen_name")
dim(iphone_xs_tweets_required_users)
## [1] 13515 10
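A dplyr-only formulation of the same filtering is sketched below; it keeps the same tweets, although the row order differs and the number_of_messages column is not carried along.
# Equivalent sketch: keep tweets from users with fewer than 5 messages,
# without building per_user and merging it back.
iphone_xs_tweets_filtered_alt <- iphone_xs_tweets_required %>%
  group_by(screen_name) %>%
  filter(n() < 5) %>%
  ungroup()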
## Keeping only the text column from the data frame; the remaining fields are not required for the further steps, hence dropping them.
iphone_xs_tweets_required_text <- iphone_xs_tweets_required_users %>% select(5)
iphone_xs_tweets_required_text %>% head(4) %>% kable() %>% kable_styling()
| text |
|---|
| @iammrdom @ChucklefishLTD Same, on iPhone XS Max |
| @Cult0fPing I'm having the same issue here. There's $250 left on my iPhone 7 contract. I'm getting the XS for Christmas and have to trade mine in to be able to afford it. Not sure of the exact value they're giving me but it's around the same, and I'm still having to pay $500 for the XS. |
| I liked a @YouTube video https://t.co/JmbjLyAUV7 What’s On My iPhone XS Max?! |
| I liked a @YouTube video https://t.co/XBCrvfM579 WHAT’S ON MY IPHONE XS MAX + GIVEAWAY!!! | DIMMA UMEH |
We store the text of all the required tweets in a MySQL database. This makes it easy to fetch the data whenever we need it for our analysis. The first few code lines set up the MySQL schema and table structure so that the tweet text can be written to a MySQL table.
### Connect to MySQL
## Connecting to MySQL
mydb = dbConnect(MySQL(), user='root', password='Deepak1234#', dbname='twitter', host='localhost')
summary(mydb)
## <MySQLConnection:0,0>
## User: root
## Host: localhost
## Dbname: twitter
## Connection type: localhost via TCP/IP
##
## Results:
## Drop the twitter schema if it already exists on the MySQL database
#drop.schema.query <- paste("drop schema if exists `twitter`;")
#dbGetQuery(mydb, drop.schema.query)
## Create the twitter schema
#create.schema.query <- paste("CREATE SCHEMA `twitter`;")
#dbGetQuery(mydb, create.schema.query)
## Creating the new table 'twitter' for storing the tweets for iphone_xs
#drop.MySQL.table.query <- paste("DROP TABLE IF EXISTS `twitter`;")
#dbGetQuery(mydb, drop.MySQL.table.query)
## Note that even though tweets have a nominal character limit, some tweets reference multiple users, which makes the actual text length larger: the user names referenced with @ are not counted toward the tweet character limit. Hence, to be on the safer side, we created the tweet_text column with length 1000.
## When we created a shorter length of 500, we got an error for a few tweets that exceeded 500 characters due to references to multiple users.
iphone_xs_tweets_required_text[10807,1]
## [1] "My people! The countdown is on @jumianigeria 1 day massive sale is in 5 days from today. Get an iPhone Xs max by participating in the treasure hunt and win more prizes https://t.co/BHCRupRef3 #5DaystoJumia1212"
nchar(iphone_xs_tweets_required_text[10807,1])
## [1] 209
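To confirm that varchar(1000) is wide enough for every tweet, a quick sanity check over the whole data frame could be run as sketched below (its result was not printed in the original run).
# Sanity-check sketch: length of the longest tweet, which should fit
# comfortably within the varchar(1000) column defined below.
max(nchar(as.character(iphone_xs_tweets_required_text$text)))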
#create.MySQL.table.query <- paste("CREATE TABLE `twitter` (
# `tweet_text` varchar(1000) NOT NULL
#);")
## Some tweets contained double quotes; replacing them with a space to avoid errors while inserting into MySQL
iphone_xs_tweets_required_text$text <- str_replace_all(iphone_xs_tweets_required_text$text, '"', ' ')
no_of_tweets <- nrow(iphone_xs_tweets_required_text)
query <- paste("INSERT INTO twitter VALUES (", '"', iphone_xs_tweets_required_text[1,], '")', collapse = '')
query
## [1] "INSERT INTO twitter VALUES ( \" @iammrdom @ChucklefishLTD Same, on iPhone XS Max \")"
#dbGetQuery(mydb, query)
## Running a loop to load the whole data frame (All tweets) into MySQL table
for (i in 1:no_of_tweets) {
query <- paste("INSERT INTO twitter VALUES (", '"', iphone_xs_tweets_required_text[i,], '")', collapse = '')
dbGetQuery(mydb, query)
}
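As an alternative to building INSERT statements by hand, DBI's dbWriteTable handles quoting and escaping itself; a sketch is shown below (this is not what was run here, and it assumes the twitter table created above with its single tweet_text column).
# Alternative sketch: bulk-append the tweets with dbWriteTable instead of a loop.
tweets_to_load <- data.frame(tweet_text = as.character(iphone_xs_tweets_required_text$text),
                             stringsAsFactors = FALSE)
dbWriteTable(mydb, "twitter", tweets_to_load, append = TRUE, row.names = FALSE)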
get.MySQL.table.query <- paste("select * from twitter;")
iphone_xs_tweets_from_MySQL_df <- dbGetQuery(mydb, get.MySQL.table.query)
### Loading the tweets text into a corpus
myCorpus <- Corpus(VectorSource(iphone_xs_tweets_from_MySQL_df$tweet_text))
inspect(myCorpus[1:20])
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 20
##
## [1] @iammrdom @ChucklefishLTD Same, on iPhone XS Max
## [2] @Cult0fPing Iâ\200\231m having the same issue here. Thereâ\200\231s $250 left on my iPhone 7 contract. Iâ\200\231m getting the XS for Christmas and have to trade mine in to be able to afford it. Not sure of the exact value theyâ\200\231re giving me but itâ\200\231s around the same, and Iâ\200\231m still having to pay $500 for the XS.
## [3] I liked a @YouTube video https://t.co/JmbjLyAUV7 What's On My iPhone XS Max?!
## [4] I liked a @YouTube video https://t.co/XBCrvfM579 WHAT'S ON MY IPHONE XS MAX + GIVEAWAY!!! | DIMMA UMEH
## [5] @__maggz_ @trevkudzi S/O to PC for the iphone Xs Max 256gb
## [6] @AuntyAdah iPhone XS Max
## [7] Iâ\200\231m convinced niggas donâ\200\231t really like females no more, they just want us to get 2 iPhone Xs in our name ð¤·ð\217¾â\200\215â\231\200ï¸\217
## [8] @NotYetUhuru_ Canâ\200\231t wait to tweet via my iPhone xs max bought cash ð¤£ð¤£ð¤£ð¤£ donâ\200\231t even know when that will be
## [9] Iphone Xs Max ð\230\231 https://t.co/7ynFPLBT12
## [10] Guess Iâ\200\231ll be getting the iPhone XS Max
## [11] I got the iphone xr but I actually hate it now idk if I should get the iphone xs or just get the 8 :(
## [12] Stuck between getting a iPhone XS Max or iPhone XR . Decisions decisions decisions ð\230©ð\230©ð\230©
## [13] Anyone here have the iPhone XS Max? I want to get it but is it too big??
## [14] My bf just got the iPhone XS a week ago & dropped it now it has a small crack smh ð¤¦ð\217¾â\200\215â\231\200ï¸\217
## [15] Must be that iPhone XS Max https://t.co/iW3ZqVvFqz
## [16] The network is bad, I donâ\200\231t know if giving out iPhone Xs Max will make it better. https://t.co/yaWypC19mV
## [17] iPhone Xs MAX
## [18] @BlaxkHefner iPhone XS Max â\235¤ï¸\217
## [19] Bring iPhone X â\200¢ XS â\200¢ XS Max & AirPods ... for Christmas Money ð¯\n \nâ\230ï¸\217 0265668842 / 0202503435
## [20] @O2 eh listen up! Iâ\200\231m outraged I had to travel 5 hours to and from work without any data, you best hook me up with a cheap iPhone XS contract, I ainâ\200\231t after compensation kmt
### Converting to lowercase
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(myCorpus, content_transformer(tolower)):
## transformation drops documents
## Remove http
removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeURL))
## Warning in tm_map.SimpleCorpus(myCorpus, content_transformer(removeURL)):
## transformation drops documents
## Remove any junk characters
removeNumPunct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumPunct))
## Warning in tm_map.SimpleCorpus(myCorpus,
## content_transformer(removeNumPunct)): transformation drops documents
## Remove stopwords
myStopwords <- c(setdiff(stopwords('english'), c("r", "big")),
"use", "see", "used", "via", "amp")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
## Warning in tm_map.SimpleCorpus(myCorpus, removeWords, myStopwords):
## transformation drops documents
## Strip whitespace
myCorpus <- tm_map(myCorpus, stripWhitespace)
## Warning in tm_map.SimpleCorpus(myCorpus, stripWhitespace): transformation
## drops documents
## Stemming
myCorpus <- tm_map(myCorpus, stemDocument) # stem words
## Warning in tm_map.SimpleCorpus(myCorpus, stemDocument): transformation
## drops documents
inspect(myCorpus[1:20])
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 20
##
## [1] iammrdom chucklefishltd iphon xs max
## [2] cultfp iâm issu thereâ left iphon contract iâm get xs christma trade mine abl afford sure exact valu theyâr give itâ around iâm still pay xs
## [3] like youtub video what iphon xs max
## [4] like youtub video what iphon xs max giveaway dimma umeh
## [5] maggz trevkudzi pc iphon xs max gb
## [6] auntyadah iphon xs max
## [7] iâm convinc nigga donât realli like femal just want us get iphon xs name ðÿðÿââï
## [8] notyetuhuru canât wait tweet iphon xs max bought cash ðÿðÿðÿðÿ donât even know will
## [9] iphon xs max ðÿ
## [10] guess iâll get iphon xs max
## [11] got iphon xr actual hate now idk get iphon xs just get
## [12] stuck get iphon xs max iphon xr decis decis decis ðÿðÿðÿ
## [13] anyon iphon xs max want get big
## [14] bf just got iphon xs week ago drop now small crack smh ðÿðÿââï
## [15] must iphon xs max
## [16] network bad donât know give iphon xs max will make better
## [17] iphon xs max
## [18] blaxkhefn iphon xs max âï
## [19] bring iphon x â xs â xs max airpod christma money ðÿ âï
## [20] o eh listen iâm outrag travel hour work without data best hook cheap iphon xs contract ainât compens kmt
## Creating Term Document Matrix
tdm <- TermDocumentMatrix(myCorpus)
tdm <- as.matrix(tdm)
tdm[1:10, 1:20]
## Docs
## Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## chucklefishltd 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## iammrdom 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## iphon 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1
## max 1 0 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 0
## abl 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## afford 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## around 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## christma 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## contract 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## cultfp 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## As we see here, 'iphon' appears in every document, which is expected since we searched for iPhone XS tweets. Hence we remove this term (along with 'xs'), as it serves no purpose here: it is present in every tweet fetched.
myCorpus <- tm_map(myCorpus, removeWords, c("iphon", "xs"))
## Warning in tm_map.SimpleCorpus(myCorpus, removeWords, c("iphon", "xs")):
## transformation drops documents
## Building the TDM again after removing the 'iphon' and 'xs' terms:
tdm <- TermDocumentMatrix(myCorpus)
tdm <- as.matrix(tdm)
tdm[1:10, 1:20]
## Docs
## Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## chucklefishltd 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## iammrdom 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## max 1 0 1 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 0
## abl 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## afford 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## around 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## christma 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## contract 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## cultfp 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## exact 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Building the word frequency
word.freq <- rowSums(tdm)
word.freq <- sort(word.freq, decreasing = TRUE)
set.seed(222)
## Top 20 used words
word.freq[1:20]
## max appl new giveaway like
## 6588 2374 1408 1362 1242
## case phone get youtub win
## 1193 1144 1132 1081 1002
## video enter now check just
## 996 937 910 857 820
## camera casetifi gravyardgirl got can
## 729 716 715 598 565
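As an aside, high-frequency terms can also be read straight off the sparse term-document matrix with tm::findFreqTerms, without converting it to a dense matrix first; the 500 threshold below is an arbitrary illustration.
# Sketch: list all stemmed terms appearing at least 500 times in the corpus.
findFreqTerms(TermDocumentMatrix(myCorpus), lowfreq = 500)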
## Building the word cloud
twitter.words <- names(word.freq)
wordcloud(words = twitter.words,
freq = word.freq,
max.words = 150,
random.order = F,
min.freq = 5,
colors = brewer.pal(8, 'Dark2'),
scale = c(5,0.3),
rot.per = 0.3)
Now, from the word cloud, we see that there are many words which do not carry much meaning in the context of this analysis and hence can be removed. Some of these words came from the ad tweets; others are generic words like "phone" and "youtub". We remove such words now so that we get a better and more meaningful word cloud.
myCorpus <- tm_map(myCorpus, removeWords, c("max", "appl", "giveaway", "youtub", "win", "phone", "gravyardgirl", "gravyardgirliphonegiveaway"))
## Warning in tm_map.SimpleCorpus(myCorpus, removeWords, c("max", "appl",
## "giveaway", : transformation drops documents
## Building a fresh tdm now as we just removed some words.
tdm <- TermDocumentMatrix(myCorpus)
tdm <- as.matrix(tdm)
## Building the updated word frequency
word.freq <- rowSums(tdm)
word.freq <- sort(word.freq, decreasing = TRUE)
set.seed(222)
## Top 20 used words now
word.freq[1:20]
## new like case get video enter now check
## 1408 1242 1193 1132 996 937 910 857
## just camera casetifi got can one buy want
## 820 729 716 598 565 541 530 522
## plus pixel price esim
## 497 456 438 421
## Building the updated word cloud
twitter.words <- names(word.freq)
wordcloud(words = twitter.words,
freq = word.freq,
max.words = 150,
random.order = F,
min.freq = 5,
colors = brewer.pal(8, 'Dark2'),
scale = c(5,0.3),
rot.per = 0.3)
From the updated word cloud above, we get some useful insights:
People are using positive words such as "like", "get", "want", "buy", and "great", suggesting an overall positive sentiment.
Words like "galaxy", "samsung", and "android" also appear, indicating that users are comparing the iPhone XS with Android phones.
Finally, we run an NRC sentiment analysis on the tweet text fetched back from MySQL to quantify the emotions expressed in the tweets.
sentiment_scores <- get_nrc_sentiment(iphone_xs_tweets_from_MySQL_df$tweet_text)
head(sentiment_scores)
## anger anticipation disgust fear joy sadness surprise trust negative
## 1 0 0 0 0 0 0 0 0 0
## 2 0 1 0 1 1 0 0 2 1
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## positive
## 1 0
## 2 3
## 3 0
## 4 0
## 5 0
## 6 0
barplot(colSums(sentiment_scores),
las = 2,
col = rainbow(10),
ylab = "Count",
main = "Sentiment scores for iphone xs Tweets")