This note is for the UTSA students who are taking my Data Analytics Applications (DA 6813) course. For this analysis you must have your Twitter credentials. If you don’t have them, refer to the content folder on Blackboard for instructions on how to get them.

For this note you will need the twitteR and sentimentr packages. I also load plyr, which I use later to build a frequency table.

library(twitteR)
library(sentimentr)
library(plyr)   # To get a frequency table
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:twitteR':
## 
##     id

Once twitteR loads, use your consumer key and consumer secret to set up Twitter OAuth. I am not going to display the actual code here because it contains my keys, but it will look like this:

# setup_twitter_oauth(consumer_key = "xxx", consumer_secret = "yyyy")
# Where "xxx" and "yyyy" are your credentials.
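
One way to keep keys out of a script is to read them from environment variables. The following is only a sketch: the variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET are names I made up for illustration, and you would set them yourself (for example in your .Renviron file). The OAuth call runs only if both values are present.

```r
# Hypothetical environment variable names; set them yourself before running
consumer_key    <- Sys.getenv("TWITTER_CONSUMER_KEY")
consumer_secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")

if (nzchar(consumer_key) && nzchar(consumer_secret)) {
  # Both credentials were found, so hand them to twitteR
  library(twitteR)
  setup_twitter_oauth(consumer_key = consumer_key,
                      consumer_secret = consumer_secret)
} else {
  # Sys.getenv() returns "" for unset variables, so we end up here
  message("Twitter credentials not found in the environment; skipping OAuth setup.")
}
```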

Getting the tweets

Once your credentials are accepted by Twitter, you can access its API. In this note I am going to get 1000 tweets containing a trending topic in San Antonio, TX. For this we need the woeid of the location, where woeid stands for “where on earth ID”. So let’s first get that.

avloc <- availableTrendLocations()
head(avloc)
##        name country woeid
## 1 Worldwide             1
## 2  Winnipeg  Canada  2972
## 3    Ottawa  Canada  3369
## 4    Quebec  Canada  3444
## 5  Montreal  Canada  3534
## 6   Toronto  Canada  4118

In the above code I created an object avloc which contains information on the name of the location, the country, and its respective woeid. For example, Toronto’s woeid is 4118. Let’s see whether San Antonio appears on this list.

avloc[avloc$name == "San Antonio",]
##            name       country   woeid
## 389 San Antonio United States 2487796

San Antonio’s woeid is 2487796. We will need this to get the trending topics in San Antonio at a given hour. I am going to use the getTrends function from twitteR to obtain these trends. Rather than copying and pasting the woeid, I will simply reference it from avloc: R retrieves the value in the row where the name variable in avloc is “San Antonio” and in the 3rd column, which, as we know, holds the woeid. This way I reduce the chance of making a mistake in copying and pasting the woeid. The trends are stored in an object called trend, which the search query later in this note uses.
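
The code for this step does not appear in this note. A minimal sketch of what the paragraph describes might look like the following. It assumes the result is stored as trend, since that is the name the search query uses later. The avloc here is a three-row stand-in that I built from the values printed above so the lookup runs without Twitter credentials, and the getTrends call is left as a comment because it needs a live API session.

```r
# Stand-in for availableTrendLocations(), using rows printed earlier in this note
avloc <- data.frame(
  name    = c("Worldwide", "Toronto", "San Antonio"),
  country = c("", "Canada", "United States"),
  woeid   = c(1, 4118, 2487796),
  stringsAsFactors = FALSE
)

# Pick the row where name is "San Antonio" and the 3rd column (woeid)
sa_woeid <- avloc[avloc$name == "San Antonio", 3]
print(sa_woeid)

# With a live session, the trends would then be fetched with twitteR's getTrends():
# trend <- getTrends(woeid = sa_woeid)
# head(trend)
```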

Getting geocode using Google Maps

To restrict the search to tweets sent from around a particular place, we also need its latitude and longitude. Where will you get them? A crude but simple way is to use Google Maps. Search for your location and Google Maps will take you there. The URL in your browser will contain the latitude and longitude. Here is a screenshot of the UTSA search in Google Maps.

UTSA on Google Map. Note the URL


Writing a search query

With this much information we can now start our search. I am going to ask for 1000 tweets, all in English. Take note of the geocode parameter: I literally copied the first two numbers from the Google Maps URL and added a search radius of 20 miles!

tweet <- searchTwitter(trend[1, 1], n = 1000, lang = 'en', geocode = '29.5845579,-98.6187748,20mi')
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 1000 tweets were requested but the
## API can only return 276
class(tweet) # Check the class of 'tweet' object
## [1] "list"
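
The geocode value passed to searchTwitter is a single string of the form latitude,longitude,radius. If you would rather keep the three pieces separate, here is a small sketch of assembling that string. The variable names lat, lon, radius and geocode are my own choices for illustration.

```r
# Latitude and longitude copied from the Google Maps URL for UTSA
lat    <- 29.5845579
lon    <- -98.6187748
radius <- "20mi"   # search radius; "mi" means miles

# Build "latitude,longitude,radius" with 7 decimal places for the coordinates
geocode <- sprintf("%.7f,%.7f,%s", lat, lon, radius)
print(geocode)

# With a live session, the string would be passed to the search like this:
# tweet <- searchTwitter(trend[1, 1], n = 1000, lang = "en", geocode = geocode)
```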

Get tweets in a data frame

Once the search is complete, twitteR returns a list (here with 276 tweets rather than the 1000 requested, as the warning above shows). The list can be converted into a data frame for ease of analysis. I will use the twListToDF function from the twitteR package.

tweet <- twListToDF(tweet)
class(tweet) # Check class of 'tweet' object and verify that it's data frame
## [1] "data.frame"

Let’s print the first 10 tweets in our data frame.

head(tweet,10)
##                                                                                                                                                                text
## 1                         @violadavis exquisitely too juicy \xed\xa0\xbd\xed\xb8\xb1\xed\xa0\xbd\xed\xb8\xb1\xed\xa0\xbd\xed\xb2\x80#HTGAWM https://t.co/ntK5KEuw2h
## 2           @violadavis OH MY DAMN \xed\xa0\xbd\xed\xb8\xb1\xed\xa0\xbd\xed\xb1\x8d\xed\xa0\xbc\xed\xbf\xbc\xed\xa0\xbd\xed\xb1\x80 #HTGAWM https://t.co/5m3s0uDNJj
## 3                                         #HTGAWM  Do just want to go through a faze of being a badboy then go back to Connor or is it your HIV pushing Connor away
## 4                         #HTGAWM But I do feel horrible for Connor the boy loves Olli and Olli was giving signals to later push him away like dude what's the deal
## 5                       #HTGAWM Annalise was like Oliver useless we will keep him in back but now she's like Wipe this device clean all of… https://t.co/oSGPolt8Ah
## 6                                                                                                  #HTGAWM At least this dead body not Oliver but what about Connor
## 7                      #HTGAWM Omg what Oliver part of this murder helping Annalise she gave the phone \xed\xa0\xbd\xed\xb3\xb1 he definitely part of the squad now
## 8  What did Annalise do now!? Ay and who is #UnderTheSheet?? And really Oliver cooking dinner but you want out? \xed\xa0\xbd\xed\xb8\xad this is too much.  #HTGAWM
## 9                                                                                               RT @MyNameisAmreena: Okay now I really don't know who it is #HTGAWM
## 10                                                          #HTGAWM is driving me insane!! \xed\xa0\xbd\xed\xb8\xa9\xed\xa0\xbd\xed\xb8\xa9\xed\xa0\xbd\xed\xb8\xa9
##    favorited favoriteCount  replyToSN             created truncated
## 1      FALSE             0 violadavis 2016-09-30 03:18:00     FALSE
## 2      FALSE             0 violadavis 2016-09-30 03:17:47     FALSE
## 3      FALSE             0       <NA> 2016-09-30 03:15:46     FALSE
## 4      FALSE             0       <NA> 2016-09-30 03:14:27     FALSE
## 5      FALSE             1       <NA> 2016-09-30 03:10:33      TRUE
## 6      FALSE             1       <NA> 2016-09-30 03:06:33     FALSE
## 7      FALSE             0       <NA> 2016-09-30 03:04:30     FALSE
## 8      FALSE             1       <NA> 2016-09-30 03:03:42     FALSE
## 9      FALSE             0       <NA> 2016-09-30 03:02:36     FALSE
## 10     FALSE             0       <NA> 2016-09-30 03:02:02     FALSE
##            replyToSID                 id replyToUID
## 1                <NA> 781694560145580032 2717254872
## 2  781691193176436736 781694503597977600 2717254872
## 3                <NA> 781693997391552514       <NA>
## 4                <NA> 781693664678379520       <NA>
## 5                <NA> 781692685279121409       <NA>
## 6                <NA> 781691675731120128       <NA>
## 7                <NA> 781691161131954177       <NA>
## 8                <NA> 781690958266011649       <NA>
## 9                <NA> 781690682763251712       <NA>
## 10               <NA> 781690538156183553       <NA>
##                                                                            statusSource
## 1  <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
## 2     <a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>
## 3    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 4    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 5    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 6    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 7    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 8    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 9    <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 10   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
##         screenName retweetCount isRetweet retweeted longitude latitude
## 1  ShannaSince1987            0     FALSE     FALSE        NA       NA
## 2       summer0001            0     FALSE     FALSE        NA       NA
## 3    LunaticNation            0     FALSE     FALSE        NA       NA
## 4    LunaticNation            0     FALSE     FALSE        NA       NA
## 5    LunaticNation            0     FALSE     FALSE        NA       NA
## 6    LunaticNation            0     FALSE     FALSE        NA       NA
## 7    LunaticNation            0     FALSE     FALSE        NA       NA
## 8        sugamandy            0     FALSE     FALSE        NA       NA
## 9    IamNot_aWhore            1      TRUE     FALSE        NA       NA
## 10       merndadlg            0     FALSE     FALSE        NA       NA
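
The full printout is wide because the data frame has many columns. If you only want a quick look, you can select a few columns before printing. Here is a sketch using a two-row stand-in for tweet, which I built from rows 6 and 9 of the output above so that it runs without an API session. The object name compact is my own.

```r
# Two-row stand-in for the data frame returned by twListToDF()
tweet <- data.frame(
  text = c("#HTGAWM At least this dead body not Oliver but what about Connor",
           "RT @MyNameisAmreena: Okay now I really don't know who it is #HTGAWM"),
  screenName   = c("LunaticNation", "IamNot_aWhore"),
  retweetCount = c(0, 1),
  isRetweet    = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the columns of interest, then print up to 10 rows
compact <- tweet[, c("screenName", "isRetweet", "text")]
head(compact, 10)

# Count how many of the tweets are retweets
table(tweet$isRetweet)
```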

Cleaning up the text

At this point I am going to do some basic cleaning. In tweet, the column statusSource contains information about the source of the tweet: whether it was sent from an iPhone, an Android phone, Twitter web, etc. But the values of this variable are quite messy and it’s not possible to make a nice frequency table with them. So let’s clean up that variable.

In all the values I printed above for this variable, you will see </a> at the end of each value. We can easily remove this string using the powerful gsub function in base R by replacing it with an empty string. In order to avoid overwriting the variable statusSource, I will create another variable, statusSource1.

tweet$statusSource1 <- gsub('</a>',"",tweet$statusSource)
head(tweet$statusSource1)
## [1] "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android"
## [2] "<a href=\"http://twitter.com/#!/download/ipad\" rel=\"nofollow\">Twitter for iPad"   
## [3] "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone"  
## [4] "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone"  
## [5] "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone"  
## [6] "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone"

As we see, gsub nicely replaced all </a> with nothing!

Next we need to remove the long text string which is enclosed in <>. We can use gsub with regular expressions (regex) to remove this entire string. The pattern '.*>' matches everything up to and including the last > in the value, so only the text after it remains. For more patterns in regex, check out this link: [http://www.endmemo.com/program/R/gsub.php]. Another helpful website is [https://www.memberpress.com/how-to-become-a-regular-expression-power-user/]

In the following code I am overwriting statusSource1.

tweet$statusSource1 <- gsub('.*>',"",tweet$statusSource1)
head(tweet$statusSource1)
## [1] "Twitter for Android" "Twitter for iPad"    "Twitter for iPhone" 
## [4] "Twitter for iPhone"  "Twitter for iPhone"  "Twitter for iPhone"
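
The two gsub calls above can also be combined into one. This is an alternative sketch, not the approach I used above: the pattern "<[^>]+>" matches any single tag, that is, a <, then one or more characters that are not >, then a >. The vector src holds two of the raw values printed earlier, and the names src and clean are my own.

```r
# Two raw statusSource values copied from the output earlier in this note
src <- c(
  "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
  "<a href=\"http://twitter.com/#!/download/ipad\" rel=\"nofollow\">Twitter for iPad</a>"
)

# Remove the opening <a ...> tag and the closing </a> tag in a single pass
clean <- gsub("<[^>]+>", "", src)
print(clean)
```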

Now we have a clean variable! Let’s get a frequency table using the count() function from the plyr package.

plyr::count(tweet$statusSource1)
##                     x freq
## 1            Facebook    2
## 2                Path    1
## 3           RoundTeam    1
## 4         TVShow Time    1
## 5 Twitter for Android   94
## 6    Twitter for iPad    2
## 7  Twitter for iPhone  166
## 8 Twitter for Windows    1
## 9  Twitter Web Client    8
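
If you prefer to stay in base R, table() gives the same counts, and sort() puts the most common sources first. The sketch below rebuilds the cleaned variable from the frequencies printed above so that it runs on its own. The names counts, status_source and freq are mine.

```r
# Frequencies copied from the plyr::count() output above
counts <- c("Facebook" = 2, "Path" = 1, "RoundTeam" = 1, "TVShow Time" = 1,
            "Twitter for Android" = 94, "Twitter for iPad" = 2,
            "Twitter for iPhone" = 166, "Twitter for Windows" = 1,
            "Twitter Web Client" = 8)

# Stand-in for tweet$statusSource1: repeat each source as many times as it occurred
status_source <- rep(names(counts), times = counts)

# Frequency table in base R, most common source first
freq <- sort(table(status_source), decreasing = TRUE)
print(freq)

# The counts add up to the number of tweets the API returned
sum(freq)
```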

This tutorial is being updated so I will add more stuff here soon.