Counting Twitter mentions

Synopsis

I want to count the number of occurrences of a word in all tweets within a given time period. The data is gathered from the live Twitter API.

Data Processing

Tweets are gathered dynamically, from 8 days ago until today, and the received tweets are then converted to a data frame. I estimate no more than 10000 tweets for the search term over the 8-day period; later I check whether fewer tweets than this hard-coded number were received. I picked 8 days because I'm interested in an exact 7 x 24 h period, and the since argument of searchTwitter in the twitteR package only accepts a date, so its resolution is one day.

library(twitteR)
library(lubridate)
estimated.tweets <- 10000                  # upper bound on tweets requested from the API
current.date <- now()                      # end of the window
past.date <- current.date - days(8)        # day-resolution start passed to the search
exact.past.date <- current.date - days(7)  # exact start of the 7 x 24 h window
keys.from.twitter <- read.csv("twitter.csv") # personal oauth data from file
setup_twitter_oauth(keys.from.twitter$api.key,
                    keys.from.twitter$api.secret,
                    keys.from.twitter$access.token,
                    keys.from.twitter$access.secret)
## [1] "Using direct authentication"
interesting.string <- "iBeacon"
twitter.data <- searchTwitter(searchString = interesting.string, since = as.character(as.Date(past.date)), n = estimated.tweets)
## Warning: 10000 tweets were requested but the API can only return 5397
tweets.data <- twListToDF(twitter.data)
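The relation between the date-only search start and the exact start of the window can be checked without calling the API. This is a minimal sketch in base R, assuming a fixed example timestamp in UTC in place of now(); the names example.now, since.string and since.midnight are hypothetical and not part of the analysis above.

```r
# Fixed example timestamp standing in for now() (assumption, for reproducibility)
example.now <- as.POSIXct("2014-08-03 17:15:56", tz = "UTC")

# Day-resolution start (8 days back) and exact start (7 days back)
past.date <- example.now - as.difftime(8, units = "days")
exact.past.date <- example.now - as.difftime(7, units = "days")

# The search only accepts a date, so the time of day is dropped
since.string <- format(as.Date(past.date, tz = "UTC"))
since.midnight <- as.POSIXct(since.string, tz = "UTC")

print(since.string)
# The date-only start lies before the exact start, so the requested
# range contains the whole 7 x 24 h window
print(since.midnight <= exact.past.date)
```

The extra day is a safety margin: the request is deliberately wider than needed, and the exact cut is made later on the timestamps themselves.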

Results

Checking whether all tweets containing the keyword in the defined period were received:

if (nrow(tweets.data) < estimated.tweets) {
  print("All tweets received")
} else {
  print("There could be more Tweets")
}
## [1] "All tweets received"

Filtering tweets to the exact period

nrow(tweets.data)
## [1] 5397
tweets.data <- tweets.data[tweets.data$created >= exact.past.date, ]
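The same filter can be tried on a toy data frame to see how it behaves at the boundary. This is only a sketch: the data frame toy.tweets and its four rows are invented for illustration, and only the column names text and created follow the data frame used above.

```r
# Toy data (invented): two tweets before the cutoff, two at or after it
toy.tweets <- data.frame(
  text = c("too old", "one second early", "exactly on the cutoff", "recent"),
  created = as.POSIXct(
    c("2014-07-26 23:00:00", "2014-07-27 17:15:55",
      "2014-07-27 17:15:56", "2014-08-01 12:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Cutoff for the exact window (hypothetical value)
cutoff <- as.POSIXct("2014-07-27 17:15:56", tz = "UTC")

# Keep rows created at or after the cutoff (>= includes the boundary)
kept <- toy.tweets[toy.tweets$created >= cutoff, ]
print(nrow(kept))
print(kept$text)
```

POSIXct values are compared as absolute instants, so the comparison stays correct even when the cutoff and the created column carry different time zone attributes.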

Sample texts from tweets:

head(tweets.data$text)
## [1] "Will businesses bite into Apple's new iBeacon? @WSJ investigates: http://t.co/Byjxned1Qr http://t.co/oOES3c8nc3"              
## [2] "@tperfitt find power and plug in iBeacon!"                                                                                    
## [3] "THE BEACONS FAQ: It's Time To Set The Story Straight About Beacons And Apple's iBeacon System http://t.co/FNugBZWLof via @sai"
## [4] "@tugrult +++++Logo kampanya modülü ,İbeacon ve akıllı telefonlarla yeteneği cok daha fazla güçlenecektir."                    
## [5] "Beacons explained: http://t.co/DftOmvXUPl via @sai"                                                                           
## [6] "Will Apple’s iBeacon Whet German Diners’ App-etite? - Digits - WSJ http://t.co/3H5KSOCWLz http://t.co/jfF7C6t6FY"

Number of tweets

number.of.tweets <- nrow(tweets.data)

And to be sure, let's check the date range:

summary(tweets.data$created)
##                  Min.               1st Qu.                Median 
## "2014-07-28 15:35:10" "2014-07-29 08:15:09" "2014-07-30 07:50:08" 
##                  Mean               3rd Qu.                  Max. 
## "2014-07-30 15:37:54" "2014-07-31 18:15:34" "2014-08-03 15:15:02"

There are 5397 tweets mentioning iBeacon from 2014-07-27 17:15:56 to 2014-08-03 17:15:56.
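The summary and the sentence above may use different clocks if, as I assume here, the created column is stored in UTC while now() reports local time. A minimal sketch of rendering a UTC timestamp in local time, assuming the session ran in the Europe/Warsaw time zone (a guess based on the pl_PL locale, not something stated in this report); the names latest.utc and latest.local are hypothetical.

```r
# Latest tweet time from the summary, interpreted as UTC (assumption)
latest.utc <- as.POSIXct("2014-08-03 15:15:02", tz = "UTC")

# Same instant rendered in the assumed local time zone
latest.local <- format(latest.utc, format = "%Y-%m-%d %H:%M:%S",
                       tz = "Europe/Warsaw")
print(latest.local)
```

Under these assumptions the latest tweet falls just before the end of the reported window.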

R and packages information

The following versions of R and packages were used.

sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8    
##  [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8   
##  [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] lubridate_1.3.3 twitteR_1.1.8   knitr_1.6      
## 
## loaded via a namespace (and not attached):
##  [1] bit_1.1-12     bit64_0.9-4    digest_0.6.4   evaluate_0.5.5
##  [5] formatR_0.10   httr_0.4       memoise_0.2.1  plyr_1.8.1    
##  [9] Rcpp_0.11.2    RCurl_1.95-4.1 rjson_0.2.14   stringr_0.6.2 
## [13] tools_3.1.1