I want to count the number of occurrences of a word across all tweets in a given time period. The data is gathered from the live API.
Tweets are gathered dynamically, from the date 8 days ago up to today's date, and the received tweets are then converted to a data frame. I estimate no more than 10000 tweets for the search term over an 8-day period, and later I check whether the number of tweets received stayed below this hard-coded limit. I picked 8 days because I'm interested in an exact 7 x 24 h period, and the twitteR package's search has a resolution of one day.
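The relation between the whole-day search boundary and the exact 7 x 24 h window can be illustrated with fixed timestamps. This is only a minimal base R sketch: `current.date.example`, `exact.start`, `since.date` and `since.start` are hypothetical names, and a fixed time in UTC stands in for `now()` so the result is reproducible.

```r
# Hypothetical fixed "now" so the example is reproducible (the report uses now()).
current.date.example <- as.POSIXct("2014-08-03 17:15:56", tz = "UTC")

# Exact start of the 7 x 24 h window.
exact.start <- current.date.example - 7 * 24 * 3600

# The search takes only whole dates, so request from the date 8 days back.
since.date <- as.Date(current.date.example - 8 * 24 * 3600)

# Midnight of that date lies before the exact start, so the request leaves
# a margin; the surplus is removed later by filtering on the timestamp.
since.start <- as.POSIXct(format(since.date), tz = "UTC")
stopifnot(since.start <= exact.start)

print(since.date)
print(exact.start)
```

The extra day is a safety margin: everything older than `exact.start` is dropped afterwards by comparing full timestamps.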
library(twitteR)
library(lubridate)
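estimated.tweets <- 10000 # upper bound on the number of tweets requested
current.date <- now()
past.date <- current.date - days(8) # start of the search (whole-day resolution)
exact.past.date <- current.date - days(7) # start of the exact 7 x 24 h window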
keys.from.twitter <- read.csv("twitter.csv") # personal oauth data from file
setup_twitter_oauth(keys.from.twitter$api.key, keys.from.twitter$api.secret, keys.from.twitter$access.token, keys.from.twitter$access.secret)
## [1] "Using direct authentication"
interesting.string <- "iBeacon"
twitter.data <- searchTwitter(searchString = interesting.string, since = as.character(as.Date(past.date)), n = estimated.tweets)
## Warning: 10000 tweets were requested but the API can only return 5397
tweets.data <- twListToDF(twitter.data)
Checking whether we received all tweets with the keyword from the defined period:
if (nrow(tweets.data) < estimated.tweets) {
  print("All tweets received")
} else {
  print("There could be more Tweets")
}
## [1] "All tweets received"
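The comparison above only detects whether the requested cap was reached. A complementary check, sketched here as an assumption rather than as part of the original analysis, is whether the oldest received tweet reaches back to the start of the window. `search.status` is a hypothetical helper and the timestamps are toy values.

```r
# Hypothetical helper: classify a search result by count and time coverage.
search.status <- function(created, requested, window.start) {
  if (length(created) == 0) {
    return("no tweets received")
  }
  if (length(created) >= requested) {
    return("cap reached: there could be more tweets")
  }
  if (min(created) > window.start) {
    return("below cap, but the oldest tweet is newer than the window start")
  }
  "below cap and the window start is covered"
}

# Toy timestamps, not real data.
toy.created <- as.POSIXct(
  c("2014-07-28 15:35:10", "2014-08-03 15:15:02"),
  tz = "UTC"
)
toy.start <- as.POSIXct("2014-07-27 15:15:56", tz = "UTC")

print(search.status(toy.created, requested = 10000, window.start = toy.start))
```

An oldest tweet newer than the window start does not prove that data is missing: it can mean that the term was simply not tweeted earlier, or that the search did not reach back that far.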
Filtering tweets to the exact period:
nrow(tweets.data)
## [1] 5397
tweets.data <- tweets.data[tweets.data$created >= exact.past.date, ]
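The filter compares date-time values, which R stores as instants, so it should not matter that the tweet timestamps and the window start are displayed in different time zones. A minimal sketch on a made-up data frame follows; `toy`, `window.start` and `kept` are hypothetical names, and the time zone `Europe/Warsaw` is an assumption made for the example.

```r
# Toy data frame standing in for tweets.data (values are made up).
toy <- data.frame(
  text = c("too old", "inside window", "also inside"),
  created = as.POSIXct(
    c("2014-07-26 10:00:00", "2014-07-28 15:35:10", "2014-08-03 15:15:02"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Window start written in a different zone; it denotes 2014-07-27 15:15:56 UTC.
window.start <- as.POSIXct("2014-07-27 17:15:56", tz = "Europe/Warsaw")

# POSIXct values are compared as instants, so mixing time zones is safe here.
kept <- toy[toy$created >= window.start, ]
print(nrow(kept))
```

Only the rows at or after the window start remain, regardless of the zone in which either side is printed.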
Sample texts from tweets:
head(tweets.data$text)
## [1] "Will businesses bite into Apple's new iBeacon? @WSJ investigates: http://t.co/Byjxned1Qr http://t.co/oOES3c8nc3"
## [2] "@tperfitt find power and plug in iBeacon!"
## [3] "THE BEACONS FAQ: It's Time To Set The Story Straight About Beacons And Apple's iBeacon System http://t.co/FNugBZWLof via @sai"
## [4] "@tugrult +++++Logo kampanya modülü ,İbeacon ve akıllı telefonlarla yeteneği cok daha fazla güçlenecektir."
## [5] "Beacons explained: http://t.co/DftOmvXUPl via @sai"
## [6] "Will Apple’s iBeacon Whet German Diners’ App-etite? - Digits - WSJ http://t.co/3H5KSOCWLz http://t.co/jfF7C6t6FY"
Number of tweets:
number.of.tweets <- nrow(tweets.data)
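The row count gives the number of tweets returned by the search, not the number of tweets whose visible text contains the word. One of the sample texts above ("Beacons explained: ...") does not contain the term at all, presumably because the search matched something else, such as the linked page. A stricter count over the text column could be sketched as below; `toy.texts`, `term` and `matches` are hypothetical names and the texts are made up.

```r
# Made-up tweet texts for illustration.
toy.texts <- c(
  "Will businesses bite into the new iBeacon?",
  "find power and plug in IBEACON!",
  "Beacons explained"
)

# Count tweets whose text contains the term, ignoring case.
term <- "iBeacon"
matches <- grepl(term, toy.texts, ignore.case = TRUE)
print(sum(matches))
```

Which of the two counts is the right one depends on whether a tweet that merely links to a page about the term should count as an occurrence.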
And to be sure, let's check the date range:
summary(tweets.data$created)
##                  Min.               1st Qu.                Median
## "2014-07-28 15:35:10" "2014-07-29 08:15:09" "2014-07-30 07:50:08"
##                  Mean               3rd Qu.                  Max.
## "2014-07-30 15:37:54" "2014-07-31 18:15:34" "2014-08-03 15:15:02"
There are 5397 tweets mentioning iBeacon from 2014-07-27 17:15:56 to 2014-08-03 17:15:56.
The following versions of R and packages were used:
sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=pl_PL.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8
## [5] LC_MONETARY=pl_PL.UTF-8 LC_MESSAGES=pl_PL.UTF-8
## [7] LC_PAPER=pl_PL.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.3.3 twitteR_1.1.8 knitr_1.6
##
## loaded via a namespace (and not attached):
## [1] bit_1.1-12 bit64_0.9-4 digest_0.6.4 evaluate_0.5.5
## [5] formatR_0.10 httr_0.4 memoise_0.2.1 plyr_1.8.1
## [9] Rcpp_0.11.2 RCurl_1.95-4.1 rjson_0.2.14 stringr_0.6.2
## [13] tools_3.1.1