Selecting an event vol2
Just a quick dump of previous visualisations with small visual improvements (added all months).
As can be seen in Graph 5, the highest peaks in the Hate subset are as follows: the first week of July, the first week of May, two consecutive weeks in late July to early August, and lastly, one week in late August.
Now, I’m more inclined to choose the two consecutive weeks in late July to early August for further analysis. Hate volume is relatively high (see Graph 5), but the total volume of tweets is relatively low compared to the other peaks in the hate subset (see Graph 6).
If I am to talk in numbers:
nrow(slur.edited [slur.edited$tweet.time.posix >= "2016-07-24" & slur.edited$tweet.time.posix <= "2016-08-08", ])
## [1] 976
nrow(totaldata [totaldata$tweet.time.posix >= "2016-07-24" & totaldata$tweet.time.posix <= "2016-08-08", ])
## [1] 103838
The total volume of tweets in the specified 2 weeks is 103838, of which 976 matched the antisemitic hate pattern. Be mindful that this is 2 weeks summed together, so if we were to do just one week it would be roughly 52K tweets and around 490 hate tweets per week.
Running 103K tweets through the classifier should take around 6 hours.
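One caveat on the window filter, as a minimal sketch: assuming `tweet.time.posix` is a POSIXct column, a bare date string such as "2016-08-08" is read as midnight at the start of that day, so `<=` would leave out almost all of the last day. The helper below, `count_in_window`, is hypothetical (it is not part of the original analysis) and uses a half-open window with explicit bounds; the toy data frame and the UTC time zone are assumptions for illustration only.

```r
# Hypothetical helper (not from the original analysis): count rows whose
# timestamp falls in the half-open window [start, end).
count_in_window <- function(df, start, end,
                            col = "tweet.time.posix", tz = "UTC") {
  t  <- df[[col]]
  lo <- as.POSIXct(start, tz = tz)  # midnight at the start of `start`
  hi <- as.POSIXct(end, tz = tz)    # midnight at the start of `end` (excluded)
  sum(t >= lo & t < hi, na.rm = TRUE)
}

# Toy data standing in for slur.edited / totaldata (made-up timestamps).
toy <- data.frame(
  tweet.time.posix = as.POSIXct(
    c("2016-07-23 23:59:59", "2016-07-24 00:00:00",
      "2016-08-07 18:30:00", "2016-08-08 00:00:00"),
    tz = "UTC"
  )
)

# Counts only the two timestamps inside [2016-07-24, 2016-08-08).
n_toy <- count_in_window(toy, "2016-07-24", "2016-08-08")
print(n_toy)
```

With the real data this would be called as `count_in_window(slur.edited, "2016-07-24", "2016-08-08")`; the counts may differ slightly from the ones reported here if the last day was meant to be included in full.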
nrow(slur.edited [slur.edited$tweet.time.posix >= "2016-04-25" & slur.edited$tweet.time.posix <= "2016-05-01", ])
## [1] 487
nrow(totaldata [totaldata$tweet.time.posix >= "2016-04-25" & totaldata$tweet.time.posix <= "2016-05-01", ])
## [1] 232073
The total volume of tweets in the week ending 1st of May is 232073, of which 487 matched the antisemitic hate pattern.
Running 232K tweets through the classifier should take around 15 hours.
nrow(slur.edited [slur.edited$tweet.time.posix >= "2016-06-27" & slur.edited$tweet.time.posix <= "2016-07-03", ])
## [1] 582
nrow(totaldata [totaldata$tweet.time.posix >= "2016-06-27" & totaldata$tweet.time.posix <= "2016-07-03", ])
## [1] 98443
This is the highest peak in the Hate subset. The total volume of tweets in the week ending 3rd of July is 98443, of which 582 matched the antisemitic hate pattern. Running 98K tweets through the classifier should take around 5.5 hours.
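To put the three candidate windows side by side, here is a rough sketch that uses only the counts and time estimates quoted above. The column names are my own, `est_hours` holds the rough classifier estimates from the text rather than measured timings, and `hate_per_hour` is an illustrative derived figure, not something computed in the original analysis.

```r
# Counts copied from the nrow() outputs above; est_hours are the rough
# classifier-time estimates from the text, not measurements.
windows <- data.frame(
  window    = c("24 Jul - 8 Aug", "25 Apr - 1 May", "27 Jun - 3 Jul"),
  hate      = c(976, 487, 582),
  total     = c(103838, 232073, 98443),
  est_hours = c(6, 15, 5.5)
)

# Share of tweets matching the hate pattern, in percent.
windows$hate_share_pct <- round(100 * windows$hate / windows$total, 2)

# Hate-pattern matches obtained per estimated classifier hour.
windows$hate_per_hour <- round(windows$hate / windows$est_hours, 1)

# Highest hate share first.
print(windows[order(-windows$hate_share_pct), ])
```

On these numbers the late July to early August window has the highest hate share (about 0.94%, against 0.59% for the week ending 3rd of July and 0.21% for the week ending 1st of May), which is consistent with preferring it for further analysis.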
I guess we should also consider the computing time needed to derive retweet metrics, but I’m not sure how long that takes.
What do you think?