Selecting an event vol2

Recap with Visualisations

Just a quick dump of previous visualisations with small visual improvements (added all months).

Weekly Hate and Complete Dataset

As can be seen in Graph 5, the highest peaks in the Hate subset are as follows: the first week of July, the first week of May, two consecutive weeks in late July-early August, and lastly, one week in late August.

Now, I’m more inclined to choose the two consecutive weeks in late July-early August for further analysis. Hate volume is relatively high (see Graph 5), but the total volume of tweets is relatively low compared to the other peaks in the Hate subset (see Graph 6).

If I am to talk in numbers:

The Two Consecutive Weeks in Late July-Early August

# Rows in the Hate subset, then in the full dataset, for the same date window
nrow(slur.edited[slur.edited$tweet.time.posix >= "2016-07-24" & slur.edited$tweet.time.posix <= "2016-08-08", ])
## [1] 976
nrow(totaldata[totaldata$tweet.time.posix >= "2016-07-24" & totaldata$tweet.time.posix <= "2016-08-08", ])
## [1] 103838

The total volume of tweets in the specified two weeks is 103838. 976 of them matched the Antisemitic hate pattern. Be mindful that this is two weeks summed together, so a single week would average roughly 52K tweets and around 490 Hate tweets.

Running 103K tweets through the classifier should take around 6 hours.
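One thing worth double-checking about the date filters used here: if `tweet.time.posix` is a POSIXct column (an assumption), R reads a bare date string such as "2016-08-08" as midnight at the start of that day, so `<=` keeps almost nothing from the final day. Below is a minimal sketch of a helper that includes the whole end day. The name `count_window`, the `tz = "UTC"` default and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: count rows with a timestamp in
# [start 00:00, end_day 00:00 + 1 day), so all of end_day is included.
count_window <- function(df, start, end_day,
                         time_col = "tweet.time.posix", tz = "UTC") {
  lower <- as.POSIXct(start, tz = tz)
  upper <- as.POSIXct(end_day, tz = tz) + 24 * 60 * 60  # midnight after end_day
  t <- df[[time_col]]
  sum(t >= lower & t < upper, na.rm = TRUE)
}

# Toy data (not the real dataset): one tweet every 12 hours
toy <- data.frame(
  tweet.time.posix = seq(as.POSIXct("2016-07-23 00:00:00", tz = "UTC"),
                         as.POSIXct("2016-08-09 12:00:00", tz = "UTC"),
                         by = "12 hours")
)

n_window <- count_window(toy, "2016-07-24", "2016-08-08")
```

On the real data the call would look like `count_window(slur.edited, "2016-07-24", "2016-08-08")`. The resulting counts may differ slightly from the ones reported in this note, depending on how many tweets fall on the last day and on the time zone of the timestamps.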

The Peak in May

# Rows in the Hate subset, then in the full dataset, for the same date window
nrow(slur.edited[slur.edited$tweet.time.posix >= "2016-04-25" & slur.edited$tweet.time.posix <= "2016-05-01", ])
## [1] 487
nrow(totaldata[totaldata$tweet.time.posix >= "2016-04-25" & totaldata$tweet.time.posix <= "2016-05-01", ])
## [1] 232073

The total volume of tweets in the week ending 1st of May is 232073. 487 of them matched the Antisemitic hate pattern.

Running 232K tweets through the classifier should take around 15 hours.

The Peak in July

# Rows in the Hate subset, then in the full dataset, for the same date window
nrow(slur.edited[slur.edited$tweet.time.posix >= "2016-06-27" & slur.edited$tweet.time.posix <= "2016-07-03", ])
## [1] 582
nrow(totaldata[totaldata$tweet.time.posix >= "2016-06-27" & totaldata$tweet.time.posix <= "2016-07-03", ])
## [1] 98443

This is the highest peak in the Hate subset. The total volume of tweets in the week ending 3rd of July is 98443. 582 of them matched the Antisemitic hate pattern. Running 98K tweets through the classifier should take around 5.5 hours.

I guess we should also consider the computing time needed to derive retweet metrics, but I’m not sure how long that takes.
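To put the three candidates side by side, here is a small sketch that normalises the Hate counts by total volume. The counts are copied from the outputs in this note and the hours are the rough classifier estimates quoted in the text. The data frame and its column names are only illustrative.

```r
# Counts copied from the outputs in this note; est_hours are the rough
# classifier run-time estimates from the text, not measurements.
candidates <- data.frame(
  window    = c("24 Jul - 8 Aug", "25 Apr - 1 May", "27 Jun - 3 Jul"),
  hate      = c(976, 487, 582),
  total     = c(103838, 232073, 98443),
  est_hours = c(6, 15, 5.5)
)

# Hate tweets per 1,000 tweets in each window
candidates$hate_per_1000 <- round(1000 * candidates$hate / candidates$total, 2)

# Classifier throughput implied by each run-time estimate
candidates$tweets_per_hour <- round(candidates$total / candidates$est_hours)

# Densest window first
ranked <- candidates[order(-candidates$hate_per_1000), ]
print(ranked)
```

By this measure the late July-early August window comes out densest, at about 9.4 Hate tweets per 1,000, against about 5.9 for the July peak and about 2.1 for the May peak.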
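One way to get a rough figure would be to time the retweet-metric step on a small random sample and scale up. The sketch below assumes run time grows roughly linearly with the number of tweets, which may not hold. `derive_retweet_metrics` is a hypothetical placeholder for the real code, and `estimate_runtime_hours` and the toy data are made up for illustration.

```r
# Hypothetical placeholder standing in for the real retweet-metric code
derive_retweet_metrics <- function(df) {
  aggregate(retweets ~ user, data = df, FUN = sum)
}

# Time `fun` on a random sample of rows and scale linearly to `n_target` rows.
# Linear scaling is an assumption; treat the result as an order of magnitude.
estimate_runtime_hours <- function(df, fun, n_target, sample_size = 1000) {
  sample_size <- min(sample_size, nrow(df))
  idx <- sample(nrow(df), sample_size)
  elapsed <- system.time(fun(df[idx, , drop = FALSE]))[["elapsed"]]
  elapsed / sample_size * n_target / 3600
}

# Toy data (not the real dataset)
set.seed(1)
toy <- data.frame(user = sample(letters, 5000, replace = TRUE),
                  retweets = rpois(5000, 2))

est_hours <- estimate_runtime_hours(toy, derive_retweet_metrics,
                                    n_target = 103838)
```

Repeating this with a few different sample sizes would show whether the linear assumption is reasonable before committing to a full run.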

What do you think?

The End