Twitter clustering

Data

The data for this experiment comprise 10K Russian-language tweets on two topics, racism and LGBT.

Posts per user

For neither topic does the number of posts per user account indicate automated trolling.

par(mfrow=c(1,2))
hist(table(docvars(corp_race, 'Twitter.Author.ID')), main = 'Racism', xlab = 'Posts per user')
hist(table(docvars(corp_lgbt, 'Twitter.Author.ID')), main = 'LGBT', xlab = 'Posts per user')

par(mfrow=c(1,1))

Hierarchical clustering

I combined posts by the same user into pseudo-documents, removed Twitter tags (# and @) and URLs, and calculated pairwise document similarity as the input to hierarchical clustering.
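
As a rough illustration of these steps, the sketch below uses base R on toy data. It is not the code behind the trees in this report. The `tweets` data frame, the decision to drop whole hashtag and mention tokens (rather than only the # and @ symbols), cosine similarity, and average linkage are all assumptions, because the description above does not specify them.

```r
# Toy data standing in for the real tweets (hypothetical authors and texts)
tweets <- data.frame(
  author = c("u1", "u1", "u2", "u3", "u3"),
  text = c("Great march today #pride https://t.co/abc",
           "@friend see you at the march",
           "march photos from today #pride",
           "football match tonight",
           "@fan the match was great http://t.co/xyz"),
  stringsAsFactors = FALSE
)

# 1. Combine posts by the same user into one pseudo-document per user
docs <- vapply(split(tweets$text, tweets$author),
               paste, character(1), collapse = " ")

# 2. Remove URLs, then hashtags and mentions (assumed: whole tokens are dropped)
clean <- gsub("https?://\\S+", " ", docs, perl = TRUE)
clean <- gsub("[#@]\\S+", " ", clean, perl = TRUE)
clean <- tolower(clean)
names(clean) <- names(docs)

# 3. Split into word tokens and drop empty strings
tokens <- strsplit(clean, "[^[:alpha:]]+")
tokens <- lapply(tokens, function(x) x[nzchar(x)])

# 4. Build a document-term matrix (rows = users, columns = words)
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens,
                function(x) as.numeric(table(factor(x, levels = vocab))),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# 5. Pairwise cosine similarity between users (assumed similarity measure)
norms <- sqrt(rowSums(dtm^2))
sim <- (dtm %*% t(dtm)) / (norms %o% norms)

# 6. Hierarchical clustering on the distance 1 - similarity
#    (average linkage is an assumption)
tree <- hclust(as.dist(1 - sim), method = "average")
```

Calling `plot(tree, labels = FALSE)` on the result draws a dendrogram in the same way as the plots in this section. In the toy data, users `u1` and `u2` share the most vocabulary, so they are merged first.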

Racism

plot(tree_race, labels = FALSE)

LGBT

plot(tree_lgbt, labels = FALSE)