Twitter Analysis in R

Getting Started

It is fairly easy to create Twitter analysis in R. You need a few packages to kick things off.

require(twitteR)
require(wordcloud)
require(igraph)

If you don't already have these then install them and their dependencies, before entering the above.
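One way to handle the installation step is sketched below. It is only an illustration: the names needed, have and to.install are my own, and the install is guarded by interactive() so that nothing is downloaded when the code is run from a script.

```r
# Packages used in this tutorial
needed <- c("twitteR", "wordcloud", "igraph")

# requireNamespace() returns FALSE (rather than stopping with an error)
# when a package is not installed
have <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
to.install <- needed[!have]

# Only install when working interactively; install.packages() also
# fetches the packages that these ones depend on
if (length(to.install) > 0 && interactive()) install.packages(to.install)
```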

Get the Tweets

This is achieved using the twitteR package, which offers a number of functions to access the Twitter API. Here, the searchTwitter function is used to search for 250 tweets containing the pattern '#occupy'. The value returned is a list of tweet 'status' objects (250 of them here). These objects are the new(ish) R reference objects - similar to Python mutable objects.

tweets <- searchTwitter("#occupy", n = 250)

Extract Information for Analysis

The next step is to carry out some kind of analysis on the tweets. As a first example, an analysis of the tweeters is presented. The tweet 'status' objects, as mentioned earlier, are R reference objects, and contain a number of items of information about a tweet. Perhaps most importantly, they contain the text of the tweet - but they also contain the 'screen name' of the tweeter. When viewing tweets on a browser, these are the names to which each tweet is attributed. Analysing these is also important - in the above example, an analysis of the authors of tweets relating to the Occupy movement should identify some key actors in this movement (or at least some that are active at the time of the tweet search).

The following code loops through the list of 'status' objects and extracts the 'screen names' of the tweeters. Then, via the table function, a frequency table of the screen names is constructed. This is stored in the tweeter.freq array - the names of items in this array are the screen names, and the values are the number of times that each of these screen names appears in the tweets.

# Create an empty character vector of the same length as the number of
# tweets
l <- length(tweets)
tweeter <- vector(mode = "character", length = l)

# Extract the screen names from each tweet status
for (i in 1:l) tweeter[i] <- tweets[[i]]$getScreenName()


# Compile the frequencies of each screen name
tweeter.freq <- table(tweeter)
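The same extraction can be written without an explicit index. The sketch below is self-contained: because real 'status' objects need a live Twitter connection, it uses a hypothetical stand-in, make.status, that mimics only the getScreenName() method, and the screen names are invented.

```r
# Hypothetical stand-in for a tweet 'status' object: a list holding one method
make.status <- function(name) {
    force(name)
    list(getScreenName = function() name)
}

# Invented data, in place of the result of searchTwitter()
toy.tweets <- lapply(c("alice", "bob", "alice", "carol", "alice", "bob"),
    make.status)

# Extract each screen name; vapply checks that every result is a single string
toy.tweeter <- vapply(toy.tweets, function(s) s$getScreenName(), character(1))

# Frequency table: names are the screen names, values are the tweet counts
toy.freq <- table(toy.tweeter)
print(sort(toy.freq, decreasing = TRUE))
```

Unlike the 1:l loop, this form also behaves sensibly when the list of tweets is empty.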

Create a Word Cloud - Who are the Tweeters?

The last section created a frequency table of the screen names of people sending tweets. In this section that information will be visualised. One way to do this is to generate a 'Word Cloud' such as those featured in Wordle - and the wordcloud package in R is a tool for this. Word clouds illustrate the words appearing in a list, and their relative frequency. The relative frequency is reflected in the font size of words appearing in the cloud. The following code produces a word cloud from the Twitter screen name data. Pretty much all of the work is done by the wordcloud function – the only other command sets a black background - this isn't necessary, but does create a fairly striking image. Note that you may get some warnings that some of the texts could not be fitted into the word cloud. These can generally be remedied by re-sizing the plot window.

# Black backgrounds are 'this year's colour' in terms of dataviz!
par(bg = "black")

# Draw the word cloud
wordcloud(names(tweeter.freq), tweeter.freq, colors = c("tomato", 
    "wheat", "lightblue"), scale = c(6, 0.5), random.color = TRUE, rot.per = 0.5, 
    min.freq = 1, font = 2, family = "serif")

[Plot: word cloud of tweeter screen names]

Using the help facility in R will explain all of the parameters to wordcloud but for now the parameters used in the example are set out below:

Parameters 1 and 2 : These are the words to appear in the wordcloud and the frequencies of the words respectively.
colors : These are the colours to draw the text.
scale : A pair of numbers, representing the largest and smallest text size in the word cloud, in that order. These aren't in points, but are relative to 'standard' font size - so in this case sizes range from six times the standard height down to half the standard height.
random.color : Set to TRUE if the colours specified above are assigned randomly to the text strings. The alternative (FALSE) is to assign according to text size.
rot.per : This is the proportion of words that are rotated by $ 90^\circ $ - here set to one half.
min.freq : The smallest frequency of a word that appears in the cloud. Here it is set to 1, so all words are included.
font : Here set to the R numeric value of 2 - giving a bold font - experiment has shown this to be effective.
family : Linked to the above – use a family of serif fonts.
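To get a feel for what scale does, the sketch below maps a few frequencies to text sizes. It assumes that sizes are interpolated linearly between the two scale values, in proportion to each frequency divided by the largest frequency. That is my understanding of how wordcloud behaves, but treat it as an assumption; cloud.size is a hypothetical helper and not part of the package, and the 'someone' entry is invented.

```r
# Hypothetical helper: text size as a linear function of relative frequency
cloud.size <- function(freq, scale = c(6, 0.5)) {
    normed <- freq / max(freq)  # 1 for the most frequent word
    scale[2] + (scale[1] - scale[2]) * normed
}

# Illustrative counts only
toy.freq <- c(HQOccupy = 41, leofroth = 10, someone = 1)
print(round(cloud.size(toy.freq), 2))
```

Under this assumption the most frequent word is always drawn at the first scale value, and a word appearing once is drawn only slightly larger than the second.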

Looking at the word cloud, the dominant tweeter (by quite a large margin) is HQOccupy - the next is leofroth. To dig more deeply, the next set of code extracts the tweet texts (in the same way that the screen names were extracted above) - and then selectively prints tweets attributed to those two screen names.

# Create an empty vector to store the tweet texts
texts <- vector(mode = "character", length = l)

# Extract the tweet text from each tweet status
for (i in 1:l) texts[i] <- tweets[[i]]$getText()

# List all the tweets tweeted by @leofroth
texts[tweeter == "leofroth"]

##  [1] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/1xqiA3MO #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Election #Election2012 #Occupyeverywhere"
##  [2] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/vOjZpRtQ #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Election #Election2012 #Occupyeverywhere"
##  [3] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/uzRbbko8 #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Troops #Military #Occupyeverywhere"      
##  [4] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/QinK9Hxg #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Troops #Military #Occupyeverywhere"      
##  [5] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/qVE206rc #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Election2012 #Election #Troops #Military"
##  [6] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/1xqiA3MO #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Election #Election2012 #Occupyeverywhere"
##  [7] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/vOjZpRtQ #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Election #Election2012 #Occupyeverywhere"
##  [8] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/uzRbbko8 #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Troops #Military #Occupyeverywhere"      
##  [9] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/QinK9Hxg #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Troops #Military #Occupyeverywhere"      
## [10] "RT @OCCUPYPORTSTLUC: Leo Roth - http://t.co/qVE206rc #iTunes #Occupywallstreet #OccupyVeterans #Occupy #ows #Veterans #Election2012 #Election #Troops #Military"

# List all the tweets tweeted by @HQOccupy
texts[tweeter == "HQOccupy"]

##  [1] "Hanging dummies shock Las Vegas motorists - http://t.co/UPAwXC9D http://t.co/6KnPxi97 #OccupyHQ #Occupy #OWS #OpESR"                         
##  [2] "Suspect in Occupy Oakland shooting ordered to stand trial http://t.co/dIdybOcV #OccupyHQ #Occupy #OWS #OpESR"                                
##  [3] "#OccupyHQ #OpMedia  Let’s Get It On! http://t.co/OFd9DGxc #Headlines #Occupy #OWS #OpESR"                                                    
##  [4] "#OccupyHQ #OpMedia  Ryan Grim, Mitt Romney’s response to your investigation and to... http://t.co/lVu5wNpu #Headlines #Occupy #OWS #OpESR"   
##  [5] "#OccupyHQ #OpMedia  No Country For Poor Man: CFTC Calls Unreturned, Calls Unanswered http://t.co/mLCtA42h  #Occupy #OWS #OpESR"              
##  [6] "#OccupyHQ #OpMedia  Hand-to-Mouth: Rise in UK working poor as spending cuts bite http://t.co/Af6EReRV #Headlines #Occupy #OWS #OpESR"        
##  [7] "#OccupyHQ #OpMedia  Illuminating Donald Trump: Trump vs. the 99% http://t.co/oWGuNN0a  #Occupy #OWS #OpESR"                                  
##  [8] "#OccupyHQ #OpMedia  Call for Solidarity: Please Support Occupy Media and Direct Action! http://t.co/J76WvGEH  #Occupy #OWS #OpESR"           
##  [9] "Call for Solidarity: Please Support Occupy Media and Direct Action! http://t.co/DoOCyPRE #OccupyHQ #Occupy #OWS #OpESR"                      
## [10] "Jennifer Britt, Detroit Homeowner, Staves Off Eviction With Help of Occupy Detroit http://t.co/cidCdpFA #OccupyHQ #Occupy #OWS #OpESR"       
## [11] "Man charged in connection to killing at Occupy Oakland encampment must face … http://t.co/5yQVCiAU #OccupyHQ #Occupy #OWS #OpESR"            
## [12] "Ron Paul’s Legacy: A Complete Audit Of The Secretive Banking Cartel? http://t.co/ELNx4gme #OccupyHQ #Occupy #OWS #OpESR"                     
## [13] "Occupied: Movement moves in to Pueblo - Pueblo Chieftain http://t.co/y9q6MRoX #OccupyHQ #Occupy #OWS #OpESR"                                 
## [14] "Occupy LA Chalk Protest Coming Back To Downtown LA: One Arrest Already ... - Huffington Post http://t.co/ctKP9faP #OccupyHQ #Occupy #OWS ..."
## [15] "Man U IPO: More Max Bialystock than Manchester United http://t.co/IIvJPEfu #OccupyHQ #Occupy #OWS #OpESR"                                    
## [16] "Momentum Builds Against Fracking http://t.co/NDUSbo3o #OccupyHQ #Occupy #OWS #OpESR"                                                         
## [17] "Split Normality – Greece 2.0 http://t.co/XHeh9nG0 #OccupyHQ #Occupy #OWS #OpESR"                                                             
## [18] "#iam132 Manifesto http://t.co/gkzcZCbm #OccupyHQ #Occupy #OWS #OpESR"                                                                        
## [19] "The Extreme Center http://t.co/FBXdMQFY #OccupyHQ #Occupy #OWS #OpESR"                                                                       
## [20] "West Bank: 5 Broken Cameras http://t.co/HcZb2LDY #OccupyHQ #Occupy #OWS #OpESR"                                                              
## [21] "Quebec: Red Square Revolt http://t.co/m7EiR9J4 #OccupyHQ #Occupy #OWS #OpESR"                                                                
## [22] "Occupy Wall Street: ONE YEAR http://t.co/T63XKbXS #OccupyHQ #Occupy #OWS #OpESR"                                                             
## [23] "SCOTLAND! http://t.co/tM5diN7V #OccupyHQ #Occupy #OWS #OpESR"                                                                                
## [24] "Oh because CONSPIRING to MANIPULATE global interest rates WASN’T a crime? http://t.co/QASQvkFv #OccupyHQ #Occupy #OWS #OpESR"                
## [25] "When is a Victory a Victory? Houston janitors reach tentative agreement http://t.co/1p9ySkKG #OccupyHQ #Occupy #OWS #OpESR"                  
## [26] "Occupy Chicago resists NATO http://t.co/S8q35CJQ #OccupyHQ #Occupy #OWS #OpESR"                                                              
## [27] "Wisconsin First State General Assembly http://t.co/8BFoWfaM #OccupyHQ #Occupy #OWS #OpESR"                                                   
## [28] "GRAPPLES or a Real Voting System? http://t.co/378rp7pB #OccupyHQ #Occupy #OWS #OpESR"                                                        
## [29] "Occupy The Records http://t.co/1lkxEnnF #OccupyHQ #Occupy #OWS #OpESR"                                                                       
## [30] "Occupy Caravan http://t.co/MDot0nRg #OccupyHQ #Occupy #OWS #OpESR"                                                                           
## [31] "Norris Brothers Band http://t.co/LAILsF3A #OccupyHQ #Occupy #OWS #OpESR"                                                                     
## [32] "Solidarity with Oakland http://t.co/ABYAYdYS #OccupyHQ #Occupy #OWS #OpESR"                                                                  
## [33] "SOPA Black Out! http://t.co/xzCo8ds7 #OccupyHQ #Occupy #OWS #OpESR"                                                                          
## [34] "#OCCUPYMAINSTREET http://t.co/Rt8JZdrD #OccupyHQ #Occupy #OWS #OpESR"                                                                        
## [35] "No Country For Poor Man: CFTC Calls Unreturned, Calls Unanswered http://t.co/RnALqIzm #OccupyHQ #Occupy #OWS #OpESR"                         
## [36] "Let’s Get It On! http://t.co/oqyKv4Ir #OccupyHQ #Occupy #OWS #OpESR"                                                                         
## [37] "Hand-to-Mouth: Rise in UK working poor as spending cuts bite http://t.co/949V3drp #OccupyHQ #Occupy #OWS #OpESR"                             
## [38] "Ryan Grim, Mitt Romney’s response to your investigation and to these allegations? http://t.co/poyH2kpY #OccupyHQ #Occupy #OWS #OpESR"        
## [39] "Occupy Wall Street: ONE YEAR http://t.co/T63XKbXS #OccupyHQ #Occupy #OWS #OpESR"                                                             
## [40] "SCOTLAND! http://t.co/tM5diN7V #OccupyHQ #Occupy #OWS #OpESR"                                                                                
## [41] "Oh because CONSPIRING to MANIPULATE global interest rates WASN’T a crime? http://t.co/QASQvkFv #OccupyHQ #Occupy #OWS #OpESR"

From this it may be seen that the leofroth tweets appear to be automated, and each one consists of a number of hashtags and a URL - without any other textual content. Closer inspection shows that these URLs are actually links to iTunes store pages, and that Leo Roth is styled as a 'Christian rock' singer. Presumably these tweets are a marketing exercise - and the choice of hashtags gives some indication of the target audience.

The largest contributor is HQOccupy. Although this account contributes the largest number of tweets, the content of these tweets is more varied and does actually contain text other than hashtags and URLs. This account appears to draw attention to press stories (and other web resources) that would likely be of interest to participants in and followers of the Occupy movement.
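One quick check for automated posting is to count how many of an account's tweets are exact repeats. The sketch below uses invented texts and screen names in place of the texts and tweeter vectors, so the numbers mean nothing in themselves.

```r
# Invented stand-ins for the tweeter and texts vectors
toy.tweeter <- c("bot", "bot", "bot", "bot", "human", "human")
toy.texts <- c("Buy my song #occupy", "Buy my song #occupy",
    "Buy my album #occupy", "Buy my song #occupy",
    "Report from the march #occupy", "Photos from today #occupy")

# Number of tweets per screen name
n.tweets <- tapply(toy.texts, toy.tweeter, length)

# Number of distinct tweet texts per screen name
n.distinct <- tapply(toy.texts, toy.tweeter, function(x) length(unique(x)))

# Share of each account's tweets that repeat an earlier one
print(1 - n.distinct / n.tweets)
```

In the listing above, items 6 to 10 of the leofroth tweets repeat items 1 to 5, which is the kind of pattern this check would flag.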

Reproducibility

Reproducing my exact results when working through this code will not be possible. By the time the reader enters the code incorporated here, the tweets obtained via the Twitter API will be more recent than those in the example - and may not, for example, contain anything from leofroth. However, for anyone interested in reproducing my exact results, I have saved the variables tweets, texts and tweeter that I obtained in an R data file and placed this in the public folder of Dropbox. By entering the following R commands, it is possible to obtain my exact results.

# Download my binary file from the public Dropbox folder
# (mode = "wb" keeps the binary file intact on Windows)
download.file("http://dl.dropbox.com/u/7013997/tweets.Rdata", "tweets.RData", 
    mode = "wb")
# Read it in
load("tweets.RData")
# Re-compute the frequency count of the tweeter
tweeter.freq <- table(tweeter)

Entering these commands and then re-entering the exercises starting from the 'Create a Word Cloud …' section should provide results identical to those on this web site.
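For completeness, a data file of this kind can be produced with save(). The sketch below shows the round trip on invented stand-in vectors and a temporary file; the object names mirror those in the tutorial but the contents are hypothetical.

```r
# Invented stand-ins for two of the saved variables
texts <- c("first tweet #occupy", "second tweet #ows")
tweeter <- c("alice", "bob")

# Write both objects to a single binary R data file
rdata.file <- tempfile(fileext = ".RData")
save(texts, tweeter, file = rdata.file)

# Remove them, then restore them from the file;
# load() returns the names of the objects it restored
rm(texts, tweeter)
restored <- load(rdata.file)
print(restored)

# The restored vectors can be used exactly as before
tweeter.freq <- table(tweeter)
```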

Not Everyone Likes Word Clouds

Drew Conway is a well known commentator on data analysis and visualisation, and has extensive experience in producing R code to tackle a wide set of problems. He is not particularly impressed with word clouds as a visualisation tool: in particular describing them as 'Spatial visualization wherein space is meaningless'. However, I would argue that although the spatial location of a word in a word cloud conveys no meaning, they are still an effective tool. My argument is that although there is really only one dimension of information (the frequency with which each word occurs) this visualisation allows that information to be displayed in a compact form - occupying a two-dimensional region with an aspect ratio close to one.

For example, a more conventional representation of the screen name counts might be via a bar plot:

par(mar = c(4, 8, 2, 4) + 0.1)
barplot(sort(tweeter.freq), horiz = TRUE, las = 2, col = "tomato", 
    xlab = "Tweet Tally")
title("Number of Tweets by Screen Name")

[Plot: horizontal bar plot of the number of tweets by screen name]

Although this arguably conveys more precise information (in particular the exact order and values of the counts) it is relatively difficult to read, as it is over three times as tall as it is wide. One has to scroll down to see all of the information (or if the bar plot were transposed one would have to scroll across). Thus, although the location of the words in a word cloud doesn't convey any specific information, it is perhaps the use of two-dimensional space to pack the information efficiently that makes them effective.

I should also go on to point out that Drew's argument is particularly strong when considering the comparison of word content for two documents - and for this, his suggestions to improve word cloud representations are particularly impressive. These are set out in the article mentioned earlier.

Creating a Function

The code below combines all of the previous commands into a single function, tweetcloud, which carries out the tweet search and draws the wordcloud. This can then be used to create screen name wordclouds for tweeters of an arbitrary hashtag:

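tweetcloud <- function(hashtag, ntweets = 250, scale = c(6, 0.5), 
    min.freq = 1) {
    # Search for tweets containing the hashtag
    tweets <- searchTwitter(hashtag, n = ntweets)
    l <- length(tweets)
    # Stop early if the search returned nothing
    if (l == 0) stop("No tweets found for ", hashtag)
    # Extract the screen name from each tweet status
    tweeter <- vector(mode = "character", length = l)
    for (i in seq_len(l)) tweeter[i] <- tweets[[i]]$getScreenName()
    # Compile the frequencies of each screen name
    tweeter.freq <- table(tweeter)
    # Draw the word cloud on a black background
    par(bg = "black")
    wordcloud(names(tweeter.freq), tweeter.freq, colors = c("tomato", "wheat", 
        "lightblue"), scale = scale, random.color = TRUE, rot.per = 0.5, min.freq = min.freq, 
        font = 2, family = "serif")
}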

The above defines the function - here is an example based on tweeters using the hashtag #London2012:

tweetcloud("#London2012")

[Plot: word cloud of screen names tweeting #London2012]

As before, there is a need to declare a reproducibility caveat - as I write this, it is around a week after the Olympics in London finished - and depending on when this tutorial is read, the results obtained may differ. It may be more interesting to investigate a more current hashtag (#paralympics, for example, if this is being read between 29th August – 9th September 2012).

Looking at Relationships Between Hashtags

Another way of analysing tweets is to identify the linkages between different hashtags that are included in the text. This requires a more sophisticated approach than above, which is essentially just a frequency count of screen names appearing. To identify linkages between hashtags, one possibility is to make use of graph theory. In particular this is useful for analysing which hashtags tend to appear together in the same tweet. This information can be represented as a graph - each hashtag appearing in the collection of tweets is represented by a graph vertex - and pairs of hashtags appearing in the same tweet are represented by a graph edge.
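The representation just described can be sketched on invented data as a plain edge list, with one row per pair of hashtags that share a tweet. The names toy.taglist, pairs.for, pair.list and edge.list are hypothetical, and this is only an illustration of the idea.

```r
# Invented hashtag sets, one per tweet
toy.taglist <- list(c("#OCCUPY", "#OWS", "#OO"),
    c("#OWS", "#OCCUPY"),
    "#OCCUPY",
    c("#OCCUPY", "#TLOT", "#OCCUPY"))

# All unordered pairs of distinct hashtags within one tweet, one pair per row
pairs.for <- function(tagset) {
    tagset <- sort(unique(tagset))
    if (length(tagset) < 2) return(NULL)  # no pair, so no edge
    t(combn(tagset, 2))
}

# Stack the pairs from every tweet and keep each edge once
pair.list <- lapply(toy.taglist, pairs.for)
edge.list <- unique(do.call(rbind, pair.list))
colnames(edge.list) <- c("from", "to")
print(edge.list)
```

Each distinct hashtag is a vertex and each row of edge.list is an edge; a tweet with a single hashtag contributes a vertex but no edges.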

The igraph package provides a set of tools for handling, analysing and visualising graph data. The first task here is to create a hashtag graph as described above. In the code below a function tags is defined. Given the text string of a tweet, this returns all of the hashtags in this string as a character vector. It also converts these to upper case - this is to simplify the subsequent analysis, by assuming that, for example '#OCCUPY', '#Occupy' and '#occupy' are essentially the same hashtag. This function is then applied to each tweet text, and the results are stored in taglist - a list of character vectors, of the same length as the number of tweets - so that taglist[[i]] contains all of the hashtags extracted from texts[i].

# define a tag extractor function
tags <- function(x) toupper(grep("^#", strsplit(x, " +")[[1]], value = TRUE))
# Create a list of the tag sets for each tweet
taglist <- vector(mode = "list", l)
# ... and populate it
for (i in 1:l) taglist[[i]] <- tags(texts[i])
# Now make a list of all unique hashtags
alltags <- NULL
for (i in 1:l) alltags <- union(alltags, taglist[[i]])

Once all of this code is run, the hashtag data and information about which hashtags appear in the same tweet will be represented. The next stage is to store this in an igraph graph object. Without going into too much detail, the code below does this. If you do want to find out more about how this works, explore the earlier igraph link.

# Create an empty graph
hash.graph <- graph.empty(directed = FALSE)
# Populate it with nodes
hash.graph <- hash.graph + vertices(alltags)
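# Populate it with edges (the loop variable is called tagset so that it
# does not overwrite the tags function defined earlier)
for (tagset in taglist) {
    if (length(tagset) > 1) {
        for (pair in combn(length(tagset), 2, simplify = FALSE, FUN = function(x) sort(tagset[x]))) {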
            if (pair[1] != pair[2]) {
                if (hash.graph[pair[1], pair[2]] == 0) 
                  hash.graph <- hash.graph + edge(pair[1], pair[2])
            }
        }
    }
}

The above code stores the hashtags and their co-appearances in a graph object called hash.graph. However, it is more informative to visualise this than simply store it internally. The next block of code does this. Graph objects in igraph have a number of properties that inform R of the way that they should be visualised. These can refer to the graph as a whole, or to vertices or edges. The first line below states that the layout algorithm used to determine where the graph vertices should be drawn is the Kamada-Kawai algorithm - this is essentially a force-directed algorithm, intended to present the vertices and edges in a way that positions groups of vertices with many interconnections close to one another. igraph offers several alternative algorithms - it may be interesting to explore and experiment.

The rest of the code supplies graphical parameters for the graph object – when the left hand side of the <- is V(hash.graph) the parameters refer to vertices, and when it is E(hash.graph) they refer to edges. If the value assigned is a scalar (or more accurately in R a vector of length 1) then it applies to all edges or vertices. If it is a vector of values, these are assigned to each individual vertex or edge - for example V(hash.graph)$label <- V(hash.graph)$name extracts all of the names used to refer to the vertices by igraph and assigns them to the labels used for the vertices when the graph is plotted. When these various parameters are set (again more details are in the igraph documentation links) a command is issued to plot the graph.

To make the most of this, you will need to make the R plotting window quite large - there are quite a few hashtags and they are interconnected by a complicated set of edges. Note also that you may get a few warning messages - on my laptop, which is set to use a UK keyboard, there are some problems related to Chinese characters - apologies for this - I will update this page if I find out how to fix this. An alternative way to view this might be to save it as a PDF or some other vector-format graphics file. Although the graph is complex, I find that viewing it on an iPad and zooming in to see detail is quite effective. Note that due to the complexity of the graph, I wasn't able to include it directly on this web page - however here is a pdf…

hash.graph$layout <- layout.kamada.kawai
V(hash.graph)$color <- "tomato"
E(hash.graph)$color <- "black"
V(hash.graph)$label <- V(hash.graph)$name
V(hash.graph)$label.cex <- 0.5
V(hash.graph)$size <- 20
V(hash.graph)$size2 <- 2
V(hash.graph)$shape <- "rectangle"
plot(hash.graph)
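Saving to a PDF can be sketched with base R's pdf() device. The example below is self-contained, so it draws a placeholder scatter of invented points rather than the hashtag graph; in the tutorial's session the plot() call would be plot(hash.graph). The file name and page size are arbitrary choices of mine.

```r
# Write to a temporary folder; any writable path would do
out.file <- file.path(tempdir(), "hashgraph.pdf")

# Open a PDF device with a large page (sizes are in inches)
pdf(out.file, width = 20, height = 20)

# Placeholder drawing; replace with plot(hash.graph) in the tutorial session
plot(1:10, (1:10)^2, main = "Placeholder for plot(hash.graph)")

# Close the device so that the file is written to disk
dev.off()
print(file.exists(out.file))
```

Because PDF is a vector format, the result can be zoomed without the labels becoming blurred.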

Some Further Analysis of Hashtags

The above graph is fairly large and complex - and possibly a more compact, synoptic view might be helpful in giving an at-a-glance view of the key hashtags. Turning again to graph theory, and the graph-based representation of hashtags coincident in the tweets, measures of centrality may be useful here. Hashtags that are more 'central' in the above graph might be thought of as representing key themes in the collection of tweet texts, since they are mentioned in the context of a large number of other themes.

There are a number of ways of measuring centrality of graph vertices. One possibility is the betweenness index [1]. If edges on a graph are thought of as paths of length one between pairs of vertices, this index for a vertex $ v $ is defined by

\[ g(v) = \sum_{i \ne v \ne j} \frac{\sigma_{ivj}}{\sigma_{ij}} \]

where $ g(v) $ is the betweenness index, $ \sigma_{ij} $ is the number of shortest paths linking vertices $ i $ and $ j $ and $ \sigma_{ivj} $ is the number of shortest paths linking vertices $ i $ and $ j $ that pass through $ v $. Note that, due to the discrete nature of path length described above, there will be frequent ties for the shortest path connecting a pair of vertices, so quite often $ \sigma_{ij} $ and $ \sigma_{ivj} $ may be greater than one.

The idea here is that the length of the shortest path between a pair of vertices represents a form of separation. For hashtags it is a kind of conceptual separation - a hashtag is thought to be associated with another one if they are included in the same tweet. A second order association occurs if they are not cited in the same tweet, but there is a hashtag that is shared in a tweet with one of them, and shared in a different tweet with the other. This corresponds to the shortest path between the hashtags being of length two. Similarly, higher order linkages can be found. $ g(v) $ measures the tendency of any particular vertex to appear in shortest paths - or, put another way, its centrality to the conceptual linkage of hashtags.
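As a worked illustration of the definition, the sketch below computes the index by brute force for a five-vertex toy graph: a square A-B-C-D with an extra vertex E attached to A. Everything here (the graph, bfs.counts and the variable names) is hypothetical, and the approach is a direct reading of the formula rather than a description of how any package computes it. Each unordered pair of vertices is counted once.

```r
# Toy undirected graph: square A-B-C-D plus a pendant vertex E joined to A
vertex.names <- c("A", "B", "C", "D", "E")
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("D", "A"), c("A", "E"))
n <- length(vertex.names)
adj <- matrix(0, n, n, dimnames = list(vertex.names, vertex.names))
adj[edges] <- 1          # index by (row name, column name) pairs
adj[edges[, 2:1]] <- 1   # and the reverse, as the graph is undirected

# Breadth-first search from s: shortest path lengths and the number of
# distinct shortest paths (sigma) from s to every vertex
bfs.counts <- function(adj, s) {
    n <- nrow(adj)
    dist <- rep(Inf, n)
    sigma <- rep(0, n)
    dist[s] <- 0
    sigma[s] <- 1
    frontier <- s
    while (length(frontier) > 0) {
        nxt <- integer(0)
        for (u in frontier) {
            for (w in which(adj[u, ] > 0)) {
                if (is.infinite(dist[w])) {  # first time w is reached
                    dist[w] <- dist[u] + 1
                    nxt <- c(nxt, w)
                }
                # u lies on a shortest path to w: pass on its path count
                if (dist[w] == dist[u] + 1) sigma[w] <- sigma[w] + sigma[u]
            }
        }
        frontier <- nxt
    }
    list(dist = dist, sigma = sigma)
}

res <- lapply(seq_len(n), function(s) bfs.counts(adj, s))
D <- t(sapply(res, `[[`, "dist"))   # D[i, j]: shortest path length
S <- t(sapply(res, `[[`, "sigma"))  # S[i, j]: number of shortest paths

# g(v): sum over pairs i < j (both different from v) of sigma_ivj / sigma_ij.
# v is on a shortest i-j path exactly when D[i, v] + D[v, j] == D[i, j],
# and then sigma_ivj = S[i, v] * S[v, j].
g <- setNames(numeric(n), vertex.names)
for (v in seq_len(n)) {
    for (i in seq_len(n)) for (j in seq_len(n)) {
        if (i < j && i != v && j != v && is.finite(D[i, j]) &&
            D[i, v] + D[v, j] == D[i, j]) {
            g[v] <- g[v] + S[i, v] * S[v, j] / S[i, j]
        }
    }
}
print(g)
# A = 3.5, B = 1, C = 0.5, D = 1, E = 0
```

The halves come from ties: A and C are joined by two shortest paths, one through B and one through D, so each of B and D receives half a unit from that pair.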

Fortunately, the igraph package provides a function to compute this measure of centrality. In the following code this is computed, and a wordcloud of hashtags is drawn, in which the size of the word in the cloud corresponds to its centrality. In the code, centrality is computed, but some filtering is done. Firstly, the most central vertex is dropped. Since the original data was based on a search through the Twitter API to find tweets containing a specific hashtag (in this case, recall it was '#occupy'), this will always appear, and be central in the way described above. However, since this centrality is a consequence of the search parameters it offers little in the way of revelation or discovery to represent it in a visualisation. Also, there are a number of hashtags that have a betweenness index of zero, since they do not feature in the shortest path between any pair of other vertices. This suggests they are relatively peripheral, so these are dropped too. Finally, experiment showed that a square root transform was useful here, as using 'raw' betweenness indices resulted in a very small number of hashtags dominating the cloud.

# hash.graph is undirected, so directed = FALSE; [-1] drops the top-ranked tag
p.rank <- sort(betweenness(hash.graph, directed = FALSE), decreasing = TRUE)[-1]
p.rank <- sqrt(p.rank[p.rank > 0])
par(bg = "black")
wordcloud(names(p.rank), p.rank, colors = c("tomato", "wheat", "lightblue"), 
    scale = c(7, 0.5), random.color = TRUE, rot.per = 0.5, min.freq = 1, font = 2, 
    family = "serif")

[Plot: word cloud of hashtags sized by betweenness]

The cloud above gives some indication of key hashtags in the collection of tweets. In particular, using a resource such as tagdef it can be seen that with my original data relating to '#occupy' key hashtags relate to the occupy movement itself (eg '#ows' - Occupy Wall Street; '#oo' - Occupy Oakland) and a number of progressive liberal tags ('#connecttheleft', '#ctl') as well as tags relating to right wing groupings ('#tlot' - Top Libertarians on Twitter; '#teaparty' and so on). In addition to these, and possibly relating to a story currently in the news about the imprisonment of the Moscow-based feminist punk group Pussy Riot, other central tags are '#putin', '#freepussyriot', '#pussyriot' and '#occupymoscow'.
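The filtering and square root steps used for the hashtag cloud can be seen in isolation on a few invented scores. The values in toy.rank are made up purely to show the effect on the spread of sizes.

```r
# Invented betweenness scores; "#OCCUPY" plays the role of the search tag
toy.rank <- c("#OCCUPY" = 900, "#OWS" = 400, "#OO" = 100, "#TLOT" = 25,
    "#PUTIN" = 4, "#RARE" = 0)

# Drop the top-ranked tag, then drop tags with a score of zero
ranked <- sort(toy.rank, decreasing = TRUE)[-1]
kept <- ranked[ranked > 0]

# Ratio of largest to smallest score, before and after the transform
print(max(kept) / min(kept))              # 100
print(max(sqrt(kept)) / min(sqrt(kept)))  # 10
```

The square root keeps the ordering of the tags but compresses the range, so the largest tag no longer crowds out the rest.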

As said a number of times, it is worth experimenting with variants on this analysis - for example instead of using betweenness indices to measure centrality, igraph offers several alternatives, such as the PageRank score - this may be a useful way to test the robustness of the above analysis, if different concepts of centrality replace the one discussed.
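To give a flavour of how a PageRank-style score differs, the sketch below runs a simple power iteration in base R on a toy graph (a square A-B-C-D with an extra vertex E attached to A). This is a minimal illustration under my own assumptions, not igraph's implementation: the graph is invented, every vertex has at least one neighbour, and the damping value of 0.85 is simply a commonly used choice.

```r
# Toy undirected graph: square A-B-C-D plus a pendant vertex E joined to A
vertex.names <- c("A", "B", "C", "D", "E")
edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"), c("D", "A"), c("A", "E"))
n <- length(vertex.names)
adj <- matrix(0, n, n, dimnames = list(vertex.names, vertex.names))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Row-normalise: P[i, j] is the chance of stepping from i to its neighbour j
P <- adj / rowSums(adj)

# Power iteration: follow an edge with probability 'damping',
# otherwise jump to a vertex chosen uniformly at random
damping <- 0.85
score <- rep(1 / n, n)
for (k in 1:200) {
    score <- (1 - damping) / n + damping * as.vector(t(P) %*% score)
}
names(score) <- vertex.names
print(round(sort(score, decreasing = TRUE), 3))
```

Here the score rewards vertices that are easy to arrive at by wandering along edges, rather than vertices that sit on shortest paths, so comparing the two rankings on the hashtag graph is one way to check how much the conclusions depend on the choice of measure.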

[1]: Freeman, Linton (1977). “A set of measures of centrality based upon betweenness”. Sociometry 40: 35–41.