The goal of the twinetverse is to provide everything one might need to view Twitter interactions, from data collection to visualisation. This can be a powerful tool for social media analysis, since it helps visualise how users communicate with one another on a given topic or how information spreads throughout the Twitter network.
In this article, we're going to briefly explore the twinetverse by creating a graph that links users to the users they retweet, in order to visualise how information spreads throughout Twitter.
The twinetverse package is available on GitHub:
# install.packages("devtools")
devtools::install_github("JohnCoene/twinetverse") # github
The twinetverse includes three packages:
1. rtweet (Kearney 2018): wraps the Twitter API, giving R users easy access to tweets. It fills the very first step in visualising Twitter interactions.
2. graphTweets (Coene 2019a): extracts nodes and edges from tweets collected with rtweet. It fills the second step, building the graphs from the collected data.
3. sigmajs (Coene 2019b): visualises the networks we have built with graphTweets. It is the last piece of the puzzle.
Within the context of visualising Twitter interactions, each of these packages fills a specific need and a distinct step of the process: 1) collecting the data, 2) building the graphs and 3) visualising the graphs of those interactions.
The packages are pipe-friendly (%>%), which makes it easy to go from building a graph to visualising it.
library(twinetverse)
## -- Attaching twinetverse -------------------------------------------------------------------- twinetverse 0.0.2 --
library(tidyverse)
To fetch tweets, you need a Twitter app. Once you have set one up, take note of your app's credentials under “Keys and Access Tokens”, as you will need them to create your token and fetch tweets:
mytoken <- create_token(
"My Application Name",
consumer_key = "XxxxXxXXxXx",
consumer_secret = "XxxxXxXXxXx",
access_token = "XxxxXxXXxXx",
access_secret = "XxxxXxXXxXx"
)
Ideally, also save the token: there is no need to re-create it every time you want to download data.
saveRDS(mytoken, file = "mytoken.rds")
There are several types of graphs that the twinetverse, through graphTweets, allows us to build. In this article, we focus on the retweets graph, which helps us understand how information spreads throughout the Twitter network.
We'll start by collecting our tweets. As our example, I'm going to use the hashtag #TheyAreUs, which was trending on Twitter at the time of writing following the Christchurch mosque shootings.
# load the saved API token
mytoken <- readRDS("mytoken.rds")
tweets <- search_tweets("#TheyAreUs filter:retweets", n = 1000, include_rts = TRUE, token = mytoken)
## Searching for tweets...
## Finished collecting tweets!
Note:
If you want to skip the API authorization process and prefer to practice on existing Twitter data, you can instead import the tweets from the CSV file in the data_input directory:
# tweets <- read_csv("data_input/tweets.csv")
The search_tweets function takes a few arguments. Above, we fetch 1000 tweets about “#TheyAreUs”. Since we want to focus on retweets, the query uses the filter:retweets operator and we set include_rts = TRUE so that retweets are kept.
Each row is a tweet. rtweet returns quite a lot of variables (88); we'll only look at a select few.
names(tweets)
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "hashtags" "symbols"
## [17] "urls_url" "urls_t.co"
## [19] "urls_expanded_url" "media_url"
## [21] "media_t.co" "media_expanded_url"
## [23] "media_type" "ext_media_url"
## [25] "ext_media_t.co" "ext_media_expanded_url"
## [27] "ext_media_type" "mentions_user_id"
## [29] "mentions_screen_name" "lang"
## [31] "quoted_status_id" "quoted_text"
## [33] "quoted_created_at" "quoted_source"
## [35] "quoted_favorite_count" "quoted_retweet_count"
## [37] "quoted_user_id" "quoted_screen_name"
## [39] "quoted_name" "quoted_followers_count"
## [41] "quoted_friends_count" "quoted_statuses_count"
## [43] "quoted_location" "quoted_description"
## [45] "quoted_verified" "retweet_status_id"
## [47] "retweet_text" "retweet_created_at"
## [49] "retweet_source" "retweet_favorite_count"
## [51] "retweet_retweet_count" "retweet_user_id"
## [53] "retweet_screen_name" "retweet_name"
## [55] "retweet_followers_count" "retweet_friends_count"
## [57] "retweet_statuses_count" "retweet_location"
## [59] "retweet_description" "retweet_verified"
## [61] "place_url" "place_name"
## [63] "place_full_name" "place_type"
## [65] "country" "country_code"
## [67] "geo_coords" "coords_coords"
## [69] "bbox_coords" "status_url"
## [71] "name" "location"
## [73] "description" "url"
## [75] "protected" "followers_count"
## [77] "friends_count" "listed_count"
## [79] "statuses_count" "favourites_count"
## [81] "account_created_at" "verified"
## [83] "profile_url" "profile_expanded_url"
## [85] "account_lang" "profile_banner_url"
## [87] "profile_background_url" "profile_image_url"
A network consists of nodes and edges, which is exactly what graphTweets returns.
In this graph, each node is a user, connected to the users they retweeted. Functions in graphTweets are meant to be run in a specific order, first the edges, then the nodes:
net <- tweets %>%
gt_edges(source = screen_name, target = retweet_screen_name) %>% # get edges
gt_nodes() # get nodes
We called gt_edges on our tweets data frame, passing a few bare column names. The source of the tweets (the user posting the tweets) is also the source of our edges, so we pass source = screen_name. The targets of these edges are the users they retweeted, which the API provides as retweet_screen_name, so we pass target = retweet_screen_name.
The object returned is of an unfamiliar class.
class(net)
## [1] "graphTweets"
To extract the results from graphTweets, run gt_collect; this works at any point in the chain of pipes (%>%).
net <- net %>%
gt_collect()
class(net)
## [1] "list"
We can visualise the network with sigmajs. Again, it's very easy and follows the same idea as graphTweets: we pipe our nodes and edges through. Before we do so, for the sake of clarity, let's unpack our network using the %<-% operator from the zeallot package (Teetor 2018), which is imported by the twinetverse.
c(edges, nodes) %<-% net
Note: you can always unpack the network with edges <- net$edges and nodes <- net$nodes if you are not comfortable with the above.
Let’s take a look at the edges.
head(edges)
## # A tibble: 6 x 3
## source target n
## <chr> <chr> <int>
## 1 __choeeey mmmmaggy 1
## 2 _denchtastic hurricanesrugby 1
## 3 03_brookey nzwarriors 1
## 4 03_brookey seaeagles 2
## 5 1cuteone intactive 1
## 6 1ftam1 fahimaq 1
Edges simply consist of a source and a target. As explained earlier, source corresponds to the screen_name column passed to gt_edges: it is the user who posted the tweet. In contrast, target holds the user whom they retweeted in that tweet. The n variable indicates how many tweets connect the source to the target.
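For intuition, the counting behind the n column can be sketched in base R. This is not how graphTweets is implemented. It is a minimal illustration on a small hypothetical data frame (the handles alice, bob and so on are made up) that counts how many tweets connect each source to each target.

```r
# Hypothetical toy tweets: each row is a retweet, with the retweeter
# (screen_name) and the retweeted author (retweet_screen_name)
tweets <- data.frame(
  screen_name         = c("alice", "alice", "bob", "carol", "alice"),
  retweet_screen_name = c("dan",   "erin",  "dan", "dan",   "dan"),
  stringsAsFactors = FALSE
)

# One row per (source, target) pair; n counts the tweets linking them
edges <- aggregate(
  list(n = rep(1L, nrow(tweets))),
  by  = list(source = tweets$screen_name, target = tweets$retweet_screen_name),
  FUN = sum
)
edges
```

Here alice retweeted dan twice, so that pair collapses into a single edge with n equal to 2.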
Now let’s take a look at the nodes:
head(nodes)
## # A tibble: 6 x 3
## nodes type n
## <chr> <chr> <int>
## 1 __choeeey user 1
## 2 __interfaith__ user 7
## 3 _denchtastic user 1
## 4 _lucywest user 5
## 5 03_brookey user 2
## 6 1cuteone user 1
In the nodes data frame, the nodes column contains the Twitter handles of both the users who retweeted and the users they retweeted. The n column is the number of times the node appears in the edges, whether as source or as target.
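A similar base R sketch shows how a nodes table like the one above can be derived from an edge list. Every handle that appears as a source or a target becomes a node, and n counts the edges it appears in. The data is hypothetical, and graphTweets' own counting may differ in the details.

```r
# Hypothetical edge list in the shape returned by gt_edges()
edges <- data.frame(
  source = c("alice", "bob", "carol", "alice"),
  target = c("dan",   "dan", "dan",   "erin"),
  n      = c(2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Every handle seen as a source or a target is a node;
# count how many edges each handle takes part in
handles <- c(edges$source, edges$target)
counts  <- table(handles)
nodes   <- data.frame(
  nodes = names(counts),
  type  = "user",
  n     = as.integer(counts),
  stringsAsFactors = FALSE
)
nodes
```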
sigmajs has a specific but sensible naming convention, as well as basic minimal requirements: nodes must at least include id and size, and edges must at least include id, source, and target.
To meet these requirements, we rename a few columns. The nodes column becomes id, and n becomes size, as this is what sigmajs understands. sigmajs also requires each edge to have a unique id. The twinetverse comes with helper functions that prepare the nodes and edges built with graphTweets for use in sigmajs (these are the only functions the ’verse provides).
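As a rough illustration of the renaming just described, here is a base R sketch on hypothetical data. It only performs the renames and adds edge ids. It is not the code of nodes2sg or edges2sg, which may do more, such as adding a label column.

```r
# Hypothetical nodes and edges in the shape produced by graphTweets
nodes <- data.frame(nodes = c("alice", "dan"), type = "user", n = c(2L, 3L),
                    stringsAsFactors = FALSE)
edges <- data.frame(source = c("alice", "bob"), target = c("dan", "dan"),
                    n = c(2L, 1L), stringsAsFactors = FALSE)

# Nodes: sigmajs expects `id` and `size`
names(nodes)[names(nodes) == "nodes"] <- "id"
names(nodes)[names(nodes) == "n"]     <- "size"

# Edges: sigmajs expects a unique `id` alongside `source` and `target`
edges$id <- seq_len(nrow(edges))
```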
nodes <- nodes2sg(nodes)
edges <- edges2sg(edges)
Let's visualise that. We must initialise every sigmajs graph with the sigmajs function. We then add our nodes with sg_nodes, passing the column names mentioned previously, id and size, to meet sigmajs' minimum requirements. With the exception of the function called sigmajs, all functions in sigmajs start with sg_.
sigmajs lets us build a graph from nodes alone. When using both nodes and edges, however, we have to run the sigmajs functions in the correct order, just as with graphTweets: first the nodes, then the edges.
Let's begin by mapping our nodes:
sigmajs() %>%
sg_nodes(nodes, id, size)
Then, let's add the edges:
sigmajs() %>%
sg_nodes(nodes, id, size) %>%
sg_edges(edges, id, source, target)
Each point on the graph is a Twitter user. Two users are connected when one has retweeted the other.
The graph above isn't very informative yet, but sigmajs is highly customisable. Let's beautify it a bit, starting by adding an appropriate layout. In the following code, we use one of igraph's layout algorithms, layout_components.
We’ll also add labels that will display on hover by simply passing the label column to sg_nodes.
sigmajs() %>%
sg_nodes(nodes, id, label, size) %>%
sg_edges(edges, id, source, target) %>%
sg_layout(layout = igraph::layout_components)
Looks a lot better, doesn't it? Next, we color the nodes by cluster with sg_cluster and adjust node sizes and edge colors with sg_settings:
sigmajs() %>%
sg_nodes(nodes, id, label, size) %>%
sg_edges(edges, id, source, target) %>%
sg_layout(layout = igraph::layout_components) %>%
sg_cluster(
colors = c(
"#60dd8e",
"#3f9f7f",
"#188a8d",
"#17577e",
"#141163"
)
) %>%
sg_settings(
minNodeSize = 1,
maxNodeSize = 2.5,
edgeColor = "default",
defaultEdgeColor = "#d3d3d3"
)
From the visualisation above, we can identify clusters of interactions and spot the most influential users in the #TheyAreUs campaign: the larger a node, the more often that user appears in retweet edges.
We've been visualising Twitter interactions in a static manner, but they are dynamic when you think about it. Twitter conversations happen over time, yet so far we've only been drawing all-encompassing snapshots. So let's take time into account and make a dynamic graph in which the edges appear at different time steps.
Let’s use the same tweets data:
tweets <- read_csv("data_input/tweets.csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## created_at = col_datetime(format = ""),
## display_text_width = col_double(),
## reply_to_status_id = col_logical(),
## reply_to_user_id = col_logical(),
## reply_to_screen_name = col_logical(),
## is_quote = col_logical(),
## is_retweet = col_logical(),
## favorite_count = col_double(),
## retweet_count = col_double(),
## symbols = col_logical(),
## ext_media_type = col_logical(),
## quoted_status_id = col_logical(),
## quoted_text = col_logical(),
## quoted_created_at = col_logical(),
## quoted_source = col_logical(),
## quoted_favorite_count = col_logical(),
## quoted_retweet_count = col_logical(),
## quoted_user_id = col_logical(),
## quoted_screen_name = col_logical(),
## quoted_name = col_logical()
## # ... with 27 more columns
## )
## See spec(...) for full column specifications.
Now onto building the graph.
net <- tweets %>%
gt_edges(screen_name, mentions_screen_name, created_at) %>%
gt_nodes() %>%
gt_dyn() %>%
gt_collect()
Quite a few things differ from the previous graphs we have built:
1. We pass created_at to gt_edges. This adds the created_at column to our edges, so that we know when the tweet in which each edge appears was posted.
2. We call gt_dyn (short for dynamic), which computes the time at which edges and nodes should appear on the graph.
3. The target of the edges is now mentions_screen_name rather than retweet_screen_name, so edges link users to the users they mention.
As we did earlier, we first unpack the edges and nodes:
c(edges, nodes) %<-% net # unpack
nodes <- nodes2sg(nodes)
Notice that after unpacking, we have only prepared our nodes for sigmajs. This is because our edges need a different preparation so that they appear dynamically on the graph.
In sigmajs, this works by specifying the delay, in milliseconds, before each edge is added. Therefore, we need to turn the dates into numbers and rescale them to a reasonable range: the tweets span several days, and we don't want the edges to actually take that long to appear on the graph.
1. We convert the date-time column (POSIXct, actually) to numeric, which gives the number of seconds since 1 January 1970.
2. We rescale it between 0 and 1, then multiply by 10,000 (milliseconds) so that the edges are added over 10 seconds.
3. We give each edge a unique id, since we have not run edges2sg on these edges.
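The arithmetic in steps 1 and 2 can be checked in base R on a few hypothetical timestamps before applying it with dplyr. This is a minimal sketch of the same min-max rescaling.

```r
# Hypothetical tweet timestamps spanning a few days
created_at <- as.POSIXct(c("2019-03-23 07:37:26", "2019-03-24 00:55:35",
                           "2019-03-28 15:50:45"), tz = "UTC")

secs   <- as.numeric(created_at)                       # seconds since 1970-01-01
scaled <- (secs - min(secs)) / (max(secs) - min(secs)) # now between 0 and 1
delay  <- scaled * 10000                               # milliseconds: 0 to 10 s
delay
```

The earliest tweet gets a delay of 0 and the latest a delay of 10,000 ms. If all timestamps were identical, the denominator would be zero and the result NaN, so this rescaling assumes at least two distinct times.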
edges <- edges %>%
mutate(
id = 1:n(),
created_at = as.numeric(created_at),
created_at = (created_at - min(created_at)) / (max(created_at) - min(created_at)),
created_at = created_at * 10000
) %>%
select(id, source, target, created_at)
Now for the actual visualisation: we'll plot the nodes first, then add the edges dynamically. Let's break it down step by step.
First, we plot the nodes.
sigmajs() %>%
sg_nodes(nodes, id, size, label)
Let's add a layout, as the graph looks a bit messy with nodes randomly scattered across the canvas. This time we have to compute the layout differently. We cannot simply use sg_layout, as it requires both nodes and edges on the graph, and we only have nodes (the edges are added later, dynamically). Instead, we use sg_get_layout, which computes the layout from the nodes and edges data frames before plotting.
nodes <- sg_get_layout(nodes, edges, layout = igraph::layout_components)
## Warning in if (class(newval) == "factor") {: the condition has length > 1
## and only the first element will be used
## Warning in if (class(newval) == "factor") {: the condition has length > 1
## and only the first element will be used
head(nodes)
## id label start end
## 1 __choeeey __choeeey 2019-03-24 00:55:35 2019-03-28 15:50:45
## 2 __interfaith__ __interfaith__ 2019-03-27 11:47:53 2019-03-28 15:50:45
## 3 _19bm _19bm 2019-03-23 07:37:26 2019-03-28 15:50:45
## 4 _alleiahmalik _alleiahmalik 2019-03-23 13:43:37 2019-03-28 15:50:45
## 5 _denchtastic _denchtastic 2019-03-24 08:31:10 2019-03-28 15:50:45
## 6 10dubai 10dubai 2019-03-23 13:32:56 2019-03-28 15:50:45
## type size x y
## 1 user 1 52.6030416 80.18472
## 2 user 7 -77.3236099 -110.40540
## 3 user 1 -91.6700166 62.49215
## 4 user 1 96.7469803 -12.12597
## 5 user 1 -83.9489479 56.87001
## 6 user 1 0.7785031 -38.10881
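The start and end columns in the output above were added by gt_dyn. Judging only from that output, each node starts at its first appearance and ends at the last timestamp in the data. The following base R sketch reproduces that idea on hypothetical edges. It is an assumption about the behaviour, not gt_dyn's actual code.

```r
# Hypothetical edges with the time of the tweet each edge comes from
edges <- data.frame(
  source     = c("alice", "bob", "alice"),
  target     = c("dan",   "dan", "erin"),
  created_at = as.POSIXct(c("2019-03-23 07:00:00", "2019-03-24 12:00:00",
                            "2019-03-25 18:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Every appearance of a handle, as source or target, with its time in seconds
ids   <- c(edges$source, edges$target)
times <- rep(as.numeric(edges$created_at), 2)

# A node starts at its earliest appearance...
first_seen <- tapply(times, ids, min)

# ...and, as in the output above, ends at the latest time in the data
nodes <- data.frame(
  id    = names(first_seen),
  start = as.POSIXct(as.vector(first_seen), origin = "1970-01-01", tz = "UTC"),
  end   = max(edges$created_at),
  stringsAsFactors = FALSE
)
nodes
```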
Notice that sg_get_layout computes the coordinates of the nodes (x and y) and adds them to our nodes data frame.
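To show what "adding coordinates to the nodes data frame" amounts to, here is a minimal base R sketch on hypothetical nodes. It deliberately swaps in a simple circular layout. It is not igraph's layout_components, which positions nodes according to the graph's structure.

```r
# Hypothetical nodes data frame (id and size, as sigmajs expects)
nodes <- data.frame(id = c("alice", "bob", "carol", "dan"),
                    size = c(2, 1, 1, 3), stringsAsFactors = FALSE)

# Circular layout (NOT layout_components): spread nodes evenly on a circle
# of radius 100 and store the positions as x and y columns
angle   <- 2 * pi * (seq_len(nrow(nodes)) - 1) / nrow(nodes)
nodes$x <- 100 * cos(angle)
nodes$y <- 100 * sin(angle)
nodes
```

Any layout works with sg_nodes in the same way, as long as it yields one x and one y value per node.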
Now we can simply pass the coordinates x and y to sg_nodes.
sigmajs() %>%
sg_nodes(nodes, id, size, label, x, y)
Now we have something that looks like a graph, except it's missing edges. Let's add them.
We add the edges almost exactly as we did before, but with sg_add_edges instead of sg_edges. Other than the function name, the main difference is that we pass created_at as the delay. We also set cumsum = FALSE. By default the function computes the cumulative sum of the delay, but our created_at column already holds the time (in milliseconds) at which each edge should appear, so no cumulative sum is needed.
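The effect of the cumsum setting can be illustrated with plain arithmetic. This base R sketch assumes, as described above, that with cumsum = TRUE the delays are cumulated with cumsum(); the delay values themselves are hypothetical.

```r
# Hypothetical absolute delays (ms): when each edge should appear
delay <- c(0, 2500, 5000, 10000)

# cumsum = FALSE: each edge appears at its own delay
absolute <- delay

# cumsum = TRUE would treat each value as the gap since the previous edge,
# so the delays add up and the animation stretches out
cumulative <- cumsum(delay)
cumulative
```

With cumulated delays, the last edge would appear after 17.5 seconds rather than 10, which is why our already-absolute created_at column needs cumsum = FALSE.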
sigmajs() %>%
sg_nodes(nodes, id, size, label, x, y) %>%
sg_add_edges(edges, created_at, id, source, target, cumsum = FALSE, refresh = TRUE)
Now the edges appear dynamically. However, the animation starts as soon as the page loads, so readers may miss it. sigmajs provides an easy workaround: we can add a button so that users can trigger the animation themselves.
The button is added with sg_button, to which we pass the event the button will trigger (add_edges) and a label (Add edges). The name of the event corresponds to the function it triggers, minus the sg_ prefix: add_edges triggers sg_add_edges. Many other events can be triggered by a button; they are listed on the sigmajs official website.
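The naming rule for events can be expressed as a one-line string transformation. The helper fn_to_event below is hypothetical and not part of sigmajs. It only encodes the convention described above; it does not check that the resulting event actually exists.

```r
# Hypothetical helper: event name = sigmajs function name without "sg_"
fn_to_event <- function(fn) sub("^sg_", "", fn)

fn_to_event("sg_add_edges")
```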
Lastly, to make our graph more pleasant to look at, we’ll add colors to our nodes and edges through sg_settings.
sigmajs() %>%
sg_nodes(nodes, id, size, label, x, y) %>%
sg_add_edges(edges, created_at, id, source, target, cumsum = FALSE, refresh = TRUE) %>%
sg_button("add_edges", "Add edges") %>%
sg_settings(
defaultNodeColor = "#127ba3",
edgeColor = "default",
defaultEdgeColor = "#d3d3d3",
minNodeSize = 1,
maxNodeSize = 4,
minEdgeSize = 0.3,
maxEdgeSize = 0.3
)
Even more interesting, the dynamic graph shows which users spread the campaign earlier than others.