Load required packages:
library(tidyverse)
[30m── [1mAttaching packages[22m ────────────────────────────────────────────────── tidyverse 1.2.1 ──[39m
[30m[32m✔[30m [34mggplot2[30m 3.1.0 [32m✔[30m [34mpurrr [30m 0.3.0
[32m✔[30m [34mtibble [30m 2.0.1 [32m✔[30m [34mdplyr [30m 0.8.0.[31m1[30m
[32m✔[30m [34mtidyr [30m 0.8.2 [32m✔[30m [34mstringr[30m 1.4.0
[32m✔[30m [34mreadr [30m 1.3.1 [32m✔[30m [34mforcats[30m 0.4.0 [39m
[30m── [1mConflicts[22m ───────────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[30m [34mdplyr[30m::[32mfilter()[30m masks [34mstats[30m::filter()
[31m✖[30m [34mdplyr[30m::[32mlag()[30m masks [34mstats[30m::lag()[39m
library(DT)
library(igraph)
Attaching package: ‘igraph’
The following objects are masked from ‘package:dplyr’:
as_data_frame, groups, union
The following objects are masked from ‘package:purrr’:
compose, simplify
The following object is masked from ‘package:tidyr’:
crossing
The following object is masked from ‘package:tibble’:
as_data_frame
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
library(rtweet)
Attaching package: ‘rtweet’
The following object is masked from ‘package:purrr’:
flatten
library(visNetwork)
library(graphTweets) # This package turns twitter data into edge & node data
help('graphTweets') for examples
Attaching package: ‘graphTweets’
The following object is masked from ‘package:igraph’:
%<-%
This next chunk below is for authenticating rtweet so you can get tweets. You may not need to do this, because you may be authenticated from when we used rtweet previously. But if you do use this, be sure to either delete it or remove your passwords prior to saving, so no one else will see it.
token <- create_token(
app = "graphtweetschad",
consumer_key = "6BRO74AsbbonJz05ZckV8mGxe",
consumer_secret = "a5D6UM4k75g7p2pXil8LlgIGPBUAMp6aJcPsgF36c8uJyg680Z",
access_token = "1226201887478706176-jFlIApMTwwVSiVjWWELPfPD7t1nkFC",
access_secret = "bGIhXZ9aXjdxGMe4xbIEBdXkE3pywoZeRkNeYBJ65I0Z4")
get_token() # this shows the token. make sure key is the same as consumer_key above
<Token>
<oauth_endpoint>
request: https://api.twitter.com/oauth/request_token
authorize: https://api.twitter.com/oauth/authenticate
access: https://api.twitter.com/oauth/access_token
<oauth_app> graphtweetschad
key: 6BRO74AsbbonJz05ZckV8mGxe
secret: <hidden>
<credentials> oauth_token, oauth_token_secret
---
Twitter is a great example of a social network that we can examine with network analysis. There are several ways both to get tweets and to create networks.
Getting tweets: First, we’ll use the rtweet package’s search_tweets() to collect tweets to analyze. We can search for tweets about a topic or, as we’ll do below, even search in a geographical region. But a second and better approach is to select a group of users and use get_timeline() to get the tweets from just those people we want.
Creating the network: We could create our own node and edge data from the tweets, but there is a package called graphTweets that will do the dirty work for us. We give it the tweets, create the network, and then create the nodes and edges from the network. There are a variety of ways that people can connect on Twitter, and we can use any of them as edges: Retweets, mentions, replies, etc. We’ll focus on retweets and mentions.
Let’s start with an example of getting tweets that were created in Billings. This will only work on people who have turned on their geolocation. The geocode = “45.80,-108.55,25mi” gets tweets that were created within 25 miles of the latitude and longitude center of Billings.
bl_tweets <- search_tweets(geocode = "45.80,-108.55,25mi", n = 2000)
Downloading [===>-------------------------------------] 10%
Downloading [=====>-----------------------------------] 15%
Downloading [=======>---------------------------------] 20%
Downloading [=========>-------------------------------] 25%
Downloading [===========>-----------------------------] 30%
Downloading [=============>---------------------------] 35%
Downloading [===============>-------------------------] 40%
Downloading [=================>-----------------------] 45%
Downloading [===================>---------------------] 50%
Downloading [======================>------------------] 55%
Downloading [========================>----------------] 60%
Downloading [==========================>--------------] 65%
Downloading [============================>------------] 70%
Downloading [==============================>----------] 75%
Downloading [================================>--------] 80%
Downloading [==================================>------] 85%
Downloading [====================================>----] 90%
Downloading [======================================>--] 95%
Downloading [=========================================] 100%
Create the network of retweets.
The following long code chunk will create three data sets based on bl_tweets: 1. bl_retweet_network, which is the igraph network we can use to get network statistics 2. bl_retweet_nodes, which contains the people in the network 3. bl_retweet_edges, which are the connections - in this case, retweets - between the people
# This creates the network with some commands from graphTweets
# The variables in the parentheses of gt_edges() are used, so in this case we're using retweets
# We're also keeping text of the tweets
bl_retweet_network <- bl_tweets %>%
gt_edges(screen_name, retweet_screen_name, text) %>%
gt_graph()
# This next line gets the nodes
bl_retweet_nodes <- as_data_frame(bl_retweet_network, what = "vertices")
# This adds some additional info to the nodes, so we get the names on hover
# and the size of the node is based on its degree, etc.
bl_retweet_nodes <- bl_retweet_nodes %>%
mutate(id = name) %>%
mutate(label = name) %>%
mutate(title = name) %>%
mutate(degree = degree(bl_retweet_network)) %>%
mutate(value = degree)
# This gets the edges, similar to how we got the nodes above
bl_retweet_edges <- as_data_frame(bl_retweet_network, what = "edges")
# This puts the text of the tweet itself into the edge
# so when you hover over a line in the diagram it will show the tweet
bl_retweet_edges <- bl_retweet_edges %>%
mutate(title = text)
# Creates the diagram
visNetwork(bl_retweet_nodes, bl_retweet_edges, main = "Retweet network around Billings") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visEdges(arrows = "to")
The following will get the number of vertices (nodes) in the network.
bl_retweet_network %>%
vcount()
[1] 667
Find the number of edges (links) in the network using ecount:
bl_retweet_network %>%
ecount()
[1] 660
Starting with bl_retweet_edges, select only the from (the person who retweeted another), the to (the person who was retweeted), and the text of the tweet, and create a data table of the retweets below:
bl_retweet_edges %>%
select(from, to, text) %>%
datatable()
Get the density of the network with edge_density. The number will likely be very small because most of these people aren’t connected to one another:
bl_retweet_network %>%
edge_density()
[1] 0.001485744
One problem with the approach above is that most of the users are disconnected from one another. They happen to be in the same general location, but they don’t necessarily know or follow one another on twitter, so we have a very loosely connected network with very few connections.
A better approach is to find a group of individuals that you know are connected in some way.
I went to the Billings Gazette twitter page and got the twitter names of several of its reporters. The following code puts them into an object called bg_reporters, and then it uses get_timeline() to get the last 500 tweets from each of those reporters.
bg_reporters <- c("RobRogersBG",
"samalwilson",
"BGKord",
"BGmhoffman",
"DarrellEhrlick",
"BGMayer",
"bgSueOlp",
"hollykmichels",
"TomLutey",
"PhoebeTollefson")
bg_tweets <- get_timeline(bg_reporters, n = 500)
In addition to selecting users and using get_timeline() to collect tweets, there are two other differences in how we’re going to approach this.
In the code below, notice that we’re using mentions_screen_name in the gt_edges() function rather than retweet_screen_name as we did above. This is just a different type of interaction in twitter that we can use to create the network.
Also notice that in the first paragraph this line appears: filter(mentions_screen_name %in% bg_reporters). That makes sure that any mentions of people outside the network of Gazette reporters that we’re interested in are filtered out. We want to focus only on people in our group of reporters (I love the %in% function in R!).
# create network, including only members of our group
bg_mentions_network <- bg_tweets %>%
filter(mentions_screen_name %in% bg_reporters) %>%
gt_edges(screen_name, mentions_screen_name, text) %>%
gt_graph
#get nodes
bg_mentions_nodes <- as_data_frame(bg_mentions_network, what = "vertices")
# get edges
bg_mentions_edges <- as_data_frame(bg_mentions_network, what = "edges")
# add info to nodes
bg_mentions_nodes <- bg_mentions_nodes %>%
mutate(id = name) %>%
mutate(label = name) %>%
mutate(title = name) %>%
mutate(degree = degree(bg_mentions_network)) %>%
mutate(value = degree)
# add info to edges
bg_mentions_edges <- bg_mentions_edges %>%
mutate(title = text)
# create the network diagram
visNetwork(bg_mentions_nodes,
bg_mentions_edges,
main = "Billings Gazette Reporters Twitter Mentions Network") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visEdges(arrows = "to")
NA
Above we got the vcount, ecount, and edge_density. Here, let’s get the diameter length of the network. If you need to, go back to the previous assignment to see how to get that:
bg_mentions_network %>%
get_diameter() %>%
length()
[1] 3
bg_mentions_network %>%
edge_density()
[1] 6.055556
Finally, starting with bg_mentions_nodes, use select() to only look at the name and degree of the reporter, and make a data table of them so we can see who has the most and least connections with other reporters.
bg_mentions_nodes %>%
select(name, degree) %>%
datatable()
Above, we manually created a group of people to examine. That can get really tedious, and typos are a big possibility, especially if the group is large.
One alternative is to use a twitter list. These are lists of users that anyone can put together, and we can access them with rtweet. People make lists of celebrities, athletes, their own friends, people who work at the same company, and so on.
One group of lists comes from TwitterGov. Here are a few of the government lists they keep: “us-cabinet” (n = 41), “us-governors” (n = 50), “world-leaders” (n = 107), “us-senate” (n = 180). The senate list has 180 even though there are 100 senators because it includes personal accounts of the senators as well as their official senate accounts.
The following code gets the list of US cabinet members, which includes people in the government like Secretary of State, Education, Energy, etc.
cabinet <- lists_members(slug = "us-cabinet", owner_user = "TwitterGov")
cabinet_tweets <- get_timeline(cabinet$screen_name, n = 300)
# This creates the network with some commands from graphTweets
# The variables in the parentheses of gt_edges() are used, so in this case we're using retweets
cabinet_retweets_network <- cabinet_tweets %>%
filter(retweet_screen_name %in% cabinet$screen_name) %>% # <- This is a new line and important.
gt_edges(screen_name, retweet_screen_name, text) %>% # It only keep retweets of other cabinet members
gt_graph()
# This next line gets the nodes
cabinet_retweets_nodes <- as_data_frame(cabinet_retweets_network, what = "vertices")
# This adds some additional info to the nodes, so we get the names on hover
# and the size of the node is based on its degree, etc.
cabinet_retweets_nodes <- cabinet_retweets_nodes %>%
mutate(id = name) %>%
mutate(label = name) %>%
mutate(title = name) %>%
mutate(degree = degree(cabinet_retweets_network)) %>%
mutate(value = degree)
# This gets the edges, similar to how we got the nodes above
cabinet_retweets_edges <- as_data_frame(cabinet_retweets_network, what = "edges")
# This puts the text of the tweet itself into the edge
# so when you hover over a line in the diagram it will show the tweet
cabinet_retweets_edges <- cabinet_retweets_edges %>%
mutate(title = text)
# Creates the diagram
visNetwork(cabinet_retweets_nodes, cabinet_retweets_edges, main = "US cabinet officials retweet network") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visEdges(arrows = "to")
Now let’s find communities within the network and display them in the diagram.
cabinet_retweets_nodes <- cabinet_retweets_nodes %>%
mutate(group = membership(infomap.community(cabinet_retweets_network)))
Modularity is implemented for undirected graphs only.
visNetwork(cabinet_retweets_nodes, cabinet_retweets_edges, main = "US cabinet officials retweet network") %>%
visIgraphLayout(layout = "layout_nicely") %>%
visEdges(arrows = "to") %>%
visOptions(highlightNearest = T, nodesIdSelection = T, selectedBy = "group")
NA
NA