For this project, the social media site of interest is Twitter. The project demonstrates how to download data from Twitter via the API and how to build a visualization from that data using specific Twitter handles.
The goal is to illustrate a network in which vertices (nodes) represent users and edges represent the relationships between them. A richer graph could combine different kinds of nodes and edges: for example, nodes representing tweets, with edges to the users who created, favorited, or retweeted them. The network of tweets is modeled using various mining and mapping packages in R.
Twitter is a popular social media site where personalities from many strata of society post their opinions and comments in posts limited to 140 characters. This project focuses on a single topic, “climatechange”, by extracting a sample of tweets that contain it as a hashtag or as text.
Social Network Analysis (SNA) has its origins in both social science and the broader fields of network analysis and graph theory.
Network analysis concerns itself with the formulation and solution of problems that have a network structure; such structure is usually captured in a graph (see the circled structure to the right).
Graph theory provides a set of abstract concepts and methods for the analysis of graphs. These, in combination with other analytical tools and with methods developed specifically for the visualization and analysis of social (and other) networks, form the basis of what we call social network analysis methods.
But SNA is not just a methodology; it is a unique perspective on how society functions. Instead of focusing on individuals and their attributes, or on macroscopic social structures, it centers on relations between individuals, groups, or social institutions.
Undirected Network
Directed Network
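The difference between the two kinds of network can be sketched with igraph, the R package used later in this document. The four-edge toy network below is purely illustrative and is not part of the Twitter data; the vertex names a, b, and c are made up.

```r
library(igraph)

# Hypothetical toy edge list: a->b, b->c, c->a, a->c
el <- matrix(c("a", "b",
               "b", "c",
               "c", "a",
               "a", "c"), ncol = 2, byrow = TRUE)

# The same edge list read as a directed and as an undirected network
g_dir <- graph_from_edgelist(el, directed = TRUE)
g_und <- graph_from_edgelist(el, directed = FALSE)

# In a directed network, incoming and outgoing ties are counted separately
in_deg  <- degree(g_dir, mode = "in")
out_deg <- degree(g_dir, mode = "out")

# In an undirected network there is a single degree per node
und_deg <- degree(g_und)

print(in_deg)
print(out_deg)
print(und_deg)
```

In the directed version, a tie from a to c is distinct from a tie from c to a; in the undirected version, both simply count as connections between a and c.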
The OAuth keys obtained by following the above instructions are used here for direct authentication with Twitter before we start pulling the tweets.
consumer_key <- 'Keg36lsjuevCizGh6NSPDQBlO'
consumer_secret <- '8qD8Trt1f8vYf2pM04gKJdARXKLrToEYApKEvbl9XcNCOQpWmk'
access_token <- '332850925-EXIQbHGC58SFdpOGjZiCUxtWigXU5yB3C8TRuvJ8'
access_secret <- 'PrqhgA2AEImOwMg8W0iXTz6KIJKRCty9qB1NbZrOJcIpA'
# setup_twitter_oauth() comes from the twitteR package and wraps the OAuth authentication handshake
library(twitteR)
setup_twitter_oauth(consumer_key,
                    consumer_secret,
                    access_token,
                    access_secret)
## [1] "Using direct authentication"
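The source does not show how the tweets were pulled or how the edges were derived, so the following is only a sketch under stated assumptions. It assumes the tweets were fetched with twitteR's searchTwitter() and converted with twListToDF() (shown commented out because they need live credentials), and that an edge links a tweet's author to each account mentioned in the tweet text. The sample tweets and the handles alice, bob, and carol are hypothetical.

```r
# Assumed retrieval step (requires a live, authenticated session):
# tweets <- twListToDF(searchTwitter("#climatechange", n = 20))

# Hypothetical stand-in with the columns used below
tweets <- data.frame(
  screenName    = c("alice", "bob", "carol"),
  text          = c("Great thread by @bob and @carol #climatechange",
                    "No mentions here #climatechange",
                    "@alice thanks! #climatechange"),
  favoriteCount = c(3, 0, 1),
  stringsAsFactors = FALSE
)

# Extract every @handle from each tweet's text (one character vector per tweet)
mentions <- regmatches(tweets$text, gregexpr("@[A-Za-z0-9_]+", tweets$text))

# Build one row per (author, mentioned account) pair
edges <- data.frame(
  source        = rep(tweets$screenName, lengths(mentions)),
  target        = sub("^@", "", unlist(mentions)),
  favoriteCount = rep(tweets$favoriteCount, lengths(mentions)),
  stringsAsFactors = FALSE
)

print(edges)
```

Tweets without any mention contribute no edges under this assumption, which is one reason a sample of tweets can yield fewer edges than expected.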
To convert data into an igraph network, we will use the graph_from_data_frame() function, which takes two data frames: d and vertices.
d describes the edges of the network. Its first two columns are the IDs of the source and the target node for each edge. The following columns are edge attributes (weight, type, label, or anything else).
vertices starts with a column of node IDs. Any following columns are interpreted as node attributes.
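As a minimal sketch of the two inputs just described, the following builds a small directed graph with graph_from_data_frame(). The data frames, user names, and counts are hypothetical; only the column layout (IDs first, attributes after) follows the description above.

```r
library(igraph)

# d: first two columns are source and target IDs, remaining columns are edge attributes
edges <- data.frame(
  source        = c("alice", "alice", "carol"),
  target        = c("bob", "carol", "alice"),
  favoriteCount = c(3, 3, 1)
)

# vertices: first column is the node ID, remaining columns are node attributes
# ("dave" has no edges, which is allowed as long as every edge endpoint is listed)
vertices <- data.frame(
  name          = c("alice", "bob", "carol", "dave"),
  favoriteCount = c(3, 0, 1, 0)
)

g <- graph_from_data_frame(d = edges, vertices = vertices, directed = TRUE)
print(g)
```

Supplying the vertices data frame is optional; without it, igraph takes the node set from the IDs that appear in the edge list, so isolated nodes such as "dave" would be absent.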
## IGRAPH DN-- 31 24 --
## + attr: name (v/c), favoriteCount (v/n), favoriteCount (e/n)
## + edges (vertex names):
## [1] SDGscameroon ->WinnieKamau254 SDGscameroon ->ElongWilliam
## [3] SDGscameroon ->GrantLeonagrant clsgeography ->UNDESA
## [5] Peter_Hindwood ->Blurred_Trees Peter_Hindwood ->AlexEpstein
## [7] Peter_Hindwood ->Kathleen_Wynne Peter_Hindwood ->ec_minister
## [9] Peter_Hindwood ->Glen4ONT Peter_Hindwood ->EcoSenseNow
## [11] calstark87 ->saul42 paulramsaybamff->Alex_Verbeek
## [13] ZEROCO2_ ->ZEROCO2_ kbelesova ->LancetCountdown
## [15] NRGrenaissance ->Agropark NRGrenaissance ->politicianslie
## + ... omitted several edges
The description of an igraph object starts with four letters:
D or U, for a directed or undirected graph
N for a named graph (where nodes have a name attribute)
W for a weighted graph (where edges have a weight attribute)
B for a bipartite (two-mode) graph (where nodes have a type attribute)
In the output above, DN-- marks a directed, named graph that is neither weighted nor bipartite. The two numbers that follow (31 24) refer to the number of nodes and edges in the graph. The description also lists node & edge attributes, for example:
(g/c) - graph-level character attribute
(v/c) - vertex-level character attribute
(e/n) - edge-level numeric attribute
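The same information encoded in the summary line can also be queried programmatically. The sketch below uses a hypothetical two-edge graph and igraph's predicate functions; it is an illustration of the idea rather than part of the original analysis.

```r
library(igraph)

# Hypothetical two-edge directed graph with named vertices
g <- graph_from_data_frame(
  data.frame(source = c("alice", "carol"), target = c("bob", "alice")),
  directed = TRUE
)

# Print the summary description discussed above
print(g)

# The four letters correspond to these predicates
flags <- c(directed  = is_directed(g),
           named     = is_named(g),
           weighted  = is_weighted(g),
           bipartite = is_bipartite(g))
print(flags)

# The two numbers correspond to the vertex and edge counts
counts <- c(nodes = vcount(g), edges = ecount(g))
print(counts)

# Adding an edge attribute called "weight" turns the graph into a weighted one
E(g)$weight <- c(1, 2)
weighted_now <- is_weighted(g)
```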
We also have easy access to nodes, edges, and their attributes with:
## + 24/24 edges (vertex names):
## [1] SDGscameroon ->WinnieKamau254 SDGscameroon ->ElongWilliam
## [3] SDGscameroon ->GrantLeonagrant clsgeography ->UNDESA
## [5] Peter_Hindwood ->Blurred_Trees Peter_Hindwood ->AlexEpstein
## [7] Peter_Hindwood ->Kathleen_Wynne Peter_Hindwood ->ec_minister
## [9] Peter_Hindwood ->Glen4ONT Peter_Hindwood ->EcoSenseNow
## [11] calstark87 ->saul42 paulramsaybamff->Alex_Verbeek
## [13] ZEROCO2_ ->ZEROCO2_ kbelesova ->LancetCountdown
## [15] NRGrenaissance ->Agropark NRGrenaissance ->politicianslie
## [17] NRGrenaissance ->thegreenpagesBC NRGrenaissance ->ThatArcher
## [19] NRGrenaissance ->UNSDI_NCO NRGrenaissance ->BrianDColwell
## + ... omitted several edges
## + 31/31 vertices, named:
## [1] Agropark Alex_Verbeek AlexEpstein Blurred_Trees
## [5] boyz2menentgh BrianDColwell calstark87 ClimateTreaty
## [9] clsgeography ec_minister EcoSenseNow ElongWilliam
## [13] Glen4ONT GrantLeonagrant guardianeco Kathleen_Wynne
## [17] kbelesova LancetCountdown m_kaczerowski NRGrenaissance
## [21] paulramsaybamff Peter_Hindwood politicianslie saul42
## [25] SDGscameroon ThatArcher thegreenpagesBC UNDESA
## [29] UNSDI_NCO WinnieKamau254 ZEROCO2_
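The code that produced the listings above is not shown in the source; they are consistent with igraph's E() and V() accessors, so the sketch below assumes those were used. The graph, user names, and counts are hypothetical.

```r
library(igraph)

# Hypothetical edge list with one edge attribute
edges <- data.frame(
  source        = c("alice", "alice", "carol"),
  target        = c("bob", "carol", "alice"),
  favoriteCount = c(3, 3, 1)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Edge and vertex sequences
print(E(g))
print(V(g))

# Attributes are reached with the $ operator on a sequence
edge_favs    <- E(g)$favoriteCount
vertex_names <- V(g)$name

# Sequences can be subset with a logical condition,
# here: vertices with more than one outgoing edge
busy <- V(g)[degree(g, mode = "out") > 1]
print(busy$name)
```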
It is also easy to extract an edge list or matrix back from the igraph network:
## [,1] [,2]
## [1,] "SDGscameroon" "WinnieKamau254"
## [2,] "SDGscameroon" "ElongWilliam"
## [3,] "SDGscameroon" "GrantLeonagrant"
## [4,] "clsgeography" "UNDESA"
## [5,] "Peter_Hindwood" "Blurred_Trees"
## [6,] "Peter_Hindwood" "AlexEpstein"
## [7,] "Peter_Hindwood" "Kathleen_Wynne"
## [8,] "Peter_Hindwood" "ec_minister"
## [9,] "Peter_Hindwood" "Glen4ONT"
## [10,] "Peter_Hindwood" "EcoSenseNow"
## [11,] "calstark87" "saul42"
## [12,] "paulramsaybamff" "Alex_Verbeek"
## [13,] "ZEROCO2_" "ZEROCO2_"
## [14,] "kbelesova" "LancetCountdown"
## [15,] "NRGrenaissance" "Agropark"
## [16,] "NRGrenaissance" "politicianslie"
## [17,] "NRGrenaissance" "thegreenpagesBC"
## [18,] "NRGrenaissance" "ThatArcher"
## [19,] "NRGrenaissance" "UNSDI_NCO"
## [20,] "NRGrenaissance" "BrianDColwell"
## [21,] "NRGrenaissance" "m_kaczerowski"
## [22,] "NRGrenaissance" "guardianeco"
## [23,] "NRGrenaissance" "ClimateTreaty"
## [24,] "boyz2menentgh" "Alex_Verbeek"
## 31 x 31 sparse Matrix of class "dgCMatrix"
##
## Agropark . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Alex_Verbeek . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AlexEpstein . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Blurred_Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## boyz2menentgh . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## BrianDColwell . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## calstark87 . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . .
## ClimateTreaty . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## clsgeography . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 .
## ec_minister . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## EcoSenseNow . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## ElongWilliam . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Glen4ONT . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## GrantLeonagrant . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## guardianeco . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Kathleen_Wynne . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## kbelesova . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . .
## LancetCountdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## m_kaczerowski . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## NRGrenaissance 1 . . . . 1 . 1 . . . . . . 1 . . . 1 . . . 1 . . 1 1 . 1
## paulramsaybamff . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Peter_Hindwood . . 1 1 . . . . . 1 1 . 1 . . 1 . . . . . . . . . . . . .
## politicianslie . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## saul42 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## SDGscameroon . . . . . . . . . . . 1 . 1 . . . . . . . . . . . . . . .
## ThatArcher . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## thegreenpagesBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## UNDESA . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## UNSDI_NCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## WinnieKamau254 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## ZEROCO2_ . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##
## Agropark . .
## Alex_Verbeek . .
## AlexEpstein . .
## Blurred_Trees . .
## boyz2menentgh . .
## BrianDColwell . .
## calstark87 . .
## ClimateTreaty . .
## clsgeography . .
## ec_minister . .
## EcoSenseNow . .
## ElongWilliam . .
## Glen4ONT . .
## GrantLeonagrant . .
## guardianeco . .
## Kathleen_Wynne . .
## kbelesova . .
## LancetCountdown . .
## m_kaczerowski . .
## NRGrenaissance . .
## paulramsaybamff . .
## Peter_Hindwood . .
## politicianslie . .
## saul42 . .
## SDGscameroon 1 .
## ThatArcher . .
## thegreenpagesBC . .
## UNDESA . .
## UNSDI_NCO . .
## WinnieKamau254 . .
## ZEROCO2_ . 1
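The calls behind the edge list and matrix output above are not shown in the source. A plausible sketch, assuming igraph's as_edgelist() and as_adjacency_matrix() were used, looks as follows; the graph and names are hypothetical.

```r
library(igraph)

# Hypothetical directed graph
edges <- data.frame(
  source        = c("alice", "alice", "carol"),
  target        = c("bob", "carol", "alice"),
  favoriteCount = c(3, 3, 1)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Two-column matrix of vertex names, one row per edge
el <- as_edgelist(g)
print(el)

# Adjacency matrix; sparse by default, rows are sources and columns are targets
adj <- as_adjacency_matrix(g)
print(adj)

# Edge list as a data frame, including edge attributes
edge_df <- igraph::as_data_frame(g, what = "edges")
print(edge_df)
```

In the sparse printout a dot stands for zero (no edge) and a 1 marks an edge from the row vertex to the column vertex.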
Now that we have our igraph network object, let’s make a first attempt to plot it.
# Remove self-loops (e.g. ZEROCO2_ -> ZEROCO2_) but keep multiple edges
g <- simplify(g, remove.multiple = FALSE, remove.loops = TRUE)
# Plot with smaller arrows
plot(g, edge.arrow.size = .1, edge.color = "blue",
     vertex.color = "orange", vertex.frame.color = "#ffffff",
     vertex.label = V(g)$name, vertex.label.color = "black")
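To make the effect of simplify() concrete, here is a small sketch on a hypothetical graph that contains both a duplicated edge and a self-loop. It only illustrates the two flags and is not the Twitter graph itself.

```r
library(igraph)

# Hypothetical graph: a->b twice (a multiple edge) and b->b (a self-loop)
el <- matrix(c("a", "b",
               "a", "b",
               "b", "b"), ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(el, directed = TRUE)

# Flag which edges are loops or repeats of an earlier edge
loops <- which_loop(g)
dups  <- which_multiple(g)

# Drop only the self-loop, as in the plotting code above
g_noloops <- simplify(g, remove.multiple = FALSE, remove.loops = TRUE)

# Default behaviour: drop self-loops and collapse multiple edges
g_simple <- simplify(g)

print(c(original = ecount(g),
        no_loops = ecount(g_noloops),
        simple   = ecount(g_simple)))
```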
# simpleNetwork from networkD3 package creates simple D3 JavaScript force directed network graphs.
networkD3::simpleNetwork(edges, Source = "source", Target = "target")
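# Scale vertex colour to the number of favorites (missing counts are treated as 0)
pal <- RColorBrewer::brewer.pal(5, "Dark2")
fav <- V(g)$favoriteCount
fav[is.na(fav)] <- 0
V(g)$color <- colorRampPalette(pal)(vcount(g))[rank(fav, ties.method = "min")]
# Label each edge with the degree of its source vertex
src <- ends(g, E(g), names = FALSE)[, 1]
plot(g, vertex.color = V(g)$color, edge.label = degree(g)[src],
     edge.arrow.mode = 1, edge.arrow.size = .2)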
Communities are groups of nodes in a graph that are highly connected, indicating that they ‘belong’ together for some reason.
However, communities in a graph can overlap, so a node can belong to more than one group. For example, Elon Musk is followed by a lot of people in the DS community, even though one wouldn’t consider him to be a member.
# Detect communities with the walktrap algorithm
# (cluster_walktrap() is the current name for walktrap.community())
cm <- cluster_walktrap(g)
plot(g, vertex.color = membership(cm), edge.arrow.size = .1,
     edge.arrow.mode = 3, vertex.size = log1p(degree(g)) * 8)
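What the community detection step returns can be seen more easily on a graph with an obvious group structure. The sketch below uses a hypothetical undirected graph of two five-node cliques joined by a single bridge edge; it is an illustration under that assumption, not a result from the Twitter data.

```r
library(igraph)

# Hypothetical graph: nodes 1-5 fully connected, nodes 6-10 fully connected,
# plus one bridge edge between node 5 and node 6
clique1 <- t(combn(1:5, 2))
clique2 <- t(combn(6:10, 2))
el <- rbind(clique1, clique2, c(5, 6))
g <- graph_from_edgelist(el, directed = FALSE)

# Walktrap groups nodes that short random walks tend to stay within
cm <- cluster_walktrap(g)

# One community ID per vertex, and the size of each community
memb <- as.vector(membership(cm))
print(memb)
print(sizes(cm))

# Modularity summarises how well separated the detected communities are
print(modularity(cm))
```

Note that walktrap assigns each node to exactly one community, so it cannot by itself represent the overlapping memberships mentioned above.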
This was a quick approach to represent a specific topic or hashtag from Twitter in a network model with users and their relationships represented by nodes and edges in a graph.
A small sample (n=20) of tweets was extracted from Twitter using R and the API token keys. As the sample size increases, the plots become much more complex and harder to read.
With 31 nodes and 24 edges (23 once the self-loop is removed), the graphs represented multiple clusters of nodes (as shown in the previous slides) with direction and named attributes like name and favorite count.
Centrality measures such as degree, shown as the number on each edge, indicate how connected a node is: the larger the number, the more connections the node has.
Further advanced analysis is needed to arrive at deeper and more detailed insights into user activity, influence, and relationship strength.