Introduction

This project focuses on Twitter. It demonstrates how to download data from Twitter via the API and how to build a visualization from that data using specific Twitter handles.

The aim is to illustrate a network in which vertices (nodes) represent users and edges represent the relationships between them. The graph can also contain different kinds of nodes and different kinds of edges: for example, nodes representing tweets, with edges to the users who created, favorited, or retweeted them. The network of tweets is modeled using various mining and mapping packages in R.

Twitter is a popular social media site where public figures from many walks of life post opinions and comments in posts limited to 140 characters. This project focuses on a single topic, “climatechange”, by extracting a sample of tweets that contain the term as a hashtag or as plain text.
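As a rough sketch of how tweets could be turned into user-to-user relationships, the following base R snippet builds an edge list from mentions. The mock tweets data frame, its column names, and the choice of mentions as the relationship are assumptions made for illustration; they are not necessarily how the data in this report was prepared.

```r
# Mock tweets standing in for API results (hypothetical data and column names)
tweets <- data.frame(
  screenName = c("alice", "bob", "carol"),
  text = c("Great thread by @bob on #climatechange",
           "Thanks @alice and @carol! #climatechange",
           "No mentions here #climatechange"),
  stringsAsFactors = FALSE
)

# Extract every @handle from each tweet (a list with one entry per tweet)
mentions <- regmatches(tweets$text, gregexpr("@[A-Za-z0-9_]+", tweets$text))

# One row per (author, mentioned user) pair; tweets without mentions drop out
edges <- data.frame(
  source = rep(tweets$screenName, times = lengths(mentions)),
  target = sub("^@", "", unlist(mentions)),
  stringsAsFactors = FALSE
)

print(edges)
```

The resulting two-column layout (source, target) is the shape that edge-list based graph constructors generally expect.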

Network Analysis

Social Network Analysis (SNA) has its origins in both social science and in the broader fields of network analysis and graph theory.

Network analysis concerns itself with the formulation and solution of problems that have a network structure; such structure is usually captured in a graph (see the circled structure to the right).

Graph theory provides a set of abstract concepts and methods for the analysis of graphs. These, in combination with other analytical tools and with methods developed specifically for the visualization and analysis of social (and other) networks, form the basis of what we call social network analysis methods.

But SNA is not just a methodology; it is a unique perspective on how society functions. Instead of focusing on individuals and their attributes, or on macroscopic social structures, it centers on relations between individuals, groups, or social institutions.

Relations as networks

Figure: undirected network

Figure: directed network

Twitter connection - Keys and Tokens

The OAuth keys obtained by following the above instructions are used here for direct authentication with Twitter before we start pulling the tweets.

library(twitteR)

# Read the OAuth credentials from environment variables rather than
# hard-coding them (the variable names used here are placeholders)
consumer_key <- Sys.getenv("TWITTER_CONSUMER_KEY")
consumer_secret <- Sys.getenv("TWITTER_CONSUMER_SECRET")
access_token <- Sys.getenv("TWITTER_ACCESS_TOKEN")
access_secret <- Sys.getenv("TWITTER_ACCESS_SECRET")

# This function wraps the OAuth authentication handshake functions
setup_twitter_oauth(consumer_key,
                    consumer_secret,
                    access_token,
                    access_secret)
## [1] "Using direct authentication"

Data Mining and Graphing

Dataset Edgelist

To convert data into an igraph network, we will use the graph_from_data_frame() function, which takes two data frames: d and vertices.

d describes the edges of the network. Its first two columns are the IDs of the source and the target node for each edge. The following columns are edge attributes (weight, type, label, or anything else).

vertices starts with a column of node IDs. Any following columns are interpreted as node attributes.
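A minimal sketch of this call is shown here on a toy data set. The user names and counts are invented for illustration, and the object is named g_toy so that it is not confused with the network analyzed in this report.

```r
library(igraph)

# Hypothetical edge list: the first two columns are the source and target IDs,
# every further column becomes an edge attribute
d <- data.frame(
  source = c("alice", "alice", "bob"),
  target = c("bob", "carol", "carol"),
  favoriteCount = c(3, 0, 7),
  stringsAsFactors = FALSE
)

# Hypothetical vertex table: the first column holds the node IDs,
# every further column becomes a vertex attribute
vertices <- data.frame(
  name = c("alice", "bob", "carol", "dave"),
  favoriteCount = c(10, 2, 5, 0),
  stringsAsFactors = FALSE
)

g_toy <- graph_from_data_frame(d, directed = TRUE, vertices = vertices)
print(g_toy)
```

Listing a node such as dave in vertices without any matching row in d produces an isolated vertex, which is one reason to pass the vertex table explicitly.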

## IGRAPH DN-- 31 24 -- 
## + attr: name (v/c), favoriteCount (v/n), favoriteCount (e/n)
## + edges (vertex names):
##  [1] SDGscameroon   ->WinnieKamau254  SDGscameroon   ->ElongWilliam   
##  [3] SDGscameroon   ->GrantLeonagrant clsgeography   ->UNDESA         
##  [5] Peter_Hindwood ->Blurred_Trees   Peter_Hindwood ->AlexEpstein    
##  [7] Peter_Hindwood ->Kathleen_Wynne  Peter_Hindwood ->ec_minister    
##  [9] Peter_Hindwood ->Glen4ONT        Peter_Hindwood ->EcoSenseNow    
## [11] calstark87     ->saul42          paulramsaybamff->Alex_Verbeek   
## [13] ZEROCO2_       ->ZEROCO2_        kbelesova      ->LancetCountdown
## [15] NRGrenaissance ->Agropark        NRGrenaissance ->politicianslie 
## + ... omitted several edges

The description of an igraph object starts with four letters:

D or U, for a directed or undirected graph
N, for a named graph (where nodes have a name attribute)
W, for a weighted graph (where edges have a weight attribute)
B, for a bipartite (two-mode) graph (where nodes have a type attribute)

The two numbers that follow (31 24) refer to the number of nodes and edges in the graph. The description also lists node and edge attributes, for example:

(g/c) - graph-level character attribute
(v/c) - vertex-level character attribute
(e/n) - edge-level numeric attribute
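The same information can also be queried programmatically. The sketch below uses a small invented graph (g_toy and its weight column are assumptions for illustration) and the igraph predicate functions that correspond to the four letters.

```r
library(igraph)

# Hypothetical toy graph with a weight column, so the W flag is set
g_toy <- graph_from_data_frame(
  data.frame(from = c("alice", "alice"),
             to = c("bob", "carol"),
             weight = c(1, 2),
             stringsAsFactors = FALSE),
  directed = TRUE
)

is_directed(g_toy)    # corresponds to D (or U when FALSE)
is_named(g_toy)       # corresponds to N
is_weighted(g_toy)    # corresponds to W
is_bipartite(g_toy)   # corresponds to B

vertex_attr_names(g_toy)   # names of the vertex-level attributes
edge_attr_names(g_toy)     # names of the edge-level attributes
```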

We also have easy access to nodes, edges, and their attributes with:

## + 24/24 edges (vertex names):
##  [1] SDGscameroon   ->WinnieKamau254  SDGscameroon   ->ElongWilliam   
##  [3] SDGscameroon   ->GrantLeonagrant clsgeography   ->UNDESA         
##  [5] Peter_Hindwood ->Blurred_Trees   Peter_Hindwood ->AlexEpstein    
##  [7] Peter_Hindwood ->Kathleen_Wynne  Peter_Hindwood ->ec_minister    
##  [9] Peter_Hindwood ->Glen4ONT        Peter_Hindwood ->EcoSenseNow    
## [11] calstark87     ->saul42          paulramsaybamff->Alex_Verbeek   
## [13] ZEROCO2_       ->ZEROCO2_        kbelesova      ->LancetCountdown
## [15] NRGrenaissance ->Agropark        NRGrenaissance ->politicianslie 
## [17] NRGrenaissance ->thegreenpagesBC NRGrenaissance ->ThatArcher     
## [19] NRGrenaissance ->UNSDI_NCO       NRGrenaissance ->BrianDColwell  
## + ... omitted several edges
## + 31/31 vertices, named:
##  [1] Agropark        Alex_Verbeek    AlexEpstein     Blurred_Trees  
##  [5] boyz2menentgh   BrianDColwell   calstark87      ClimateTreaty  
##  [9] clsgeography    ec_minister     EcoSenseNow     ElongWilliam   
## [13] Glen4ONT        GrantLeonagrant guardianeco     Kathleen_Wynne 
## [17] kbelesova       LancetCountdown m_kaczerowski   NRGrenaissance 
## [21] paulramsaybamff Peter_Hindwood  politicianslie  saul42         
## [25] SDGscameroon    ThatArcher      thegreenpagesBC UNDESA         
## [29] UNSDI_NCO       WinnieKamau254  ZEROCO2_
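Output of this kind is typically produced with the igraph accessors E() and V(). The sketch below applies them to an invented toy graph; the data and the name g_toy are assumptions for illustration, not the network shown above.

```r
library(igraph)

# Hypothetical toy data with a favoriteCount attribute on edges and vertices
d <- data.frame(source = c("alice", "alice", "bob"),
                target = c("bob", "carol", "carol"),
                favoriteCount = c(3, 0, 7),
                stringsAsFactors = FALSE)
vertices <- data.frame(name = c("alice", "bob", "carol"),
                       favoriteCount = c(10, 2, 5),
                       stringsAsFactors = FALSE)
g_toy <- graph_from_data_frame(d, directed = TRUE, vertices = vertices)

E(g_toy)                 # edge sequence
V(g_toy)                 # vertex sequence
V(g_toy)$name            # a vertex attribute as a plain vector
E(g_toy)$favoriteCount   # an edge attribute as a plain vector

# Attribute names can be used directly when subsetting a vertex sequence
popular <- V(g_toy)[favoriteCount >= 5]
popular$name
```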

It is also easy to extract an edge list or matrix back from the igraph network:

##       [,1]              [,2]             
##  [1,] "SDGscameroon"    "WinnieKamau254" 
##  [2,] "SDGscameroon"    "ElongWilliam"   
##  [3,] "SDGscameroon"    "GrantLeonagrant"
##  [4,] "clsgeography"    "UNDESA"         
##  [5,] "Peter_Hindwood"  "Blurred_Trees"  
##  [6,] "Peter_Hindwood"  "AlexEpstein"    
##  [7,] "Peter_Hindwood"  "Kathleen_Wynne" 
##  [8,] "Peter_Hindwood"  "ec_minister"    
##  [9,] "Peter_Hindwood"  "Glen4ONT"       
## [10,] "Peter_Hindwood"  "EcoSenseNow"    
## [11,] "calstark87"      "saul42"         
## [12,] "paulramsaybamff" "Alex_Verbeek"   
## [13,] "ZEROCO2_"        "ZEROCO2_"       
## [14,] "kbelesova"       "LancetCountdown"
## [15,] "NRGrenaissance"  "Agropark"       
## [16,] "NRGrenaissance"  "politicianslie" 
## [17,] "NRGrenaissance"  "thegreenpagesBC"
## [18,] "NRGrenaissance"  "ThatArcher"     
## [19,] "NRGrenaissance"  "UNSDI_NCO"      
## [20,] "NRGrenaissance"  "BrianDColwell"  
## [21,] "NRGrenaissance"  "m_kaczerowski"  
## [22,] "NRGrenaissance"  "guardianeco"    
## [23,] "NRGrenaissance"  "ClimateTreaty"  
## [24,] "boyz2menentgh"   "Alex_Verbeek"
## 31 x 31 sparse Matrix of class "dgCMatrix"
##                                                                          
## Agropark        . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Alex_Verbeek    . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AlexEpstein     . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Blurred_Trees   . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## boyz2menentgh   . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## BrianDColwell   . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## calstark87      . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . .
## ClimateTreaty   . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## clsgeography    . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 .
## ec_minister     . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## EcoSenseNow     . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## ElongWilliam    . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Glen4ONT        . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## GrantLeonagrant . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## guardianeco     . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Kathleen_Wynne  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## kbelesova       . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . .
## LancetCountdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## m_kaczerowski   . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## NRGrenaissance  1 . . . . 1 . 1 . . . . . . 1 . . . 1 . . . 1 . . 1 1 . 1
## paulramsaybamff . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## Peter_Hindwood  . . 1 1 . . . . . 1 1 . 1 . . 1 . . . . . . . . . . . . .
## politicianslie  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## saul42          . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## SDGscameroon    . . . . . . . . . . . 1 . 1 . . . . . . . . . . . . . . .
## ThatArcher      . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## thegreenpagesBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## UNDESA          . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## UNSDI_NCO       . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## WinnieKamau254  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## ZEROCO2_        . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##                    
## Agropark        . .
## Alex_Verbeek    . .
## AlexEpstein     . .
## Blurred_Trees   . .
## boyz2menentgh   . .
## BrianDColwell   . .
## calstark87      . .
## ClimateTreaty   . .
## clsgeography    . .
## ec_minister     . .
## EcoSenseNow     . .
## ElongWilliam    . .
## Glen4ONT        . .
## GrantLeonagrant . .
## guardianeco     . .
## Kathleen_Wynne  . .
## kbelesova       . .
## LancetCountdown . .
## m_kaczerowski   . .
## NRGrenaissance  . .
## paulramsaybamff . .
## Peter_Hindwood  . .
## politicianslie  . .
## saul42          . .
## SDGscameroon    1 .
## ThatArcher      . .
## thegreenpagesBC . .
## UNDESA          . .
## UNSDI_NCO       . .
## WinnieKamau254  . .
## ZEROCO2_        . 1
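For reference, a sketch of the two extraction calls on an invented toy graph follows (g_toy and its data are assumptions for illustration). By default as_adjacency_matrix() returns a sparse matrix like the one printed above; sparse = FALSE is used here only to get an ordinary base R matrix.

```r
library(igraph)

# Hypothetical toy graph
d <- data.frame(source = c("alice", "alice", "bob"),
                target = c("bob", "carol", "carol"),
                stringsAsFactors = FALSE)
g_toy <- graph_from_data_frame(d, directed = TRUE)

# Two-column character matrix with one row per edge
el <- as_edgelist(g_toy)
print(el)

# Square matrix with rows as sources and columns as targets
adj <- as_adjacency_matrix(g_toy, sparse = FALSE)
print(adj)
```

In a directed graph the adjacency matrix is generally not symmetric: an entry in row alice, column bob says nothing about row bob, column alice.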

Now that we have our igraph network object, let’s make a first attempt to plot it.

# remove self-loops (such as ZEROCO2_ -> ZEROCO2_) but keep multiple edges
g <- simplify(g, remove.multiple = FALSE, remove.loops = TRUE)


# reduce the arrow size 
plot(g, edge.arrow.size=.1, edge.color="blue",
     vertex.color="orange", vertex.frame.color="#ffffff",
     vertex.label=V(g)$name, vertex.label.color="black") 


# simpleNetwork from networkD3 package creates simple D3 JavaScript force directed network graphs.

networkD3::simpleNetwork(edges, Source = "source", Target = "target")

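# scale vertex color to the number of favorites: rank the counts, then pick the
# matching step of an interpolated palette (missing counts are treated as 0)

pal <- RColorBrewer::brewer.pal(5, "Dark2")
fav <- V(g)$favoriteCount
fav[is.na(fav)] <- 0
V(g)$color <- colorRampPalette(pal)(vcount(g))[rank(fav, ties.method = "min")]

# plot vertex color ~ favorites
# (degree(g) is a per-vertex measure, so it is not used as an edge label here)

plot(g, vertex.color = V(g)$color, edge.arrow.mode = 1, edge.arrow.size = .2)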

Detect communities in the graph

Communities are groups of nodes in a graph that are highly connected, indicating that they ‘belong’ together for some reason.

In real networks, however, communities can overlap, and a node can belong to more than one group. For example, a widely followed account such as Elon Musk's may be followed by many people in the data science community, even though one would not consider him a member of it. Note that the walktrap method used here assigns each node to exactly one community.

# get communities (cluster_walktrap() is the current name of walktrap.community())
cm <- cluster_walktrap(g)

plot(g, vertex.color = membership(cm), edge.arrow.size = .1, edge.arrow.mode = 3,
     vertex.size = log1p(degree(g)) * 8)
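To see what the community object contains beyond the plot colors, the sketch below runs the same walktrap method on an invented toy graph. The graph (two small cliques joined by one edge) and the names g_toy and cm_toy are assumptions for illustration.

```r
library(igraph)

# Hypothetical toy graph: two 4-node cliques joined by a single bridging edge
g_toy <- disjoint_union(make_full_graph(4), make_full_graph(4))
g_toy <- add_edges(g_toy, c(1, 5))

cm_toy <- cluster_walktrap(g_toy)

membership(cm_toy)   # community id for every vertex
sizes(cm_toy)        # number of vertices in each community
modularity(cm_toy)   # quality score of the detected partition
```

Because every vertex receives exactly one community id, the sizes always add up to the number of vertices; overlapping communities would need a different method.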

Summary/Conclusion