Project Description

In this project, you will analyze a directed graph that represents the communication among a large group of individuals. The graph will be provided by the instructor. First, you will create an adjacency matrix representation of the graph. Then you will perform your analysis, which will include (but is not limited to) the following:


Data Explanation

The data comes from a directed graph image that was given by the instructor:


I used the graph to make a directed adjacency matrix:

library(igraph)
library(knitr)

matr1 = as.matrix(read.csv("project3Data.csv", row.names = 1))

kable(matr1)
A B C D E F G H
A 0 0 0 0 0 1 1 0
B 1 0 0 1 0 0 1 0
C 0 0 0 0 0 0 0 0
D 0 1 1 0 0 0 0 0
E 0 0 0 0 0 0 0 1
F 1 0 0 0 0 0 0 0
G 0 1 0 0 0 0 0 0
H 1 0 0 0 1 0 0 0

This is a plot of the network formed from the adjacency matrix:

g = graph_from_adjacency_matrix(matr1)

plot(g, edge.arrow.size = 0.5)


Computations

Density

“Density reflects the extent to which the nodes in a network are connected with each other” (Social Network Analysis: Methods and Examples, 58).

This is the formula for density:

\[ \begin{eqnarray} D & = & \frac{\sum_{i}\sum_{j}{X_{ij}}}{N(N-1)}\\\\ D & = & \text{density}\\ X & = & \text{network (adjacency matrix)}\\ i & = & \text{row}\\ j & = & \text{column}\\ N & = & \text{number of nodes} \end{eqnarray} \]

Below is the density of the network:

edge_density(g)
## [1] 0.2142857

This means that about 21% of all possible directed ties in the network actually exist (12 of 56).
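As a quick check, the density formula can be applied by hand in base R. This is a minimal sketch: the matrix is retyped from the adjacency table above, and the names `nodes`, `adj`, `ties`, and `possible` are illustrative rather than part of the original analysis.

```r
# Adjacency matrix retyped from the table above (rows = senders, columns = receivers)
nodes <- LETTERS[1:8]
adj <- matrix(c(
  0,0,0,0,0,1,1,0,  # A
  1,0,0,1,0,0,1,0,  # B
  0,0,0,0,0,0,0,0,  # C
  0,1,1,0,0,0,0,0,  # D
  0,0,0,0,0,0,0,1,  # E
  1,0,0,0,0,0,0,0,  # F
  0,1,0,0,0,0,0,0,  # G
  1,0,0,0,1,0,0,0   # H
), nrow = 8, byrow = TRUE, dimnames = list(nodes, nodes))

n <- nrow(adj)
ties <- sum(adj)           # directed ties that exist
possible <- n * (n - 1)    # directed ties that could exist (no self-loops)
density <- ties / possible
print(density)
```

The result should agree with the value reported by `edge_density(g)`.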

Centrality

“Degree centrality simply counts the number of connections an actor or node has” (Social Network Analysis: Methods and Examples, 61).

This is the formula for degree centrality:

\[ \begin{eqnarray} C_D(N_i) & = & \sum_{j=1}^{g}{X_{ij}}\\\\ C_D(N_i) & = & \text{degree centrality of node } N_i\\ g & = & \text{number of actors (nodes)}\\ X & = & \text{network (adjacency matrix)}\\ i & = & \text{row}\\ j & = & \text{column} \end{eqnarray} \]

Here is the “in” centrality for all the nodes in the network:

degree(g, mode = "in")
## A B C D E F G H 
## 3 2 1 1 1 1 2 1

This represents the number of incoming edges attached to each node in the network. For example, 3 nodes (B, F, and H) have directed edges pointing to node A.

Here is the “out” centrality for all the nodes in the network:

degree(g, mode = "out")
## A B C D E F G H 
## 2 3 0 2 1 1 1 2

This represents the number of outgoing edges leaving each node in the network. For example, node A has 2 directed edges that point to 2 other nodes (F and G).
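Both degree vectors can also be read directly off the adjacency matrix: row sums give out-degree and column sums give in-degree. The sketch below uses base R only; the matrix is retyped from the table above, and the variable names are illustrative.

```r
# Adjacency matrix retyped from the table above (rows = senders, columns = receivers)
nodes <- LETTERS[1:8]
adj <- matrix(c(
  0,0,0,0,0,1,1,0,  # A
  1,0,0,1,0,0,1,0,  # B
  0,0,0,0,0,0,0,0,  # C
  0,1,1,0,0,0,0,0,  # D
  0,0,0,0,0,0,0,1,  # E
  1,0,0,0,0,0,0,0,  # F
  0,1,0,0,0,0,0,0,  # G
  1,0,0,0,1,0,0,0   # H
), nrow = 8, byrow = TRUE, dimnames = list(nodes, nodes))

out_degree <- rowSums(adj)  # ties sent by each node
in_degree  <- colSums(adj)  # ties received by each node
print(rbind(in_degree, out_degree))
```

Because every directed tie leaves one node and enters another, the two vectors must have the same total, which is a useful sanity check on a hand-entered matrix.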

Centralization

“The distribution of degree centrality among the nodes of a network often helps us to understand how equal network actors are. One useful way to summarize this (in) equality is degree centralization” (Social Network Analysis: Methods and Examples, 62).

This is the formula for centralization:

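\[ \begin{eqnarray} C_D & = & \frac{\sum_{i=1}^{N}{[C_D(N^*)-C_D(N_i)]}}{(N-1)(N-2)}\\\\ C_D & = & \text{degree centralization}\\ C_D(N_i) & = & \text{degree centrality of node } N_i\\ C_D(N^*) & = & \text{largest degree centrality in the network}\\ N & = & \text{number of nodes} \end{eqnarray} \]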

Here is the centralization of the network:

centr_degree(g)$centralization
## [1] 0.1632653

In degree centralization, 0 indicates complete equality in degree centrality among the nodes, and 1 indicates complete inequality. By default, centr_degree() uses each node's total degree (incoming plus outgoing ties). A score of about 0.163 is close to 0, which indicates that degree centrality is spread fairly evenly across the nodes.
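The reported score can be reproduced by hand from total degree. In the sketch below, the numerator follows the centralization formula, while the denominator 2(N-1)^2 is an assumption about the normalization that `centr_degree()` applies by default to a directed graph (total degree, self-loops allowed). It is used here because it reproduces the reported value, not because the source states it.

```r
# Adjacency matrix retyped from the table above (rows = senders, columns = receivers)
nodes <- LETTERS[1:8]
adj <- matrix(c(
  0,0,0,0,0,1,1,0,  # A
  1,0,0,1,0,0,1,0,  # B
  0,0,0,0,0,0,0,0,  # C
  0,1,1,0,0,0,0,0,  # D
  0,0,0,0,0,0,0,1,  # E
  1,0,0,0,0,0,0,0,  # F
  0,1,0,0,0,0,0,0,  # G
  1,0,0,0,1,0,0,0   # H
), nrow = 8, byrow = TRUE, dimnames = list(nodes, nodes))

n <- nrow(adj)
total_degree <- rowSums(adj) + colSums(adj)          # in-degree + out-degree
numerator <- sum(max(total_degree) - total_degree)   # gaps to the most central node
denom_assumed <- 2 * (n - 1)^2                       # ASSUMED default normalization
centralization <- numerator / denom_assumed
print(centralization)
```

Dividing the same numerator by (N-1)(N-2) = 42, the denominator in the formula above, would give about 0.381 instead, so scores computed under different normalizations should not be compared directly.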

Cliques

“The most rigid definition of a clique is a maximal complete sub graph of three or more nodes, all of which are directly connected to one another, with no other node in the network having direct ties to every member of the clique” (Social Network Analysis: Methods and Examples, 71).

These are the 1-cliques in the network with at least 3 members (edge direction is ignored when searching for cliques):

cliques(g, min=3)
## [[1]]
## + 3/8 vertices, named, from 0bb77c8:
## [1] A B G

This means that there is one 1-clique, made up of the set of nodes {A, B, G}. The plot below highlights these three nodes in red:

vColors = c("#E74C3C", "#F1C40F")[c(1,1,2,2,2,2,1,2)]

plot(g, edge.arrow.size = 0.5, vertex.color = vColors)
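The clique result can be cross-checked with a brute-force search in base R. This sketch is not how igraph finds cliques; it simply tests every set of three nodes after dropping edge direction. The matrix is retyped from the table above, and names such as `und` and `triangles` are illustrative.

```r
# Adjacency matrix retyped from the table above (rows = senders, columns = receivers)
nodes <- LETTERS[1:8]
adj <- matrix(c(
  0,0,0,0,0,1,1,0,  # A
  1,0,0,1,0,0,1,0,  # B
  0,0,0,0,0,0,0,0,  # C
  0,1,1,0,0,0,0,0,  # D
  0,0,0,0,0,0,0,1,  # E
  1,0,0,0,0,0,0,0,  # F
  0,1,0,0,0,0,0,0,  # G
  1,0,0,0,1,0,0,0   # H
), nrow = 8, byrow = TRUE, dimnames = list(nodes, nodes))

# Treat a pair as tied if an edge exists in either direction
und <- (adj + t(adj)) > 0

# Test every 3-node subset: it is a clique if all three pairs are tied
triples <- combn(nodes, 3)
is_clique <- apply(triples, 2, function(v) {
  sub <- und[v, v]
  all(sub[upper.tri(sub)])
})
triangles <- triples[, is_clique, drop = FALSE]
print(t(triangles))
```

Since only one triangle exists, no set of four nodes can be fully connected, so that triangle is also maximal.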

2-cliques relax the 1-clique definition: a 2-clique is a maximal set of 3 or more nodes in which every pair of nodes is connected by a path of at most 2 edges, so members may be linked through an intermediary rather than directly.

These are the 2-cliques in the network:

library(RBGL)

cl = kCliques(igraph.to.graphNEL(as.undirected(g)))
cl$`2-cliques`
## [[1]]
## [1] "A" "B" "D" "G"
## 
## [[2]]
## [1] "A" "B" "F" "G" "H"
## 
## [[3]]
## [1] "B" "C" "D"
## 
## [[4]]
## [1] "A" "E" "H"

Multidimensional Scaling

“Multidimensional scaling (MDS) is a visualization tool to map the distance between nodes in a network” (Knoke & Yang, 2008; Kruskal & Wish, 1978; Wasserman & Faust, 1994).

For a given pair of nodes i and j, their Euclidean distance on a 2D plane is:

\[d_{ij}=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}\]

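This is the matrix of shortest-path distances between nodes in the network, which is the input that MDS maps onto the plane. Edge direction is ignored (the default for distances()), so the matrix is symmetric:

kable(distances(g))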
A B C D E F G H
A 0 1 3 2 2 1 1 1
B 1 0 2 1 3 2 1 2
C 3 2 0 1 5 4 3 4
D 2 1 1 0 4 3 2 3
E 2 3 5 4 0 3 3 1
F 1 2 4 3 3 0 2 2
G 1 1 3 2 3 2 0 2
H 1 2 4 3 1 2 2 0

For example, the distance between node A and node C is 3 because the shortest path between them (A, B, D, C) crosses 3 edges. This type of measurement has various uses in real networks. For a social network, it could be used to measure the distance between people based on their relationships with each other. For a business network, this matrix could represent the distance of a potential networking connection between professionals based on the connections they have already established.
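The step from a distance matrix to 2D coordinates can be sketched in base R. The example below recomputes the undirected shortest-path distances with the Floyd-Warshall algorithm (a stand-in for `distances(g)`, used so that the block needs no packages) and then applies classical MDS with `cmdscale()`. Classical MDS is one of several MDS variants and is an assumption here, since the source does not say which one it has in mind. The variable names are illustrative.

```r
# Adjacency matrix retyped from the table above (rows = senders, columns = receivers)
nodes <- LETTERS[1:8]
adj <- matrix(c(
  0,0,0,0,0,1,1,0,  # A
  1,0,0,1,0,0,1,0,  # B
  0,0,0,0,0,0,0,0,  # C
  0,1,1,0,0,0,0,0,  # D
  0,0,0,0,0,0,0,1,  # E
  1,0,0,0,0,0,0,0,  # F
  0,1,0,0,0,0,0,0,  # G
  1,0,0,0,1,0,0,0   # H
), nrow = 8, byrow = TRUE, dimnames = list(nodes, nodes))

# Undirected one-step distances: 1 if tied in either direction, Inf otherwise
und <- (adj + t(adj)) > 0
d <- ifelse(und, 1, Inf)
diag(d) <- 0

# Floyd-Warshall: allow each node k in turn as an intermediate stop
for (k in nodes) for (i in nodes) for (j in nodes) {
  d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
}

# Classical MDS: place nodes in 2D so plane distances approximate path distances
coords <- cmdscale(d, k = 2)
print(d)
```

The resulting layout can be drawn with `plot(coords, type = "n")` followed by `text(coords, labels = rownames(coords))`; nodes that are many steps apart, such as C and E, should land far from each other.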

Structural Equivalence

“Structural equivalence is a social network method to analyze competitive relations between dyads, or pairs of nodes, in a given network” (Social Network Analysis: Methods and Examples, 75).

Two features are unique for structural equivalence analysis:

  1. It is a method to measure and reflect competitive rather than cohesive relations between social actors.
  2. It operates at the level of pairs or dyads, rather than with individual actors, groups, or entire networks.

For a given pair of actors, examining the presence or absence of ties between each actor of the pair and the rest of the actors in a network would suffice for computing the structural equivalence between the pair:

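\[ \begin{eqnarray} D_{ij} & = & \sqrt{\sum_{k=1}^{g}{[(X_{ik}-X_{jk})^2]}}\\\\ D_{ij} & = & \text{structural equivalence distance between actors } i \text{ and } j\\ X & = & \text{network (adjacency matrix)}\\ i, j & = & \text{the pair of actors being compared}\\ k & = & \text{each other actor in the network}\\ g & = & \text{number of actors} \end{eqnarray} \]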

This is a matrix representing the structural equivalence between each pair of nodes in the network:

library(sna)

strEq = as.matrix(sedist(matr1))
rownames(strEq) = colnames(strEq) = colnames(matr1)
kable(strEq)
A B C D E F G H
A 0 6 6 6 5 3 4 6
B 6 0 4 4 7 5 4 6
C 6 4 0 2 3 3 4 4
D 6 4 2 0 5 5 2 6
E 5 7 3 5 0 4 5 1
F 3 5 3 5 4 0 3 3
G 4 4 4 2 5 3 0 6
H 6 6 4 6 1 3 6 0

The values in this matrix are distances, not similarities: each entry counts the ties on which two nodes differ, which is why the scores are whole numbers rather than the square roots the Euclidean formula would give. The lower the score, the more structurally equivalent the two nodes are and the higher the level of competition between them. The higher the score, the lower the level of competition. Competition here means that the nodes occupy similar positions and fulfill the same role in the network. For example, in a network of food providers to Walmart, two nodes that represent two different companies that both provide bread to Walmart would have a very low distance score, which means that they have a very high level of competition. If one bread maker were to go under, the other would be able to fulfill the bread quota to Walmart. In this network, the highest score is 7, between B and E. This means that the level of competition between these two nodes is low. The lowest score between two distinct nodes is 1, between E and H. The level of competition between these nodes is high. The zeros appear only on the diagonal, where each node is compared with itself.
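The entries of the matrix above can be reproduced by counting mismatched ties in base R. This is a sketch of one way to arrive at the same numbers, not a description of how `sedist()` is implemented: for a pair of nodes it compares their outgoing and incoming ties to every other node and skips the ties that involve the pair itself. The helper `se_hamming` and the other names are hypothetical.

```r
# Adjacency matrix retyped from the table above (rows = senders, columns = receivers)
nodes <- LETTERS[1:8]
adj <- matrix(c(
  0,0,0,0,0,1,1,0,  # A
  1,0,0,1,0,0,1,0,  # B
  0,0,0,0,0,0,0,0,  # C
  0,1,1,0,0,0,0,0,  # D
  0,0,0,0,0,0,0,1,  # E
  1,0,0,0,0,0,0,0,  # F
  0,1,0,0,0,0,0,0,  # G
  1,0,0,0,1,0,0,0   # H
), nrow = 8, byrow = TRUE, dimnames = list(nodes, nodes))

# Hypothetical helper: number of third parties on which i and j differ,
# counted separately for ties sent and ties received
se_hamming <- function(adj, i, j) {
  others <- setdiff(rownames(adj), c(i, j))
  sum(adj[i, others] != adj[j, others]) +   # mismatched outgoing ties
    sum(adj[others, i] != adj[others, j])   # mismatched incoming ties
}

# Full pairwise matrix (rows and columns are both labelled by node)
se <- sapply(nodes, function(j) sapply(nodes, function(i) se_hamming(adj, i, j)))
print(se)
```

For E and H, the only mismatch is that H sends a tie to A while E does not, which gives the score of 1.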


References

Yang, S., Keller, F. B., & Zheng, L. (2017). Social network analysis: Methods and examples. Los Angeles: Sage.