Assignment 1

Author

Deborah E. Lucas

Published

February 8, 2025

Basic Network Measures and Visualization in R

Dataset - The dataset for this assignment is from Palazzolo, E. T. (2005) Organizing for information retrieval in transactive memory systems. Communication Research, 32(6), 726-761. Two files are explored for this assignment:

edgelist_retrieve.csv

att_expertise.csv

TASK 1: Preparing the environment

Load data set

edge_data <- read.csv("edgelist_retrieve.csv")
att_expertise <- read.csv("att_expertise.csv")

Load library for igraph

library(igraph)

Attaching package: 'igraph'
The following objects are masked from 'package:stats':

    decompose, spectrum
The following object is masked from 'package:base':

    union
graph_from_data_frame(edge_data, directed = TRUE)
IGRAPH 3a1034a DN-- 15 41 -- 
+ attr: name (v/c)
+ edges from 3a1034a (vertex names):
 [1] 1 ->6  1 ->9  2 ->6  2 ->9  3 ->5  3 ->6  3 ->9  3 ->17 4 ->5  4 ->6 
[11] 4 ->9  4 ->16 4 ->17 5 ->6  5 ->9  5 ->16 5 ->17 6 ->5  6 ->9  6 ->16
[21] 7 ->16 12->8  12->16 13->6  13->9  13->17 14->4  14->6  14->9  14->16
[31] 15->3  15->6  15->9  15->16 15->17 16->5  16->6  16->9  17->5  17->9 
[41] 17->16

For edgelist_retrieve, there are 15 nodes and 41 edges. The number of nodes is the number of entities in the network, and the number of edges is the number of directed relationships between them.

graph_from_data_frame(att_expertise, directed = TRUE)
IGRAPH d11ae4a DN-- 25 17 -- 
+ attr: name (v/c)
+ edges from d11ae4a (vertex names):
 [1] 1 ->0.176470588 2 ->0.176470588 3 ->0.176470588 4 ->0.529411765
 [5] 5 ->0.529411765 6 ->0.588235294 7 ->0.058823529 8 ->0.294117647
 [9] 9 ->0.588235294 10->0.058823529 11->0.176470588 12->0.058823529
[13] 13->0.058823529 14->0.352941176 15->0.235294118 16->0.588235294
[17] 17->0.411764706

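For att_expertise, the output reports 25 nodes and 17 edges, but these counts are an artifact of how the file was read. att_expertise is a node attribute table (one expertise score per node ID), not an edge list. graph_from_data_frame() treats the first two columns as source and target, so each distinct expertise score became a vertex of its own: 17 node IDs plus 8 distinct scores gives 25. The values after the arrows are therefore expertise scores, not edge weights. For example, node 1 (0.176470588) has a lower expertise score than node 15 (0.235294118).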
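If att_expertise is read as a node attribute table, one way to use it is through the vertices argument of graph_from_data_frame(), which takes vertex names from the first column and stores the remaining columns as vertex attributes. The sketch below uses a small made-up edge list and made-up scores; the column names from, to, id and expertise are assumptions for illustration, not the names in the assignment files.

```r
library(igraph)

# Hypothetical toy data: three directed edges and four nodes with scores
edges <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))
nodes <- data.frame(id = c(1, 2, 3, 4),
                    expertise = c(0.18, 0.53, 0.59, 0.06))

# The first column of `vertices` supplies vertex names;
# the other columns become vertex attributes
g <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

vcount(g)        # number of vertices, including nodes with no edges
ecount(g)        # number of edges
V(g)$expertise   # expertise score stored on each vertex
```

Nodes that appear in the attribute table but not in the edge list are added as isolated vertices, so the vertex count (and anything derived from it, such as density) can differ from a graph built from the edge list alone.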

TASK 2: Network Descriptive Statistics with igraph

A. Calculate the density

graph <- graph_from_data_frame(edge_data, directed = TRUE)
density_value <- edge_density(graph)
cat("density of the graph:", density_value, "\n")
density of the graph: 0.1952381 

The density value of 0.1952381 indicates that roughly 20% of the possible directed connections between nodes exist (41 of 15 × 14 = 210). The network is therefore fairly sparse, and there is room for many more connections.
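As a quick check, density can be reproduced by hand from the node and edge counts reported above. The sketch below assumes no self-loops and uses the standard formulas: m / (n(n - 1)) for a directed graph and m / (n(n - 1) / 2) for an undirected one. The variable names are illustrative.

```r
n <- 15  # number of nodes reported for edgelist_retrieve
m <- 41  # number of edges reported for edgelist_retrieve

# Directed graph: every ordered pair of distinct nodes is a possible edge
directed_density <- m / (n * (n - 1))

# Undirected graph: every unordered pair is a possible edge
undirected_density <- m / (n * (n - 1) / 2)

cat("directed:  ", directed_density, "\n")
cat("undirected:", undirected_density, "\n")
```

The directed value matches the edge_density() output for the directed graph; the undirected value is exactly twice as large, which is worth keeping in mind when comparing density figures.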

B. Calculate the degree (in and out)

 in_degree <- degree(graph, mode = "in")

out_degree <- degree(graph, mode = "out")

cat("In-degree of each node:\n")
In-degree of each node:
print(in_degree)
 1  2  3  4  5  6  7 12 13 14 15 16 17  9  8 
 0  0  1  1  5  9  0  0  0  0  0  8  5 11  1 
cat("Out-degree of each node:\n")
Out-degree of each node:
print(out_degree)
 1  2  3  4  5  6  7 12 13 14 15 16 17  9  8 
 2  2  4  5  4  3  1  2  3  4  5  3  3  0  0 

In-Degree

Node 9 has the highest in-degree (11), which means that it receives the most connections and is the “most popular” node within the network.

Nodes 6 (9), 16 (8), 5 (5), and 17 (5) also have relatively high in-degrees, which suggests that many other nodes direct connections to them.

Nodes 1, 2, 7, 12, 13, 14, and 15 have an in-degree of 0, which means that no other node directs a connection to them.

Out-Degree

Nodes 4 and 15 have the highest out-degree (5), which means that they send connections to the most other nodes. Node 6, despite its high in-degree, has an out-degree of only 3.

Nodes 3, 5, and 14 also have comparatively high out-degree (4), indicating that they actively reach out to others.

Nodes 9 and 8 have an out-degree of 0, and node 7 has an out-degree of 1. Node 9 combines the highest in-degree with no outgoing connections, so it is sought out by others without seeking out anyone itself, while nodes 7 and 8 are peripheral in both directions.
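Reading the highest and lowest values off a printed degree vector is error-prone, so it can help to sort and filter the vectors directly. The sketch below re-enters the in-degree and out-degree values printed above as named vectors (so it runs without the data files); with the graph loaded, the same calls work on in_degree and out_degree as computed earlier.

```r
# Vertex labels in the order igraph printed them
ids <- c("1", "2", "3", "4", "5", "6", "7", "12",
         "13", "14", "15", "16", "17", "9", "8")

# Degree values copied from the printed output above
in_degree  <- setNames(c(0, 0, 1, 1, 5, 9, 0, 0, 0, 0, 0, 8, 5, 11, 1), ids)
out_degree <- setNames(c(2, 2, 4, 5, 4, 3, 1, 2, 3, 4, 5, 3, 3, 0, 0), ids)

# Nodes ranked by in-degree, highest first
top_in <- sort(in_degree, decreasing = TRUE)
print(head(top_in, 5))

# All nodes tied for the highest out-degree
max_out_nodes <- names(out_degree)[out_degree == max(out_degree)]
print(max_out_nodes)

# Nodes that receive no connections at all
zero_in_nodes <- names(in_degree)[in_degree == 0]
print(zero_in_nodes)
```

In a directed graph both vectors sum to the number of edges, which gives a simple consistency check on the copied values.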

C. Calculate additional network measure

Betweenness centrality

betweenness_centrality <- betweenness(graph)
cat("betweenness centrality of each node:\n")
betweenness centrality of each node:
max_betweenness_node <- names(which.max(betweenness_centrality))
cat("Node with the highest betweenness centrality:", max_betweenness_node, "\n")
Node with the highest betweenness centrality: 16 

which.max() returns a position in the vertex ordering rather than a vertex name, so names() is used to recover the label. Node 16 has the highest betweenness centrality, which means that it lies on more of the shortest paths between other pairs of nodes than any other node. Node 16 is therefore well positioned to broker information flow between parts of the network that are not directly connected.
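The difference between a vertex's position and its name matters whenever vertex names are not simply 1, 2, 3, and so on in order. A minimal base R sketch of the behavior of which.max() on a named vector, using made-up scores and labels:

```r
# Hypothetical scores; the labels stand in for vertex names
scores <- c("7" = 0, "12" = 1.5, "16" = 9)

idx <- which.max(scores)

# cat() drops names, so this shows the position of the maximum
cat("position:", idx, "\n")

# names() recovers the label attached to that position
cat("label:   ", names(idx), "\n")
```

Here the maximum sits at position 3 but belongs to the element labelled "16", so printing idx directly would report the wrong node.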

Clustering Coefficient

local_clustering <- transitivity(graph, type = "local")
cat("Local clustering coefficients for each node:\n")
Local clustering coefficients for each node:
print(local_clustering)
        1         2         3         4         5         6         7        12 
1.0000000 1.0000000 0.8000000 0.8000000 0.8000000 0.3777778       NaN 0.0000000 
       13        14        15        16        17         9         8 
0.6666667 1.0000000 0.8000000 0.4166667 0.5714286 0.4181818       NaN 
summary(local_clustering)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 0.0000  0.4182  0.8000  0.6654  0.8000  1.0000       2 
any(!is.finite(local_clustering))
[1] TRUE
local_clustering[!is.finite(local_clustering)] <- 0
plot(graph, vertex.size = local_clustering * 50, main = "Local Clustering Coefficients")

vertex_size <- local_clustering * 12
vertex_size[!is.finite(vertex_size)] <- 3  
vertex_size[vertex_size <= 0] <- 2

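A local clustering coefficient of 1 (nodes 1, 2, and 14) means that all of a node’s neighbors are connected to each other. Values between 0 and 1 (for example, 0.38 for node 6 and 0.42 for node 16) indicate partial connectedness among the neighbors, and a value of 0 (node 12) means that none of its neighbors are connected to each other. Nodes 7 and 8 return NaN because each has only one neighbor, so the coefficient is undefined; they are set to 0 for plotting. The closer the value is to 1, the more tightly connected a node’s neighborhood is. Note that transitivity() ignores edge direction when computing these values.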
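The meaning of these values is easier to see on a graph small enough to check by hand. The sketch below builds a hypothetical undirected toy graph (a triangle A, B, C with one extra node D attached to C) and computes the local clustering coefficient for each vertex.

```r
library(igraph)

# Toy graph: triangle A-B-C plus a pendant node D attached to C
toy_edges <- data.frame(from = c("A", "B", "A", "C"),
                        to   = c("B", "C", "C", "D"))
g <- graph_from_data_frame(toy_edges, directed = FALSE)

cc <- transitivity(g, type = "local")
names(cc) <- V(g)$name
print(cc)

# A and B: their two neighbors are connected to each other, so the value is 1
# C: three neighbors (A, B, D), only the pair A-B is connected, so 1 of 3
# D: a single neighbor, so the coefficient is undefined (NaN)
```

The NaN for D is the same situation as the vertices with a single neighbor in the assignment graph, which is why non-finite values need to be handled before they are used as plot sizes.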

TASK 3: Network Visualization

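indegree_centrality <- degree(graph, mode = "in")
# Attach expertise scores by node ID: column 1 of att_expertise holds the ID, column 2 the score
V(graph)$expertise <- att_expertise[[2]][match(V(graph)$name, as.character(att_expertise[[1]]))]
expertise <- V(graph)$expertise
node_size <- indegree_centrality * 3 + 5  # offset keeps nodes with in-degree 0 visible
node_color <- ifelse(expertise > 0.3, "skyblue", "salmon")
layout_kk <- layout_with_kk(graph)

# The edge list as loaded carries no weight attribute, so fall back to a single color
edge_weight <- E(graph)$weight
edge_color <- if (is.null(edge_weight)) "gray" else ifelse(edge_weight > 5, "darkred", "gray")
arrow_size <- 0.4  # a value of 10 draws arrowheads far larger than the nodes
plot(graph,
     layout = layout_kk,
     vertex.size = node_size,
     vertex.color = node_color,
     edge.arrow.size = arrow_size,
     edge.color = edge_color,
     main = "Network Visualization: In-degree Centrality & Expertise", cex.main = 0.9
)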