Lab 1 - SNA in Education

Author

Daria Smyslova

Task 1: Preparing the environment

install.packages("igraph")

Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
(as 'lib' is unspecified)

#Load packages
library("igraph")


Attaching package: 'igraph'

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

#Load network data
network_data = read.csv("/cloud/project/data/edgelist_retrieve.csv")
network_data

#Load network attributes
network_attributes = read.csv("/cloud/project/data/att_expertise.csv")
network_attributes

   id  expertise
1   1 0.17647059
2   2 0.17647059
3   3 0.17647059
4   4 0.52941176
5   5 0.52941176
6   6 0.58823529
7   7 0.05882353
8   8 0.29411765
9   9 0.58823529
10 10 0.05882353
11 11 0.17647059
12 12 0.05882353
13 13 0.05882353
14 14 0.35294118
15 15 0.23529412
16 16 0.58823529
17 17 0.41176471

#Create igraph
work_teams <- graph_from_data_frame(d = network_data, directed = TRUE, vertices = network_attributes)
work_teams

IGRAPH a37d870 DN-- 17 41 -- 
+ attr: name (v/c), expertise (v/n)
+ edges from a37d870 (vertex names):
 [1] 1 ->6  1 ->9  2 ->6  2 ->9  3 ->5  3 ->6  3 ->9  3 ->17 4 ->5  4 ->6 
[11] 4 ->9  4 ->16 4 ->17 5 ->6  5 ->9  5 ->16 5 ->17 6 ->5  6 ->9  6 ->16
[21] 7 ->16 12->8  12->16 13->6  13->9  13->17 14->4  14->6  14->9  14->16
[31] 15->3  15->6  15->9  15->16 15->17 16->5  16->6  16->9  17->5  17->9 
[41] 17->16

There are 17 nodes and 41 edges in the network graph.

Based on the data from Palazzolo, E. T. (2005). Organizing for information retrieval in transactive memory systems. Communication Research, 32(6), 726-761, we conclude that:

Nodes represent individual members of the organizational work teams. Each node corresponds to a team member, and their interactions and connections within the network reflect communication ties for information retrieval.

The edges in the network represent communication ties between the team members. Because the graph is directed, an edge from one member to another indicates that the first member retrieves topic-specific information from the second. These edges capture the flow of information within the work teams and help analyze the emergent communication patterns discussed in the research.
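As a minimal sketch of how `graph_from_data_frame()` combines an edge list with a vertex table, the toy example below uses made-up data. The column names `from` and `to` and all values are assumptions for illustration, not the contents of the lab's CSV files. It shows that a member who appears in the vertex table but not in the edge list is still kept as an isolated node.

```r
library(igraph)

# Hypothetical toy data; column names and values are assumptions for illustration
toy_edges <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))
toy_nodes <- data.frame(id = 1:4, expertise = c(0.2, 0.5, 0.6, 0.1))

# The first two columns of d are read as source and target;
# the first column of vertices is read as the node ID
toy_graph <- graph_from_data_frame(d = toy_edges, directed = TRUE, vertices = toy_nodes)

vcount(toy_graph)        # 4: node 4 is kept even though it has no ties
ecount(toy_graph)        # 3
V(toy_graph)$expertise   # vertex attribute carried over from toy_nodes
```

The same mechanism would explain why a member with no recorded ties can still appear among the 17 nodes of `work_teams`.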

Task 2: Network Descriptive Statistics with igraph

Density

#Calculate the density
density = edge_density(work_teams)
density

[1] 0.1507353

The density calculation for the network of work teams is approximately 0.1507. In the context of social network analysis, density measures the proportion of existing connections (edges) out of all possible connections among the nodes (members of work teams).

A density of 0.1507 indicates that about 15.07% of all possible directed communication ties between team members (41 of 17 × 16 = 272) are actually present in the network. This suggests that the network is moderately sparse, meaning there are significant opportunities for additional communication ties to form among team members.

In the context of the research exploring information retrieval within work teams, this moderate density implies that while there are existing communication channels for sharing information, there is still room for improvement in terms of fostering more connections among team members. Increasing the density could potentially enhance the efficiency and effectiveness of information retrieval processes within the teams.
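The density value can be reproduced by hand from the node and edge counts reported in the igraph summary. The sketch below assumes the default behavior of `edge_density()`, which ignores self-loops, so every ordered pair of distinct nodes counts as one possible tie.

```r
# Counts taken from the igraph summary of work_teams
n_nodes <- 17
n_edges <- 41

# Directed graph without self-loops: each ordered pair of distinct nodes is a possible tie
possible_ties <- n_nodes * (n_nodes - 1)   # 272

n_edges / possible_ties                    # 0.1507353, matching edge_density(work_teams)
```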

In-Degree and Out-Degree

#Calculate in-degree
in_degree <- degree(work_teams, mode = "in")
in_degree

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
 0  0  1  1  5  9  0  1 11  0  0  0  0  0  0  8  5

#Calculate out-degree
out_degree <- degree(work_teams, mode = "out")
out_degree

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
 2  2  4  5  4  3  1  0  0  0  0  2  3  4  5  3  3

In the context of the data, the in-degree and out-degree provide insights into the information retrieval dynamics within the work teams. Higher in-degree for a node indicates that more team members are seeking information from that particular member. Higher out-degree for a node suggests that the node is actively seeking information from other team members. These metrics help identify central nodes within the network, which are crucial for information dissemination and collaboration.

The results indicate the following:

Node 9 has the highest in-degree: 11 other nodes seek information from it. Node 6 has the second-highest in-degree, with 9 nodes seeking information from it.
Nodes 4 and 15 have the highest out-degree, indicating that each of them actively seeks information from 5 other nodes.
Nodes 5, 6, 16, and 17 combine a relatively high in-degree (5 to 9) with an out-degree of 3 or 4, suggesting they play central roles in the information retrieval process within the work teams.
Nodes 1, 2, 7, 8, 12, and 13 have low total degrees (3 or fewer), indicating they are less involved in information retrieval activities than other nodes. Nodes 10 and 11 have no ties at all.
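The observations above can be checked programmatically. The sketch below rebuilds the graph from the edge list printed in the igraph summary, so it runs without the CSV files; `make_graph()` is used here only for compactness and is not how the lab constructs `work_teams`.

```r
library(igraph)

# Edge list copied from the printed igraph object (source, target pairs)
edges <- c(1,6,  1,9,  2,6,  2,9,  3,5,  3,6,  3,9,  3,17, 4,5,  4,6,
           4,9,  4,16, 4,17, 5,6,  5,9,  5,16, 5,17, 6,5,  6,9,  6,16,
           7,16, 12,8, 12,16, 13,6, 13,9, 13,17, 14,4, 14,6, 14,9, 14,16,
           15,3, 15,6, 15,9, 15,16, 15,17, 16,5, 16,6, 16,9, 17,5, 17,9,
           17,16)
g <- make_graph(edges, n = 17, directed = TRUE)

in_deg  <- degree(g, mode = "in")
out_deg <- degree(g, mode = "out")

which(in_deg == max(in_deg))     # most sought-after member: 9
which(out_deg == max(out_deg))   # most active seekers: 4 and 15
which(in_deg + out_deg == 0)     # isolates: 10 and 11
```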

Betweenness Centrality

One relevant network measure that could provide insights into the structure of the network is betweenness centrality. Betweenness centrality quantifies the extent to which a node lies on the shortest paths between other nodes in the network. It indicates the importance of a node in facilitating communication and information flow between other nodes.

I chose betweenness centrality because it helps identify nodes that act as bridges or mediators in the network, facilitating the flow of information between different parts of the network. This measure is particularly relevant in the context of information retrieval within work teams because it can highlight nodes that play crucial roles in connecting team members and facilitating collaboration.
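To make the definition concrete, here is a small sketch on a toy directed chain (illustrative only, not the lab data). Only the middle node lies on a shortest path between two other nodes, so it is the only one with a non-zero score.

```r
library(igraph)

# Toy directed chain 1 -> 2 -> 3 (illustrative, not the lab data)
chain <- make_graph(c(1, 2, 2, 3), directed = TRUE)

# The only shortest path with an intermediate node is 1 -> 2 -> 3,
# so node 2 scores 1 and the endpoints score 0
betweenness(chain, directed = TRUE)   # 0 1 0
```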

#Calculate betweenness centrality
betweenness_scores <- betweenness(work_teams) #separate name avoids masking the betweenness() function
betweenness_scores

       1        2        3        4        5        6        7        8 
0.000000 0.000000 0.250000 1.333333 6.833333 7.916667 0.000000 0.000000 
       9       10       11       12       13       14       15       16 
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 9.083333 
      17 
1.583333

The betweenness centrality calculations provide the following results:

Node 16 has the highest betweenness centrality score of approximately 9.08, indicating that it serves as a crucial bridge or mediator connecting other nodes in the network.
Nodes 5 and 6 also have relatively high betweenness centrality scores of approximately 6.83 and 7.92, respectively, suggesting that they play significant roles in facilitating communication and information flow between other nodes.
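Nodes 17 and 4 have betweenness centrality scores of approximately 1.58 and 1.33, respectively, indicating moderate importance in connecting different parts of the network.
Node 3 also has a non-zero betweenness centrality score (0.25), suggesting that it contributes to the flow of information within the network, albeit to a lesser extent than nodes 4 and 17.
All other nodes have a score of 0. This includes node 9: although it has the highest in-degree, it has no outgoing ties, so no shortest path between other nodes can pass through it.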

Overall, betweenness centrality highlights the importance of certain nodes in facilitating communication and information exchange within the work teams. These nodes can be key targets for fostering collaboration and enhancing the efficiency of information retrieval processes.

Task 3: Network Visualization

#Define the colors based on expertise
expertise_color <- ifelse(network_attributes$expertise > 0.3, "#ADD8E6", "#FF7F7F")

plot(work_teams, 
     layout = layout_with_kk(work_teams), #Kamada-Kawai layout
     vertex.size = 5 * sqrt(in_degree), #node size based on indegree centrality
     vertex.color = expertise_color, #node color based on expertise
     edge.arrow.size = 0.05,
     edge.color = "gray")

Enlarged version of the graph:

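In the graph, node size grows with the square root of in-degree, and node color reflects expertise: light blue for expertise above 0.3 and light red otherwise. The largest nodes are therefore the most sought-after members (nodes 9, 6, and 16), and all five members with an in-degree of 5 or more (nodes 5, 6, 9, 16, and 17) have expertise above 0.3. The graph thus helps visualize the communication patterns within the team and the relationship between how often a member is consulted and that member's expertise.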
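The styling rules can be inspected without drawing the plot. The sketch below copies the in-degree and expertise values from the output shown earlier and applies the same color and size rules. The final line is a hypothetical tweak (an assumption, not part of the lab) that adds a small offset so that members with an in-degree of 0 are not drawn with size 0.

```r
# Values copied from the in-degree and expertise output shown earlier
in_degree <- c(0, 0, 1, 1, 5, 9, 0, 1, 11, 0, 0, 0, 0, 0, 0, 8, 5)
expertise <- c(0.17647059, 0.17647059, 0.17647059, 0.52941176, 0.52941176,
               0.58823529, 0.05882353, 0.29411765, 0.58823529, 0.05882353,
               0.17647059, 0.05882353, 0.05882353, 0.35294118, 0.23529412,
               0.58823529, 0.41176471)

# Same color rule as in the plot call
expertise_color <- ifelse(expertise > 0.3, "#ADD8E6", "#FF7F7F")
table(expertise_color)       # 7 light blue, 10 light red

# Same size rule as in the plot call: members nobody consults get size 0
node_size <- 5 * sqrt(in_degree)
sum(node_size == 0)          # 9

# Hypothetical tweak (an assumption): a small offset keeps every node visible
node_size_visible <- 5 * sqrt(in_degree) + 3
```

Passing `node_size_visible` as `vertex.size` would keep the relative ordering of node sizes while making the low in-degree members easier to see.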