Assignment 1 ECI589

Author

Delaney Burns

Task 1: Preparing the Environment

Loading packages

install.packages("pacman") #Installing the package pacman
Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
(as 'lib' is unspecified)
library(pacman) #loading the package
pacman::p_load(magrittr,
dplyr,
psych, 
igraph,
reshape,
intergraph)

Load Data

attributes <- read.csv("att_expertise.csv", stringsAsFactors = TRUE)
edgelist <- read.csv("edgelist_retrieve.csv")

Manipulate Data

rownames(edgelist) <- 1:nrow(edgelist)
colnames(edgelist) <- 1:ncol(edgelist)
edgelist
    1  2
1   1  6
2   1  9
3   2  6
4   2  9
5   3  5
6   3  6
7   3  9
8   3 17
9   4  5
10  4  6
11  4  9
12  4 16
13  4 17
14  5  6
15  5  9
16  5 16
17  5 17
18  6  5
19  6  9
20  6 16
21  7 16
22 12  8
23 12 16
24 13  6
25 13  9
26 13 17
27 14  4
28 14  6
29 14  9
30 14 16
31 15  3
32 15  6
33 15  9
34 15 16
35 15 17
36 16  5
37 16  6
38 16  9
39 17  5
40 17  9
41 17 16

Create Graph and Plot the Data

graph <- graph_from_data_frame(d = edgelist, directed = TRUE, vertices = attributes)
graph
IGRAPH 61924e3 DN-- 17 41 -- 
+ attr: name (v/c), expertise (v/n)
+ edges from 61924e3 (vertex names):
 [1] 1 ->6  1 ->9  2 ->6  2 ->9  3 ->5  3 ->6  3 ->9  3 ->17 4 ->5  4 ->6 
[11] 4 ->9  4 ->16 4 ->17 5 ->6  5 ->9  5 ->16 5 ->17 6 ->5  6 ->9  6 ->16
[21] 7 ->16 12->8  12->16 13->6  13->9  13->17 14->4  14->6  14->9  14->16
[31] 15->3  15->6  15->9  15->16 15->17 16->5  16->6  16->9  17->5  17->9 
[41] 17->16
V(graph)$id
NULL
plot(graph)

Finding Isolates

isolates <- V(graph)[degree(graph, mode = "all") == 0]
print(isolates)
+ 2/17 vertices, named, from 61924e3:
[1] 10 11

There are 17 nodes and 41 edges.

Each node represents one person in the data, and each edge represents a directed tie in which one person retrieves information from another. The 17 nodes match the 17 individual IDs in the attribute data. An edgelist on its own cannot contain isolates (individuals with no ties to anyone), because every row needs two endpoints. Passing the attribute data as the vertices argument adds those individuals to the graph, which is why nodes 10 and 11 appear above as isolates.
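As a cross-check on the isolates, here is a minimal base R sketch that compares the IDs appearing in the edgelist against the full set of IDs. The `from` and `to` vectors are copied from the printed edgelist above; `all_ids` assumes the attribute file lists IDs 1 through 17, which matches the vertex count but is not shown directly in the output.

```r
# Sender and receiver columns, copied from the printed edgelist above
from <- c(1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7,
          12, 12, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 15,
          16, 16, 16, 17, 17, 17)
to   <- c(6, 9, 6, 9, 5, 6, 9, 17, 5, 6, 9, 16, 17, 6, 9, 16, 17, 5, 9, 16, 16,
          8, 16, 6, 9, 17, 4, 6, 9, 16, 3, 6, 9, 16, 17,
          5, 6, 9, 5, 9, 16)

# Assumption: the attribute data contains IDs 1 through 17
all_ids <- 1:17

# Anyone who never appears as a sender or receiver is an isolate
ids_in_edgelist <- union(from, to)
isolates_by_hand <- setdiff(all_ids, ids_in_edgelist)
isolates_by_hand
```

This should agree with the `degree(graph, mode = "all") == 0` result, since a node with total degree 0 is exactly one that never appears in either column.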

Task 2: Network Descriptive Statistics

Density

edge_density(graph)
[1] 0.1507353

The network density is 0.1507353, meaning about 15% of the 272 possible directed ties (17 × 16) are present. This is low, since density ranges from 0 to 1. A low density suggests that most people retrieve information from only a few others.
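The density value can be reproduced by hand. The sketch below assumes a directed network without self-loops, so each of the 17 nodes could send a tie to each of the other 16; the variable names are illustrative only.

```r
n_nodes <- 17   # vertices in the graph
n_edges <- 41   # directed ties in the edgelist

# In a directed network without self-loops, each node can point to every other node
possible_ties <- n_nodes * (n_nodes - 1)

density_by_hand <- n_edges / possible_ties
round(density_by_hand, 7)  # 0.1507353, matching edge_density(graph)
```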

Degree

outdeg_igraph <- degree(graph = graph, mode = "out")
indeg_igraph <- degree(graph = graph, mode = "in")

outdeg_igraph
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
 2  2  4  5  4  3  1  0  0  0  0  2  3  4  5  3  3 
indeg_igraph
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
 0  0  1  1  5  9  0  1 11  0  0  0  0  0  0  8  5 

Degree counts a node's ties. In-degree is the number of incoming ties, so nodes with high in-degree are frequently sought out for information. Out-degree is the number of outgoing ties, so nodes with high out-degree frequently seek information from others. For example, Person 2 has an out-degree of 2 and an in-degree of 0: they sought information from two people, but no one sought information from them. Person 9 is the opposite case, with the highest in-degree (11) and an out-degree of 0.
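To show what `degree()` is counting, here is a small base R sketch that tallies how often each ID appears as a sender (out-degree) or receiver (in-degree). The `from` and `to` vectors are copied from the printed edgelist; the ID range 1 to 17 is assumed from the vertex count.

```r
# Sender and receiver columns, copied from the printed edgelist above
from <- c(1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7,
          12, 12, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 15,
          16, 16, 16, 17, 17, 17)
to   <- c(6, 9, 6, 9, 5, 6, 9, 17, 5, 6, 9, 16, 17, 6, 9, 16, 17, 5, 9, 16, 16,
          8, 16, 6, 9, 17, 4, 6, 9, 16, 3, 6, 9, 16, 17,
          5, 6, 9, 5, 9, 16)

# tabulate() counts occurrences of each integer 1..nbins, including zeros
outdeg_by_hand <- tabulate(from, nbins = 17)  # times each ID sends a tie
indeg_by_hand  <- tabulate(to,   nbins = 17)  # times each ID receives a tie

data.frame(node = 1:17, out = outdeg_by_hand, `in` = indeg_by_hand,
           check.names = FALSE)
```

Both columns sum to 41, the number of edges, because every tie has exactly one sender and one receiver.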

Closeness

closeness_in <- closeness(graph, mode = "in")
closeness_out <- closeness(graph, mode = "out")

closeness_df <- data.frame(
  Node = V(graph)$name,
  In_Closeness = closeness_in,
  Out_Closeness = closeness_out)

print(closeness_df)
   Node In_Closeness Out_Closeness
1     1          NaN    0.11111111
2     2          NaN    0.11111111
3     3   1.00000000    0.16666667
4     4   1.00000000    0.20000000
5     5   0.05263158    0.25000000
6     6   0.06666667    0.20000000
7     7          NaN    0.10000000
8     8   1.00000000           NaN
9     9   0.06666667           NaN
10   10          NaN           NaN
11   11          NaN           NaN
12   12          NaN    0.09090909
13   13          NaN    0.14285714
14   14          NaN    0.12500000
15   15          NaN    0.14285714
16   16   0.06250000    0.20000000
17   17   0.04347826    0.20000000

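I included closeness in my network analysis to identify nodes that can reach others, or be reached by others, in few steps. The values above are consistent with igraph computing closeness as the inverse of the summed shortest-path distances to the nodes that can actually be reached; NaN marks nodes that reach no one (out) or that no one reaches (in). Because this network is not fully connected, the raw values are not directly comparable across nodes. Node 5 has the highest out-closeness (0.25), but it reaches only four other nodes (6, 9, 16, and 17), each in one step; it does not reach all other nodes. Node 4 reaches five nodes in one step and scores lower (0.20) only because its distances sum to 5 instead of 4. Likewise, Nodes 3, 4, and 8 have an in-closeness of 1 because each is reached by exactly one other node, at distance 1, not because they are especially central.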
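One way to read the closeness values more carefully is to put the number of reachable nodes next to each score. The sketch below rebuilds the graph from the printed edgelist and derives out-closeness from `distances()`. It assumes closeness is the inverse of the summed distances to reachable nodes only, which matches the output above; the object names (`g`, `reach_out`, `closeness_by_hand`) are illustrative.

```r
library(igraph)

# Edgelist copied from the printed output above
edges <- data.frame(
  from = c(1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7,
           12, 12, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 15,
           16, 16, 16, 17, 17, 17),
  to   = c(6, 9, 6, 9, 5, 6, 9, 17, 5, 6, 9, 16, 17, 6, 9, 16, 17, 5, 9, 16, 16,
           8, 16, 6, 9, 17, 4, 6, 9, 16, 3, 6, 9, 16, 17,
           5, 6, 9, 5, 9, 16)
)

# Assumption: vertex IDs run from 1 to 17, so isolates 10 and 11 are included
g <- graph_from_data_frame(edges, directed = TRUE,
                           vertices = data.frame(name = 1:17))

# Row i holds shortest-path lengths from node i along outgoing ties (Inf = unreachable)
d_out <- distances(g, mode = "out")

# Number of other nodes each node can reach (subtract 1 for the node itself)
reach_out <- rowSums(is.finite(d_out)) - 1

# Sum of distances to reachable nodes only
dist_sum <- apply(d_out, 1, function(x) sum(x[is.finite(x)]))

# Inverse of the summed distances; NA when a node reaches no one
closeness_by_hand <- ifelse(reach_out > 0, 1 / dist_sum, NA)

data.frame(node = V(g)$name, reachable = reach_out,
           dist_sum = dist_sum, out_closeness = closeness_by_hand)
```

Reading `reachable` alongside `out_closeness` makes it clear when a high score reflects a short list of reachable nodes instead of a genuinely central position.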

Task 3: Network Visualization

V(graph)$size <- degree(graph, mode = "in") * 3  
V(graph)$color <- ifelse(V(graph)$expertise > 0.3, "pink", "red")  
plot(graph, 
     layout = layout_with_kk(graph),
     vertex.label = V(graph)$name,
     vertex.frame.color = "black",
     edge.arrow.size = 0.2,
     main = "Network Visualization: Information Retrieval")

  • Nodes with more incoming connections (higher in-degree) appear larger. Size is three times the in-degree, so nodes with no incoming ties have size 0.

  • Nodes with an expertise score above 0.3 appear in pink; all others appear in red.
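Since node size is the in-degree multiplied by 3, every node with an in-degree of 0 gets a size of 0. A possible adjustment, sketched here in base R, adds a baseline so those nodes remain visible. The in-degree vector is copied from the output in Task 2; the baseline of 5 is an arbitrary choice for illustration, not a value from the assignment.

```r
# In-degree values copied from the indeg_igraph output above
indeg <- c(0, 0, 1, 1, 5, 9, 0, 1, 11, 0, 0, 0, 0, 0, 0, 8, 5)
names(indeg) <- 1:17

# Sizing used in the plot above: zero in-degree means zero size
size_original <- indeg * 3

# Hypothetical alternative: a baseline of 5 keeps every node visible
baseline <- 5
size_offset <- baseline + indeg * 3

# How many nodes have size 0 under the original sizing
sum(size_original == 0)
```

In the plotting code this would correspond to something like `V(graph)$size <- 5 + degree(graph, mode = "in") * 3`, with the baseline tuned by eye.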