Assignment 1

Author

Nancy

pacman::p_load(tidyverse,
               magrittr,
               dplyr,
               psych,
               igraph,
               reshape,
               intergraph)

After preparing the environment, we load the network data.

edgelist <- read_csv("edgelist_retrieve.csv")
Rows: 41 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): from, to

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
attributes <- read_csv("att_expertise.csv")
Rows: 17 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): id, expertise

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Now, we will create an igraph object from the edge list.

edgelist_net <- graph_from_data_frame(d = edgelist, directed = TRUE, vertices = attributes)

plot(edgelist_net)

The nodes are the ids, each carrying the attributes listed in the attribute file. The edges indicate a directed relationship from one id to another, as recorded in the edge list CSV file. For example, there is an edge from node 1 to node 6, and another from node 1 to node 9.
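To check how nodes, attributes, and directed edges end up inside the igraph object, the short sketch below may help. It uses a small made-up edge list and attribute table (`toy_edges` and `toy_nodes` are hypothetical, not the assignment files), so the values are for illustration only.

```r
library(igraph)

# Hypothetical toy data standing in for the two CSV files
toy_edges <- data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 3, 1))
toy_nodes <- data.frame(id = 1:4, expertise = c(0.1, 0.5, 0.2, 0.9))

toy_net <- graph_from_data_frame(d = toy_edges, directed = TRUE,
                                 vertices = toy_nodes)

V(toy_net)$name       # vertex ids, stored as character strings
V(toy_net)$expertise  # attribute column carried over from toy_nodes

# Edge list recovered from the graph object
igraph::as_data_frame(toy_net, what = "edges")

# Direction matters: 1 -> 2 exists, 2 -> 1 does not
are_adjacent(toy_net, "1", "2")
are_adjacent(toy_net, "2", "1")
```

The same calls should work on `edgelist_net`, assuming it was built as shown above.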

Task 2: Network descriptive statistics

density

num_edges <- gsize(edgelist_net)
num_edges
[1] 41
num_nodes <- gorder(edgelist_net)
num_nodes
[1] 17
# number of ordered node pairs, i.e. possible directed ties (no self-loops)
num_dyads <- num_nodes * (num_nodes - 1)

den <- num_edges / num_dyads

den
[1] 0.1507353

The density of this network is about 0.151, meaning that roughly 15% of the 272 possible directed ties are present, which indicates a sparse structure. The visualization also reveals several outliers.
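As a cross-check on the manual calculation, igraph's built-in `edge_density()` can be compared with the edges-over-ordered-pairs formula. The sketch below uses a small hypothetical graph (`toy_net`), then repeats the arithmetic with the edge and node counts reported above.

```r
library(igraph)

# Hypothetical directed toy graph: 4 nodes, 5 edges
toy_net <- make_graph(c(1, 2,  1, 3,  2, 3,  3, 1,  4, 1), directed = TRUE)

n <- gorder(toy_net)  # number of nodes
m <- gsize(toy_net)   # number of edges

manual_density  <- m / (n * (n - 1))               # directed, no self-loops
builtin_density <- edge_density(toy_net, loops = FALSE)

all.equal(manual_density, builtin_density)

# Same formula with the counts reported above (41 edges, 17 nodes)
report_density <- 41 / (17 * 16)
report_density  # 0.1507353
```

Running `edge_density(edgelist_net)` should therefore give the same value as `den`.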

degree

outdeg <- degree(graph = edgelist_net, mode = "out")
indeg <- degree(graph = edgelist_net, mode = "in")

outdeg
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
 2  2  4  5  4  3  1  0  0  0  0  2  3  4  5  3  3 
indeg
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
 0  0  1  1  5  9  0  1 11  0  0  0  0  0  0  8  5 

Overall, the outdegree distribution is relatively even across nodes, ranging from 0 to 5; nodes 4 and 15 have the highest value (5), indicating that they are the most active initiators of connections. In contrast, indegree varies much more widely (0 to 11), with nodes 9, 6, and 16 receiving the most ties (11, 9, and 8, respectively). This suggests that these nodes are frequently referred to within the network.
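The comparison between the two distributions can be backed up with a few summary numbers. The sketch below retypes the degree values from the output above into plain vectors, so it runs without the data files. It compares the spread of the two distributions, picks out the three highest-indegree nodes, and lists nodes with no ties at all.

```r
# Degree vectors copied from the printed output above
outdeg <- c(2, 2, 4, 5, 4, 3, 1, 0, 0, 0, 0, 2, 3, 4, 5, 3, 3)
indeg  <- c(0, 0, 1, 1, 5, 9, 0, 1, 11, 0, 0, 0, 0, 0, 0, 8, 5)
names(outdeg) <- names(indeg) <- 1:17

# Sanity check: each edge adds one outgoing and one incoming tie
sum(outdeg) == sum(indeg)

# Spread of each distribution (standard deviation)
c(sd_out = sd(outdeg), sd_in = sd(indeg))

# Three nodes with the highest indegree
top_in <- names(sort(indeg, decreasing = TRUE))[1:3]
top_in

# Nodes with neither incoming nor outgoing ties
isolates <- names(which(outdeg + indeg == 0))
isolates
```

With these values, `top_in` contains nodes 9, 6, and 16, and `isolates` contains nodes 10 and 11.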

Task 3: Network visualization

set up node color based on expertise

col <- ifelse(attributes$expertise <= 0.3, "blue", "red")

col
 [1] "blue" "blue" "blue" "red"  "red"  "red"  "blue" "blue" "red"  "blue"
[11] "blue" "blue" "blue" "red"  "blue" "red"  "red" 
V(edgelist_net)$color <- col

plot(edgelist_net)
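One possible variation is to compute the colors from the vertex attribute stored in the graph rather than from the separate attribute table, which avoids relying on the two being in the same row order. The sketch below uses hypothetical toy data (`toy_edges`, `toy_nodes`) and a hypothetical `cutoff` variable holding the 0.3 threshold used above. The legend is an optional extra.

```r
library(igraph)

# Hypothetical toy data standing in for the two CSV files
toy_edges <- data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 3, 1))
toy_nodes <- data.frame(id = 1:4, expertise = c(0.1, 0.5, 0.2, 0.9))
toy_net <- graph_from_data_frame(d = toy_edges, directed = TRUE,
                                 vertices = toy_nodes)

cutoff <- 0.3  # same threshold as above

# Color taken from the graph's own vertex attribute
V(toy_net)$color <- ifelse(V(toy_net)$expertise <= cutoff, "blue", "red")

plot(toy_net, vertex.label.color = "white")
legend("topleft",
       legend = c("expertise <= 0.3", "expertise > 0.3"),
       fill = c("blue", "red"), bty = "n")
```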

adjust node size

indeg <- degree(edgelist_net,mode ="in")

plot(edgelist_net, vertex.size = indeg + 3, margin = -0.3)

reduce arrow size to make the plot clearer

indeg <- degree(edgelist_net,mode ="in")

plot(edgelist_net, vertex.size = indeg + 3, edge.arrow.size = 0.3, margin = -0.3)

set up the layout

indeg <- degree(edgelist_net,mode ="in")

plot(edgelist_net, vertex.size = indeg + 3, edge.arrow.size = 0.3, layout = layout_with_kk, margin = -0.3)

indeg <- degree(edgelist_net,mode ="in")

plot(edgelist_net, vertex.size = indeg + 3, edge.arrow.size = 0.3, layout = layout_with_fr, margin = -0.3)

Compared to the Kamada-Kawai (KK) layout, the Fruchterman-Reingold layout appears to be a better fit. Larger nodes, such as 6, 9, 16, and 17, have higher indegree (9, 11, 8, and 5, respectively), indicating their high centrality and significant role in the network. In contrast, smaller nodes, such as 10 and 12, receive no incoming ties, highlighting their peripheral position.
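Because the Fruchterman-Reingold algorithm starts from random positions, the picture can change between runs. A minimal sketch of one way to make the layout comparison repeatable is shown below: fix the random seed, compute the coordinates once, and pass the stored matrices to `plot()`. The graph here (`toy_net`) is a hypothetical example, and the seed value is arbitrary.

```r
library(igraph)

# Hypothetical directed toy graph with 8 nodes
toy_net <- make_graph(c(1, 2,  1, 3,  2, 3,  4, 3,  5, 3,
                        6, 3,  6, 7,  7, 8,  8, 6,  5, 8),
                      directed = TRUE)
indeg <- degree(toy_net, mode = "in")

# Fix the seed so the randomized layout is the same on every run
set.seed(42)
fr_coords <- layout_with_fr(toy_net)
kk_coords <- layout_with_kk(toy_net)

# Draw both layouts side by side, reusing the stored coordinates
old_par <- par(mfrow = c(1, 2))
plot(toy_net, layout = kk_coords, vertex.size = indeg * 3 + 8,
     edge.arrow.size = 0.3, main = "Kamada-Kawai")
plot(toy_net, layout = fr_coords, vertex.size = indeg * 3 + 8,
     edge.arrow.size = 0.3, main = "Fruchterman-Reingold")
par(old_par)
```

Storing the coordinates also means that later plots (for example, with different colors or sizes) keep every node in the same place, which makes them easier to compare.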