Data Cleanse…

As with any data problem, the most important step is making sure our data is as free of noise and inconsistencies as possible before we analyze it.

We will:

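  1. Plot all of df as a network, using its first two columns, ‘source’ and ‘target’ (df[1:2]), as the edge list.
  2. Because the ‘id’ column in attrs holds the same node identifiers as ‘source’ and ‘target’ in df, we can attach the columns of attrs to the nodes as attributes. V() returns the vertices of the graph, and assigning to V(mynetwork)$Group sets the Group attribute on them; as.character() converts each attrs column to plain strings first. The values must be in the same order as the vertices, so the rows of attrs have to line up with the vertex names.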
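
Since the plan depends on the ids in attrs lining up with the ids in df, it can help to check that assumption before building the graph. The sketch below uses only base R and small made-up data frames (edges_toy and attrs_toy are hypothetical stand-ins for df and attrs, and their values are invented):

```r
# Toy stand-ins for df and attrs (hypothetical values, not the real data)
edges_toy = data.frame(source = c("a", "b", "c"),
                       target = c("b", "c", "d"),
                       stringsAsFactors = FALSE)
attrs_toy = data.frame(id = c("a", "b", "c", "c"),
                       Group = c("study", "comp", "other", "other"),
                       stringsAsFactors = FALSE)

# Every id that appears as an endpoint of some edge
edge_ids = unique(c(edges_toy$source, edges_toy$target))

# ids used in edges but absent from attrs (these nodes would get no attributes)
missing_ids = setdiff(edge_ids, attrs_toy$id)

# ids listed more than once in attrs (ambiguous attribute rows)
duplicated_ids = unique(attrs_toy$id[duplicated(attrs_toy$id)])

# ids in attrs that never appear in an edge (unused rows)
unused_ids = setdiff(attrs_toy$id, edge_ids)
```

On the real data the same three checks could be run by swapping in df and attrs. Any non-empty result would be a likely sign that attrs and the graph's vertices do not correspond one to one.
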
# 1. load df into a graph and plot it
mynetwork = graph_from_data_frame(df[1:2], directed=FALSE) # graph.data.frame() is the older, deprecated name
plot(mynetwork, layout=layout_with_fr, vertex.label="", vertex.size=1) # Fruchterman-Reingold layout

# 2. add node attributes, matching attrs rows to vertices by id
# (assigning attrs$Group directly relies on row order and gives the warning
# "number of items to replace is not a multiple of replacement length"
# when attrs and the vertex list differ in length)
idx = match(V(mynetwork)$name, as.character(attrs$id)) # row of attrs for each vertex, NA if absent
V(mynetwork)$Group = as.character(attrs$Group)[idx] # we can categorize the nodes by group
V(mynetwork)$ACTIVITY = as.character(attrs$ACTIVITY)[idx] # and we can also categorize them by activity
# %in% treats a missing Group as "no match", so those nodes fall through to white
V(mynetwork)$color = ifelse(V(mynetwork)$Group %in% "study", "lightblue", ifelse(V(mynetwork)$Group %in% "comp", "tomato", "white"))
plot(mynetwork, layout=layout_nicely, vertex.label="", vertex.size=5)
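
An alternative to setting attributes with V() after the fact is to hand the attribute table to igraph when the graph is built, through the vertices argument of graph_from_data_frame(). This is a minimal sketch, assuming the first column of the attribute table holds the node ids. The toy data frames are hypothetical, not the real df and attrs:

```r
library(igraph)

# Toy stand-ins for df and attrs (hypothetical values, not the real data)
edges_toy = data.frame(source = c("a", "b", "c"),
                       target = c("b", "c", "a"),
                       stringsAsFactors = FALSE)
attrs_toy = data.frame(id = c("c", "a", "b"),
                       Group = c("comp", "study", "other"),
                       stringsAsFactors = FALSE)

# The first column of `vertices` is used as the vertex name; the remaining
# columns become vertex attributes, attached by name rather than by position
g_toy = graph_from_data_frame(edges_toy, directed = FALSE, vertices = attrs_toy)

# Look up the Group of one node by its name to confirm the pairing
group_of_a = V(g_toy)$Group[V(g_toy)$name == "a"]
```

With this approach igraph stops with an error if an id in the edge list is missing from the vertices table, so mismatches surface early instead of being silently recycled.
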