As in any data problem, the most important aspect is making sure our data is completely clean of noise.
We will:
# 1. load df into a graph and plot it
mynetwork = graph.data.frame(df[1:2], directed=FALSE)
plot(mynetwork, layout=layout.fruchterman.reingold, vertex.label="", vertex.size=1)
# add node attributes: Group || Activity || Weight
V(mynetwork)$Group = as.character(attrs1$Group) # we can categorize the nodes by group
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
V(mynetwork)$ACTIVITY = as.character(attrs1$ACTIVITY) # and we can also categorize them by activity
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
V(mynetwork)$Weight = as.character(attrs1$weighted.herfindahl)
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
V(mynetwork)$Subject = as.character(attrs1$primary.subject)
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
# randomize 5 colors for the different groups
coul = brewer.pal(5, 'Set1')
my_color <- coul[as.numeric(as.factor(V(mynetwork)$Subject))]
# plot based on Subject
plot(mynetwork, vertex.color=my_color, vertex.label='', vertex.size=5, main='Colors Based on Subject')
The Herfindahl Index: “is a commonly accepted measure of market concentration.” (Investopedia). In this case, our ‘market’ is the health sector, and a weighted Herfindahl index has been calculated for every node. These two can be crossvalidated with the ‘pi_key’ column in attrs & herfindahl. We can later add these weights to the nodes via the ‘weighted.herfindahl’ column in herfindahl. Finally, a weight of 0.18 or more is considered substantial market share, so we will highlight nodes with 0.18 or more in magenta, and nodes with a value less than 0.18 in green. For reference, 0.18 (or 18,000 points) is a ‘benchmark’ value in the US, which establishes the point at which merger laws can be considered.
In the case of the NIGMS data used for these networks, we can observe that most of the nodes are above the 0.18 (18,000 point) benchmark in magenta. This indicates that there is a very high degree of concentration in the publications that get published and co-authored.
#load df into a graph object
mynetwork1 = graph.data.frame(df[1:2], directed=FALSE)
# add node attributes: Group || Activity || Weight
V(mynetwork1)$Group = as.character(attrs1$Group) # we can categorize the nodes by group
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
V(mynetwork1)$ACTIVITY = as.character(attrs1$ACTIVITY) # and we can also categorize them by activity
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
V(mynetwork1)$Weight = as.character(attrs1$weighted.herfindahl)
## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length
# color by Weight: >= 0.3 is magenta3, while < 0.3 is green3
V(mynetwork1)$color = ifelse(V(mynetwork1)$Weight >= 0.18, "magenta3", ifelse(V(mynetwork1)$Weight < 0.18, "green3", "white"))
plot(mynetwork1, layout=layout_nicely, vertex.label="", vertex.size=5, main='Colors Based on Herfindahl Weights')