week 1: familiarity with igraph

Data Cleanse…

As in any data problem, the most important aspect is making sure our data is completely clean of noise.

We will:

plot the entirety of df, based on the columns ‘source’ and ‘target’ (df[1:2])
given that the the ‘id’ column in attrs is cross sectional with ‘source’ and ‘target’ in df, we can use V() to add attributes to the network and reference attrs in the as.character() method. V() works exactly like axis() for matplotlib in python.

# 1. load df into a graph and plot it
mynetwork = graph.data.frame(df[1:2], directed=FALSE)
plot(mynetwork, layout=layout.fruchterman.reingold, vertex.label="", vertex.size=1)

# add node attributes: Group || Activity || Weight 
V(mynetwork)$Group = as.character(attrs1$Group) # we can categorize the nodes by group

## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

V(mynetwork)$ACTIVITY = as.character(attrs1$ACTIVITY) # and we can also categorize them by activity

## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

V(mynetwork)$Weight = as.character(attrs1$weighted.herfindahl)

## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

V(mynetwork)$Subject = as.character(attrs1$primary.subject)

## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

# randomize 5 colors for the different groups
coul = brewer.pal(5, 'Set1')
my_color <- coul[as.numeric(as.factor(V(mynetwork)$Subject))]
# plot based on Subject
plot(mynetwork, vertex.color=my_color, vertex.label='', vertex.size=5, main='Colors Based on Subject')

Additional Data:

The Herfindahl Index: “is a commonly accepted measure of market concentration.” (Investopedia). In this case, our ‘market’ is the health sector, and a weighted Herfindahl index has been calculated for every node. These two can be crossvalidated with the ‘pi_key’ column in attrs & herfindahl. We can later add these weights to the nodes via the ‘weighted.herfindahl’ column in herfindahl. Finally, a weight of 0.18 or more is considered substantial market share, so we will highlight nodes with 0.18 or more in magenta, and nodes with a value less than 0.18 in green. For reference, 0.18 (or 18,000 points) is a ‘benchmark’ value in the US, which establishes the point at which merger laws can be considered.

In the case of the NIGMS data used for these networks, we can observe that most of the nodes are above the 0.18 (18,000 point) benchmark in magenta. This indicates that there is a very high degree of concentration in the publications that get published and co-authored.

#load df into a graph object
mynetwork1 = graph.data.frame(df[1:2], directed=FALSE)

# add node attributes: Group || Activity || Weight 
V(mynetwork1)$Group = as.character(attrs1$Group) # we can categorize the nodes by group

## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

V(mynetwork1)$ACTIVITY = as.character(attrs1$ACTIVITY) # and we can also categorize them by activity

## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

V(mynetwork1)$Weight = as.character(attrs1$weighted.herfindahl)

## Warning in vattrs[[name]][index] <- value: number of items to replace is
## not a multiple of replacement length

# color by Weight: >= 0.3 is magenta3, while < 0.3 is green3
V(mynetwork1)$color = ifelse(V(mynetwork1)$Weight >= 0.18, "magenta3", ifelse(V(mynetwork1)$Weight < 0.18, "green3", "white"))
plot(mynetwork1, layout=layout_nicely, vertex.label="", vertex.size=5, main='Colors Based on Herfindahl Weights')

week 1: familiarity with igraph

Sebastian Pastor Ferrari

4/27/2020

Data Cleanse…

Additional Data: