This is an example of using network analysis in R to analyze the legal-spend relationships between portfolio companies and law firms. The main goal is to identify opportunities to leverage that legal spend when negotiating better deals with the firms.

Please note that this example is based on a consulting project from work. All figures and names have been modified for confidentiality.

Load the package

library(igraph)

Load datasets

nodes <- read.csv("Legal_NODES.csv", header = TRUE, as.is = TRUE)
edges <- read.csv("Legal_EDGES.csv", header = TRUE, row.names = 1, as.is = TRUE)

Clean the datasets and look at the tables

edges[is.na(edges)] <- 0
head(nodes)
##   ID    Spend     Industry Industry.Type
## 1 C1  5368675 Manufacturer             1
## 2 C2 26381138 Manufacturer             1
## 3 C3  6754778 Manufacturer             1
## 4 C4  8210763     Consumer             2
## 5 C5 10394201 Manufacturer             1
## 6 C6 33797553     Consumer             2
head(edges)
##    C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20
## L1  0  0  0  0  0  1  1  0  0   0   0   0   0   0   0   1   0   0   0   0
## L2  0  0  0  0  1  1  1  1  0   0   0   0   0   0   0   1   0   0   0   0
## L3  0  0  0  0  0  1  0  0  0   0   0   0   0   0   0   0   0   0   0   1
## L4  0  0  0  0  1  1  0  0  0   0   0   0   0   0   0   0   0   0   0   0
## L5  0  0  0  1  0  0  0  0  0   0   0   0   0   0   0   0   1   1   0   0
## L6  0  0  0  0  0  1  0  0  0   0   0   0   0   0   0   0   0   0   0   0

As we can see, the edges are stored in matrix format and describe a bipartite network: the rows are law firms and the columns are portfolio companies. So I will convert the matrix to an igraph object and separate it into two projections, one per node type.

nets <- graph_from_incidence_matrix(edges)
nets.bp <- bipartite_projection(nets)
nets.1 <- nets.bp$proj1
nets.2 <- nets.bp$proj2
plot(nets.2,
     vertex.size = 7,
     vertex.label = NA,
     main = "Portfolio Companies Network")

It seems most of the portfolio companies are connected fairly evenly, which means they use the same law firms as many of the other companies. The exception is one company that has only a single edge to another.

plot(nets.1,
     vertex.size = 7,
     vertex.label = NA,
     main = "Law Firm Network")

At first glance, there might be a couple of clusters in the law firm network.

I will add more attributes in the following analysis
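To make the projection step above more concrete, here is a minimal base-R sketch on a made-up incidence matrix (the firm and company names are illustrative, not the project data). It assumes the same orientation as the edge table, with law firms in rows and companies in columns. As far as I understand, the `weight` attribute that igraph attaches to each projection by default corresponds to these shared-neighbor counts.

```r
# Toy incidence matrix: rows are law firms, columns are companies (made-up data).
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              0, 1, 1,
              0, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("L", 1:4), paste0("C", 1:3)))

# Company projection: entry [i, j] counts the law firms shared by companies i and j.
company_proj <- crossprod(m)   # same as t(m) %*% m
diag(company_proj) <- 0        # drop self-loops

# Law firm projection: entry [i, j] counts the companies shared by firms i and j.
firm_proj <- tcrossprod(m)     # same as m %*% t(m)
diag(firm_proj) <- 0

company_proj
firm_proj
```

A zero entry means the two nodes share no neighbor and therefore get no edge in the projection; here L1 and L4 serve no common company.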

Portfolio Companies Network

Now I will add total dollar amounts in legal spend, type of industry, and weight of the edges as the attributes to the portfolio company network.

Apply colors to separate the companies by industry

V(nets.2)$Industry.Type <- nodes$Industry.Type[nodes$Industry.Type != ""]
colrs <- c("gray50", "steelblue3")
V(nets.2)$color <- colrs[V(nets.2)$Industry.Type]

Use total legal spend as the relative size of the nodes

V(nets.2)$Spend <- nodes$Spend[nodes$Industry.Type != ""]
V(nets.2)$size <- V(nets.2)$Spend*0.0000008
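The fixed multiplier works for spend figures of this magnitude, but it has to be re-tuned whenever the scale of the data changes. A possible alternative, sketched below, is to map spend linearly onto a chosen size range. The helper `rescale_size` and its default range are my own illustrative assumptions, not part of igraph or the original analysis.

```r
# Hypothetical helper: map a numeric vector linearly onto [min_size, max_size].
rescale_size <- function(x, min_size = 5, max_size = 25) {
  rng <- range(x, na.rm = TRUE)
  # If all values are equal, fall back to the midpoint of the size range.
  if (rng[1] == rng[2]) return(rep((min_size + max_size) / 2, length(x)))
  min_size + (x - rng[1]) / (rng[2] - rng[1]) * (max_size - min_size)
}

# Spend values for C1..C6 as printed by head(nodes) above.
spend <- c(5368675, 26381138, 6754778, 8210763, 10394201, 33797553)
sizes <- rescale_size(spend)
round(sizes, 1)
```

With this approach the smallest spender always gets the minimum size and the largest the maximum, regardless of the absolute dollar amounts.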

Use the weight of each relationship as the width of its edge

E(nets.2)$width <- E(nets.2)$weight

This network is still a little difficult to interpret since most of the nodes sit within one huge cluster. So now I will remove the edges whose weight is less than the average edge weight of the entire network to give the network more separation.

cut.off <- mean(E(nets.2)$weight)
nets.2.sp <- delete_edges(nets.2, E(nets.2)[weight < cut.off*1])
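To see what a mean-based cutoff does, here is a small sketch on a toy graph with made-up weights. Because edge weights are often skewed, the mean can sit well above the typical weight and remove most edges; the median shown next to it is only an alternative threshold for comparison, not what the analysis above uses.

```r
library(igraph)

# Toy weighted graph (made-up weights), standing in for a projection.
g <- make_full_graph(4)
E(g)$weight <- c(1, 1, 2, 3, 5, 1)

# Mean cutoff, as in the analysis above: 13 / 6, about 2.17.
cut_off <- mean(E(g)$weight)
kept_mean <- delete_edges(g, which(E(g)$weight < cut_off))

# Median cutoff (an assumption for comparison only): 1.5.
med <- median(E(g)$weight)
kept_median <- delete_edges(g, which(E(g)$weight < med))

ecount(g)            # 6 edges before filtering
ecount(kept_mean)    # 2 edges survive the mean cutoff
ecount(kept_median)  # 3 edges survive the median cutoff
```

Checking `ecount()` before and after filtering is a quick way to confirm that the threshold leaves enough edges for the clustering steps to be meaningful.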

Next, we will try a few different network layouts to see which one gives us the best visualization

layouts <- grep("^layout_", ls("package:igraph"), value=TRUE)[-1]
layouts <- layouts[!grepl("bipartite|merge|norm|sugiyama|tree", layouts)]

par(mfrow=c(2,2), mar=c(1.2,1.2,1.2,1.2))
for (layout in layouts) {
  l <- do.call(layout, list(nets.2.sp))
  plot(nets.2.sp, edge.arrow.mode=0, layout=l, main=layout, vertex.label = NA, vertex.frame.color = NA) 
}

Judging by the different layouts, I think the Fruchterman-Reingold, multidimensional scaling, and nicely layouts are the easiest to interpret.

Next, I will cluster the network using a couple of community detection methods.

First, remove the isolated nodes

nets.2.sp.iso <- delete_vertices(nets.2.sp, V(nets.2.sp)[degree(nets.2.sp) == 0])

WALKTRAP Method

This algorithm uses a series of random walks to detect communities. The assumption is that a random walk tends to stay inside the community it starts in, so vertices that are frequently reached from one another within a walk are more likely to belong to the same community.

nets.2.sp.iso.wt <- cluster_walktrap(nets.2.sp.iso, steps = 200, modularity = TRUE)
par(mfrow=c(1,2), mar=c(1.2,1.2,1.2,1.2))
plot_dendrogram(nets.2.sp.iso.wt, mode="hclust")
plot(nets.2.sp.iso.wt, nets.2.sp.iso, vertex.label = NA)

The Walktrap method gives me 3 clusters.

Edge Betweenness Method

This algorithm uses edge betweenness to separate communities. The idea is that the edges connecting separate clusters are likely to have high edge betweenness, because all the shortest paths from one cluster to another must traverse them. The algorithm repeatedly removes the edge with the highest betweenness, which gradually splits the network into communities.

nets.2.sp.iso.eb <- cluster_edge_betweenness(nets.2.sp.iso)
par(mfrow=c(1,2), mar=c(1.2,1.2,1.2,1.2))
plot_dendrogram(nets.2.sp.iso.eb, mode="hclust")
plot(nets.2.sp.iso.eb, nets.2.sp.iso, vertex.label = NA)

4 clusters are identified using Edge Betweenness method.
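The edge-betweenness idea described above can be checked on a tiny made-up graph: two triangles joined by a single bridge. This is only an illustrative sketch with toy vertex names; it is not the portfolio data.

```r
library(igraph)

# Two triangles (A-B-C and D-E-F) joined by a single bridge C-D (toy data).
g <- graph_from_literal(A-B-C-A, C-D, D-E-F-D)

# Edge betweenness: number of shortest paths that pass through each edge.
eb <- edge_betweenness(g)
data.frame(edge = as_ids(E(g)), betweenness = eb)

# The bridge carries every shortest path between the two triangles,
# so it has the highest betweenness.
bridge <- ends(g, which.max(eb))
bridge
```

Each of the 3 x 3 cross-triangle vertex pairs has exactly one shortest path, and all of them run through the bridge, which is why removing it first separates the two groups.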

Law Firm Network Analysis

Use legal expense per law firm as the relative size of the nodes and edge weight as the width of the edges.

V(nets.1)$Spend <- nodes$Spend[nodes$Industry.Type == ""]
V(nets.1)$size <- V(nets.1)$Spend*0.0000007
E(nets.1)$width <- E(nets.1)$weight
plot(nets.1, 
     vertex.label = NA,
     vertex.frame.color = "gray50",
     main = "Law Firm Network")

Remove the edges with less than average weight of the entire network

cut.off <- mean(E(nets.1)$weight)
nets.1.sp <- delete_edges(nets.1, E(nets.1)[weight < cut.off*1])

Again, we will try various layouts to see which one explains the network best

par(mfrow=c(2,2), mar=c(1.2,1.2,1.2,1.2))
for (layout in layouts) {
  l <- do.call(layout, list(nets.1.sp))
  plot(nets.1.sp, edge.arrow.mode=0, layout=l, main=layout, vertex.label = NA, vertex.frame.color = NA) 
}

Now we will see how these law firms are clustered. However, since this network has many more vertices than the portfolio company network, I’ll look at the degree distribution first and then decide how many nodes to remove to keep the clustering readable

hist(degree(nets.1.sp, mode = "all"), breaks = 0:(vcount(nets.1.sp) - 1), xlim = c(0,15), main = "Histogram of Node Degree")

According to the histogram, most of the nodes have a degree of 1. So here, I will remove not only the isolated nodes but also any node with a degree less than 2.

nets.1.sp.iso <- delete_vertices(nets.1.sp, V(nets.1.sp)[degree(nets.1.sp) < 2])

WALKTRAP Method

nets.1.sp.iso.wt <- cluster_walktrap(nets.1.sp.iso, steps = 200, modularity = TRUE)
par(mfrow=c(1,1), mar=c(1.2,1.2,1.2,1.2))
plot_dendrogram(nets.1.sp.iso.wt, mode="hclust")

plot(nets.1.sp.iso.wt, nets.1.sp.iso, vertex.label = NA)

table(membership(nets.1.sp.iso.wt))
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
##  5 11  9  3  2  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1

As we can see, there are 6 clusters with more than one law firm in them.

Edge Betweenness Method

nets.1.sp.iso.eb <- cluster_edge_betweenness(nets.1.sp.iso)
par(mfrow=c(1,1), mar=c(1.2,1.2,1.2,1.2))
plot_dendrogram(nets.1.sp.iso.eb, mode="hclust")

plot(nets.1.sp.iso.eb, nets.1.sp.iso, vertex.label = NA)

table(membership(nets.1.sp.iso.eb))
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 
##  8  9  1  1  1  7  1  8  2  1  1  1  1  1  1  1  1  1

With the Edge Betweenness method, there are 5 clusters with more than one law firm in them.
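Since the two methods produce different partitions, it can help to compare them numerically rather than only by counting clusters. The sketch below shows one way to do that, using Zachary's karate club graph as a stand-in so that it runs without the CSV files; swapping in `nets.1.sp.iso` and the two community objects from above should work the same way, although I have not run it on the project data.

```r
library(igraph)

# Stand-in graph (Zachary's karate club) so the sketch runs without the CSV files.
g <- make_graph("Zachary")

wt <- cluster_walktrap(g)
eb <- cluster_edge_betweenness(g)

# Community sizes under each method.
sizes(wt)
sizes(eb)

# Modularity of each partition (higher means denser links within communities).
mod_wt <- modularity(wt)
mod_eb <- modularity(eb)
c(walktrap = mod_wt, edge_betweenness = mod_eb)

# Normalized mutual information between the two partitions (1 means identical).
nmi <- compare(wt, eb, method = "nmi")
nmi
```

A high NMI suggests the two methods largely agree on which firms belong together, while the modularity scores give a rough indication of which partition fits the network structure better.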