0. Reformatting data

As a first step, I put the data into a graph format using the igraph package. At this exploratory stage, I am working with global country-level data.

#Load packages
library(igraph)
library(dplyr)

#2013 Graph 
#Create node dataset: one row per country that appears as a source or a host
#distinct() keeps each country name paired with its own ISO code
countries <- bind_rows(
  raw %>% distinct(country = sourcecountry, iso = sourceiso),
  raw %>% distinct(country = hostcountry, iso = hostiso)
) %>% distinct()

#One node per ISO code (vertex names must be unique)
nodes <- countries %>% distinct(nodes = iso)


#Links: creating edges by aggregation (2013 only)
#weight = number of rows (links) per country pair, value = total capital investment
links <- raw %>% 
  filter(year == 2013) %>% 
  group_by(sourceiso, hostiso) %>% 
  summarize(weight = n(), value = sum(capitalinvestment), .groups = "drop")
##agnodes: keep only countries that appear in at least one 2013 link
agnodes <- nodes %>% filter(nodes %in% links$sourceiso | nodes %in% links$hostiso)
#Make igraph
net <- graph_from_data_frame(d = links, vertices = agnodes, directed = TRUE)

#All time graph 
#Every node comes from a row of raw, so no nodes need to be dropped here
allnodes <- nodes
#Links: creating edges by aggregation over all years
alllinks <- raw %>% 
  group_by(sourceiso, hostiso) %>% 
  summarize(weight = n(), value = sum(capitalinvestment), .groups = "drop")
#Make igraph
allnet <- graph_from_data_frame(d = alllinks, vertices = allnodes, directed = TRUE)
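
To make the mapping from data frames to a graph concrete, here is a minimal sketch on made-up data. The objects toy_links, toy_nodes and toy_net are hypothetical stand-ins, not part of the dataset. It illustrates two things igraph does in graph_from_data_frame: the first two columns of the edge data frame become the endpoints and the remaining columns become edge attributes, and any vertex listed in vertices but absent from the edges is kept as an isolated node.

```r
library(igraph)

# Made-up edge list: first two columns are endpoints, the rest become edge attributes
toy_links <- data.frame(
  sourceiso = c("USA", "CHN"),
  hostiso   = c("CHN", "DEU"),
  weight    = c(3, 1),
  value     = c(120.5, 40)
)

# Made-up node list: FRA has no links, so it ends up as an isolated vertex
toy_nodes <- data.frame(nodes = c("USA", "CHN", "DEU", "FRA"))

toy_net <- graph_from_data_frame(d = toy_links, vertices = toy_nodes, directed = TRUE)

n_vertices  <- vcount(toy_net)      # number of vertices, including isolated ones
n_edges     <- ecount(toy_net)      # number of directed edges
edge_weight <- E(toy_net)$weight    # edge attribute taken from the third column
node_names  <- V(toy_net)$name      # vertex names taken from the first column of toy_nodes
```

The isolated vertex is the reason the 2013 graph uses a filtered node list: countries with no links in that year would otherwise stay in the graph with degree zero.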

1. Visualizing with reduced data - at country level

2. Exploring the dynamics of some key network statistics

Now I look at some key network statistics over the period. The following graphs present the evolution over time of measures of connectedness such as degree and centrality.

While these measures fluctuate over time, it is also apparent that the network is very much a tiered one, in which the top tier persistently dominates over the period, with few breakouts.
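
A minimal sketch of how such yearly series could be computed, assuming one graph is built per year from the rows of that year. The data frame toy and the helper yearly_stats are hypothetical and use made-up values; only the column names sourceiso, hostiso and year follow the dataset. The toy edges carry no weight attribute, so betweenness is unweighted here. With a weight attribute present, igraph would use it by default and treat it as a distance.

```r
library(igraph)

# Made-up stand-in for the raw data: one row per link
toy <- data.frame(
  sourceiso = c("USA", "USA", "CHN", "DEU", "USA", "CHN"),
  hostiso   = c("CHN", "DEU", "USA", "CHN", "CHN", "DEU"),
  year      = c(2012, 2012, 2012, 2013, 2013, 2013)
)

# Hypothetical helper: build one year's directed graph and summarise it
yearly_stats <- function(df, yr) {
  # Keep each country pair once for the given year
  edges <- unique(df[df$year == yr, c("sourceiso", "hostiso")])
  g <- graph_from_data_frame(edges, directed = TRUE)
  data.frame(
    year            = yr,
    max_in_degree   = max(degree(g, mode = "in")),
    max_out_degree  = max(degree(g, mode = "out")),
    max_betweenness = max(betweenness(g, directed = TRUE))
  )
}

# One row of statistics per year, ready to be plotted as a time series
years <- sort(unique(toy$year))
stats_by_year <- do.call(rbind, lapply(years, yearly_stats, df = toy))
```

Keeping the per-node values instead of only the maximum would allow tracking which countries stay in the top tier from year to year.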

3. Distributions of network statistics overall

We can look more closely at the distributions of some key network characteristics. Below are frequency plots for three of these measures.

I look at the distributions both at the end of the period and over the period as a whole, the latter as a way to get a more stable measure.
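
As an illustration of what goes into such a frequency plot, the sketch below computes a degree distribution on a made-up edge list. The object toy_edges is hypothetical, and the same calls would apply to either the end-of-period or the all-period graph.

```r
library(igraph)

# Made-up directed edge list standing in for the aggregated links
toy_edges <- data.frame(
  from = c("USA", "USA", "USA", "CHN", "DEU"),
  to   = c("CHN", "DEU", "GBR", "USA", "CHN")
)
g <- graph_from_data_frame(toy_edges, directed = TRUE)

# Total degree (in plus out) for each country
deg <- degree(g, mode = "all")

# Number of countries at each degree value
deg_freq <- table(deg)

# Share of countries at each degree; element k + 1 corresponds to degree k
deg_share <- degree_distribution(g, mode = "all")
```

Passing deg_freq to barplot() gives a basic frequency plot. Swapping mode = "all" for "in" or "out" gives the in-degree and out-degree versions.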

4. More network statistics

A few more measures are presented in the table below, following the same logic: one row captures the end-of-period picture and the other the more stable overall measures.

The first set of graphs represents the degree distributions and the betweenness centrality measure in 2012. The second set represents the same measures over the whole period.

Additional statistics

Year                      2013     2003-2013
Maximum in-degree         273      273
Maximum out-degree        176      176
Maximum all-degree        97       97
Density                   1        1
Transitivity              1        1
Mutual dyads              466      1573
Asymmetric dyads          1417     3011
Unconnected dyads         13342    14722
Highest authority score   China    China
Highest hub score         U.S.     U.S.
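
For reference, the statistics in the table correspond to standard igraph calls. The sketch below runs them on a made-up graph. The object toy_edges is hypothetical, and this is not necessarily the exact code behind the table. Recent igraph releases may flag hub_score() and authority_score() as deprecated, but the calls still return a list with a vector element.

```r
library(igraph)

# Made-up directed edge list with one reciprocated pair (USA and CHN)
toy_edges <- data.frame(
  from = c("USA", "CHN", "USA", "DEU", "CHN"),
  to   = c("CHN", "USA", "DEU", "CHN", "GBR")
)
g <- graph_from_data_frame(toy_edges, directed = TRUE)

# Counts of mutual, asymmetric and unconnected country pairs
census <- dyad_census(g)

# Hub and authority scores, one value per vertex
hubs <- hub_score(g)$vector
auth <- authority_score(g)$vector

summary_stats <- list(
  max_in_degree  = max(degree(g, mode = "in")),
  max_out_degree = max(degree(g, mode = "out")),
  max_all_degree = max(degree(g, mode = "all")),
  density        = edge_density(g),
  transitivity   = transitivity(g, type = "global"),
  mutual         = census$mut,
  asymmetric     = census$asym,
  unconnected    = census$null,
  top_authority  = V(g)$name[which.max(auth)],
  top_hub        = V(g)$name[which.max(hubs)]
)
```

Global transitivity ignores edge direction, and the hub and authority scores use a weight edge attribute by default when one exists, which is worth keeping in mind when reading these numbers for a weighted graph.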

5. Final comments

There is a lot more that we can look at. This exploratory exercise was intended mostly as a proof of concept: identifying the tools that are available as open-source software and the ways we can apply them to our dataset.