0. Reformatting data

As a first step, I put the data into a graph format using the igraph package. At this exploratory stage, I am working with country-level data globally.

library(dplyr)
library(igraph)

#2013 Graph 
#Create node dataset 
#Stack the (country, iso) pairs from both sides of each flow, then deduplicate.
#Pairing unique() vectors with cbind() can misalign names and codes when the
#two vectors differ in length or order, so the pairs are taken row by row.
countries <- bind_rows(
  raw %>% select(country = sourcecountry, iso = sourceiso),
  raw %>% select(country = hostcountry, iso = hostiso)
) %>% distinct()

#One row per ISO code: igraph requires unique vertex names
nodes <- countries %>% distinct(iso) %>% rename(nodes = iso)


#Links: creating edges by aggregation
#Each row of raw is one project, so summing link counts projects per country pair
raw <- raw %>% mutate(link = 1)
links <- raw %>% 
  filter(year == 2013) %>% 
  group_by(sourceiso, hostiso) %>% 
  summarize(weight = sum(link), value = sum(capitalinvestment), .groups = "drop")
##agnodes: adjust nodes for dropped countries 
agnodes <- nodes %>% filter(nodes %in% links$sourceiso | nodes %in% links$hostiso)
#Make igraph
net <- graph_from_data_frame(d = links, vertices = agnodes, directed = TRUE)

#All time graph 
allnodes <- nodes
#Links: creating edges by aggregation
raw <- raw %>% mutate(link = 1)
alllinks <- raw %>% 
  group_by(sourceiso, hostiso) %>% 
  summarize(weight = sum(link), value = sum(capitalinvestment), .groups = "drop")
#No node adjustment needed here: every country appears in at least one link
#Make igraph
allnet <- graph_from_data_frame(d = alllinks, vertices = allnodes, directed = TRUE)

1. Visualizing with reduced data - at country level

The figures below show the global greenfield FDI network at different points in time. For illustration purposes, I restrict the graph to the top two recipient countries of each source country. The arrow size increases with the number of projects that make up the flow. The node size reflects the in-degree of each country, i.e. the number of countries from which it receives investment.
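
A minimal sketch of this restriction, on made-up data, could look as follows. The toy edge list and the object names top2 and net_top2 are assumptions for illustration and are not part of the original analysis; only the column names sourceiso, hostiso and weight follow the code in section 0. Edge width stands in for arrow size here, since igraph's plot applies a single arrow size to all edges.

```r
library(dplyr)
library(igraph)

# Toy aggregated links (hypothetical values); weight = number of projects
links <- data.frame(
  sourceiso = c("USA", "USA", "USA", "DEU", "DEU", "CHN"),
  hostiso   = c("CHN", "MEX", "IND", "CHN", "USA", "USA"),
  weight    = c(10, 7, 3, 5, 4, 6)
)

# Keep the two largest outgoing flows (by project count) of each source country
top2 <- links %>%
  group_by(sourceiso) %>%
  slice_max(weight, n = 2, with_ties = FALSE) %>%
  ungroup()

net_top2 <- graph_from_data_frame(top2, directed = TRUE)

# In-degree: number of source countries pointing at each node
indeg <- degree(net_top2, mode = "in")

# Node size grows with in-degree, edge width with the number of projects
plot(net_top2,
     vertex.size = 10 + 5 * indeg,
     edge.width = E(net_top2)$weight / 2,
     edge.arrow.size = 0.4)
```

With with_ties = FALSE each source country contributes at most two edges, so ties in project counts are broken arbitrarily; setting it to TRUE would keep all tied recipients.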

2. Exploring the dynamics of some key network statistics

Next I look at some key network statistics over the period. The following graphs present the evolution over time of measures of connectedness such as degree and centrality.

While these measures fluctuate over time, it is also apparent that the network is strongly tiered: the top tier persistently dominates over the period, with few breakouts.
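
As a rough sketch of how such yearly series can be assembled, the snippet below builds one graph per year and collects degree and betweenness for every country. The toy data, the helper year_stats and the object stats_by_year are hypothetical names introduced for illustration; the exact measures plotted in the original figures may differ.

```r
library(dplyr)
library(igraph)

# Toy project-level data (hypothetical); column names follow section 0
raw_toy <- data.frame(
  year      = c(2012, 2012, 2012, 2013, 2013, 2013, 2013),
  sourceiso = c("USA", "USA", "DEU", "USA", "DEU", "CHN", "USA"),
  hostiso   = c("CHN", "CHN", "CHN", "CHN", "USA", "USA", "MEX")
)

# Build the directed graph for one year and return per-country measures
year_stats <- function(df, yr) {
  edges <- df %>%
    filter(year == yr) %>%
    count(sourceiso, hostiso, name = "weight")
  g <- graph_from_data_frame(edges, directed = TRUE)
  data.frame(
    year        = yr,
    iso         = V(g)$name,
    in_degree   = as.numeric(degree(g, mode = "in")),
    out_degree  = as.numeric(degree(g, mode = "out")),
    # weights = NA: project counts are not distances, so paths are unweighted
    betweenness = as.numeric(betweenness(g, directed = TRUE, weights = NA))
  )
}

# One block of rows per year, stacked into a single long data frame
stats_by_year <- bind_rows(
  lapply(sort(unique(raw_toy$year)), year_stats, df = raw_toy)
)
```

The long format (one row per country and year) makes it straightforward to plot each measure against year and to see which countries stay in the top tier.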

3. Distributions of network statistics overall

We can look more closely at the distributions of some key network characteristics. Below are frequency plots for three of these measures.

I look at the distributions both at the end of the period and over the period as a whole, the latter as a way to get a more stable measure.
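
One possible way to produce such a frequency plot is sketched below for the in-degree. The toy edge list is made up, and the same pattern would apply to the 2013 graph or the all-time graph from section 0.

```r
library(igraph)

# Toy directed edge list (hypothetical)
edges <- data.frame(
  from = c("USA", "USA", "DEU", "CHN", "USA"),
  to   = c("CHN", "MEX", "CHN", "USA", "DEU")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# In-degree of every country and how often each value occurs
indeg <- degree(g, mode = "in")
freq  <- table(indeg)

# Relative frequencies; the first element corresponds to degree 0
dd <- degree_distribution(g, mode = "in")

# Frequency plot with one bar per integer degree value
hist(indeg,
     breaks = seq(-0.5, max(indeg) + 0.5, by = 1),
     main = "In-degree distribution",
     xlab = "In-degree")
```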

4. More network statistics

Some more measures are presented in the table below, following the same logic: one row captures the end-of-period picture and the other the more stable whole-period measures.

The first set of graphs shows the degree distributions and the betweenness centrality measure in 2013. The second set shows the same measures over the whole period.

Additional statistics
Year	Maximum in-degree	Maximum out-degree	Maximum total degree	Density	Transitivity	Mutual dyads	Asymmetric dyads	Unconnected dyads	Highest authority score	Highest hub score
2013	273	176	97	1	1	466	1417	13342	China	U.S.
2003-2013	273	176	97	1	1	1573	3011	14722	China	U.S.
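
For reference, the statistics in this table map onto standard igraph functions. The sketch below computes them on a made-up graph; it is an assumption about how such a table can be produced, not the exact code behind the figures above. Note that authority_score() and hub_score() may emit a deprecation notice in recent igraph releases.

```r
library(igraph)

# Toy directed graph (hypothetical edges)
edges <- data.frame(
  from = c("USA", "CHN", "USA", "DEU", "DEU"),
  to   = c("CHN", "USA", "MEX", "CHN", "USA")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Dyad census: counts of mutual, asymmetric and unconnected country pairs
dc <- dyad_census(g)

stats <- list(
  max_in        = max(degree(g, mode = "in")),
  max_out       = max(degree(g, mode = "out")),
  max_all       = max(degree(g, mode = "all")),
  density       = edge_density(g),
  # Global transitivity ignores edge direction
  transitivity  = transitivity(g, type = "global"),
  mutual        = dc$mut,
  asymmetric    = dc$asym,
  unconnected   = dc$null,
  # Country with the highest Kleinberg authority / hub score
  top_authority = names(which.max(authority_score(g)$vector)),
  top_hub       = names(which.max(hub_score(g)$vector))
)
```

Running the same function calls on the 2013 graph and on the all-time graph would give one row of the table each.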

5. Final comments

There is a lot more that we can look at. This exploratory exercise was intended mostly as a proof of concept: identifying the open-source tools that are available and the ways we can apply them to our dataset.

Greenfield FDI : A network exploration

Y.A.B.