I. Getting to know the dataset

Welcome to the Introduction to Social Network Analysis workshop! This file introduces the concept of social network analysis and its application to international law research. The workshop is designed for law students with some limited background in R. If you have no background in R, I highly recommend that you first take a basic introduction to R covering vectors, lists, matrices, data wrangling, and data visualization. In this workshop, I assume that you have this basic knowledge of R.

The objectives of this workshop are:

  • to familiarize you with the concept of social network in a practical sense by working on a real data set;
  • to walk you through all steps necessary to create a network in R, as well as simple network visualizations;
  • to introduce basic features of a social network that provide insights into relationships of interest;
  • to help you learn how to perform basic network analytics such as clustering and community detection; and
  • to introduce you to some advanced topics that may be of interest to you.

At the end of the workshop, you should be able to adapt what you have learned to your own legal research.

A. Setting up the workspace

First, create a new R Project on your computer. Make sure that you know where you saved this R project, because all of your outputs will be saved in this folder. Next, run the following lines of code to load the R packages needed for this workshop.

#if you get an error message, install these packages first (with install.packages()) before you load them
library(readxl) #to import data
library(tidyr) #for reshaping data
## Warning: package 'tidyr' was built under R version 4.1.1
library(dplyr) #for data wrangling
## Warning: package 'dplyr' was built under R version 4.1.1
library(knitr) #to show table on RMarkdown
library(ggplot2) #for data visualization
library(RColorBrewer) #color palettes for the plots
## Warning: package 'RColorBrewer' was built under R version 4.1.1
library(igraph) #we will primarily use igraph package in this workshop

Second, you will have to import the data that we will use throughout the workshop, which you can inspect here. The data set comes from the DESTA (Design of Trade Agreements) Database, a project that contains comprehensive data on various types of preferential trade agreements (PTAs).

Now, you are ready to import the data to your RStudio workspace:

excel_file <- tempfile()
download.file(params$url,excel_file, mode = "wb")
pta_data <- read_excel(path = excel_file, sheet = 2)
pta_data$regioncon <- as.factor(pta_data$regioncon)
#inspect the data
str(pta_data) #look at the structure of the data
## tibble [18,329 × 16] (S3: tbl_df/tbl/data.frame)
##  $ country1      : chr [1:18329] "Afghanistan" "Algeria" "Algeria" "Algeria" ...
##  $ country2      : chr [1:18329] "India" "Egypt" "Ghana" "Guinea" ...
##  $ iso1          : num [1:18329] 4 12 12 12 12 12 818 818 818 818 ...
##  $ iso2          : num [1:18329] 356 818 288 324 466 504 288 324 466 504 ...
##  $ number        : chr [1:18329] "1" "2" "2" "2" ...
##  $ base_treaty   : num [1:18329] 1 2 2 2 2 2 2 2 2 2 ...
##  $ name          : chr [1:18329] "Afghanistan India" "African Common Market" "African Common Market" "African Common Market" ...
##  $ entry_type    : chr [1:18329] "base_treaty" "base_treaty" "base_treaty" "base_treaty" ...
##  $ consolidated  : num [1:18329] 0 0 0 0 0 0 0 0 0 0 ...
##  $ year          : num [1:18329] 2003 1962 1962 1962 1962 ...
##  $ entryforceyear: num [1:18329] 2003 1963 1963 1963 1963 ...
##  $ language      : chr [1:18329] "English" "English" "English" "English" ...
##  $ typememb      : num [1:18329] 1 2 2 2 2 2 2 2 2 2 ...
##  $ regioncon     : Factor w/ 6 levels "Africa","Americas",..: 3 1 1 1 1 1 1 1 1 1 ...
##  $ wto_listed    : num [1:18329] 1 1 1 1 1 1 1 1 1 1 ...
##  $ wto_name      : chr [1:18329] "India Afghanistan" "African Common Market" "African Common Market" "African Common Market" ...
head(pta_data) #look at the first few rows of the data
## # A tibble: 6 × 16
##   country1 country2  iso1  iso2 number base_treaty name  entry_type consolidated
##   <chr>    <chr>    <dbl> <dbl> <chr>        <dbl> <chr> <chr>             <dbl>
## 1 Afghani… India        4   356 1                1 Afgh… base_trea…            0
## 2 Algeria  Egypt       12   818 2                2 Afri… base_trea…            0
## 3 Algeria  Ghana       12   288 2                2 Afri… base_trea…            0
## 4 Algeria  Guinea      12   324 2                2 Afri… base_trea…            0
## 5 Algeria  Mali        12   466 2                2 Afri… base_trea…            0
## 6 Algeria  Morocco     12   504 2                2 Afri… base_trea…            0
## # … with 7 more variables: year <dbl>, entryforceyear <dbl>, language <chr>,
## #   typememb <dbl>, regioncon <fct>, wto_listed <dbl>, wto_name <chr>

B. Exploring the data

As you can see, the data set we are dealing with is large (18,329 rows). Exploring the data is a crucial step in a quantitative approach: it helps us learn about the data and figure out what to do next with it. We start by looking for patterns in the data set. Two variables that could be of interest here are the year and the region to which a PTA belongs. A simple way to explore them is to tabulate and plot the data using the dplyr and ggplot2 packages:

pta_data_count <- pta_data %>% group_by(year, regioncon) %>% count(name) #this line counts the number of ties (country pairs) per agreement, by year and region
pta_data_count <- pta_data_count %>% arrange(desc(n))
kable(head(pta_data_count), col.names = c("year", "region", "agreement name", "number of ties"),
      caption = "Frequency of Ties Table")
Frequency of Ties Table

year | region | agreement name | number of ties
1991 | Africa | African Economic Community | 1275
2000 | Intercontinental | Cotonou Agreement | 1140
1988 | Intercontinental | Global System of Trade Preferences (GSTP) | 1128
2018 | Africa | African Continental Free Trade Area (AfCFTA) | 946
1989 | Intercontinental | Lome IV | 816
2003 | Intercontinental | Cotonou Agreement Cyprus Czech Republic Estonia Hungary Latvia Lithuania Malta Poland Slovakia Slovenia accession | 760

kable(summary(pta_data_count), col.names = c("year", "region", "agreement name", "number of ties"), 
      caption = "A summary of the Frequency of Ties Data") #summary() provide a summary of the data
A summary of the Frequency of Ties Data

year | region | agreement name | number of ties
Min. :1948 | Africa : 91 | Length:1116 | Min. : 1.00
1st Qu.:1992 | Americas :231 | Class :character | 1st Qu.: 1.00
Median :1999 | Asia :140 | Mode :character | Median : 1.00
Mean :1997 | Europe :260 | NA | Mean : 16.42
3rd Qu.:2006 | Intercontinental:382 | NA | 3rd Qu.: 6.00
Max. :2021 | Oceania : 12 | NA | Max. :1275.00

From the summary of the data, notice that there are six region categories: Africa, Americas, Asia, Europe, Oceania, and Intercontinental. Intercontinental is the largest category (382 agreements), while, not surprisingly, Oceania has only 12 intra-regional agreements, which reflects the small number of countries in that region.
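To make the counting step concrete, here is a minimal sketch on a toy table. The data frame toy_ties and its values are hypothetical (they are not taken from DESTA); it only mimics the layout of pta_data, with one row per country pair.

```r
library(dplyr)

# Toy tie table (hypothetical values): one row per country pair, as in pta_data
toy_ties <- data.frame(
  country1  = c("A", "A", "B", "C"),
  country2  = c("B", "C", "C", "D"),
  name      = c("Treaty X", "Treaty X", "Treaty X", "Treaty Y"),
  year      = c(1990, 1990, 1990, 2001),
  regioncon = c("Africa", "Africa", "Africa", "Asia")
)

# Count rows (ties) per agreement within each year and region,
# then sort so the agreements with the most ties come first
toy_count <- toy_ties %>%
  group_by(year, regioncon) %>%
  count(name) %>%
  ungroup() %>%
  arrange(desc(n))

toy_count
```

In this sketch, "Treaty X" contributes three rows and therefore gets n = 3, while "Treaty Y" contributes a single row and gets n = 1.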

Next, we are going to look at which regions tend to form multilateral PTAs and which tend to form bilateral PTAs:

#filter the new data
pta_data_count <- pta_data_count %>% mutate(bilat = ifelse(n == 1, "bilateral", "multilateral")) #creating a new variable to see if a tie belongs to a bilateral or multilateral treaty
pta_data_count$bilat <- as.factor(pta_data_count$bilat)
region_count <- pta_data_count %>% group_by(regioncon, bilat) %>% count(regioncon)
ggplot(region_count, aes(x= regioncon, y = n, color = bilat, fill = bilat)) +
  geom_col()

From this simple bar plot, we notice that countries in Africa tend to form multilateral PTAs rather than bilateral ones, while countries in the Americas and Asia clearly tend to form bilateral rather than multilateral PTAs. Europe and Oceania are almost equally split between bilateral and multilateral PTAs. Interestingly, the inter-regional PTAs are also split between bilateral and multilateral PTAs.

ggplot(pta_data_count, aes(x = year, y = n, color = regioncon, fill = regioncon)) +
  geom_col() + facet_wrap(~regioncon)

Indeed, plotting how many ties were formed each year by region confirms our observation that Africa stands out in the number of ties formed, along with inter-regional PTAs, followed by Europe. Asia, the Americas, and Oceania prefer to form bilateral agreements.

II. Creating a network

Now that we have done a preliminary exploration of the data, one direction we can take is to look at the inter-regional network, because that is where PTAs have the most impact on inter-regional trade flows. What else could be explored based on the exploratory analysis? This is food for thought, and something for you to try when doing social network analysis on your own.

In this section, I will introduce how to create a network in R. There are two types of data structure from which you can construct a network:

  1. Edgelist
  2. Adjacency Matrix

An edgelist is exactly what the DESTA database provides: each dyadic country relationship is recorded in the first two columns, country1 and country2, of the original data set:

head(pta_data)
## # A tibble: 6 × 16
##   country1 country2  iso1  iso2 number base_treaty name  entry_type consolidated
##   <chr>    <chr>    <dbl> <dbl> <chr>        <dbl> <chr> <chr>             <dbl>
## 1 Afghani… India        4   356 1                1 Afgh… base_trea…            0
## 2 Algeria  Egypt       12   818 2                2 Afri… base_trea…            0
## 3 Algeria  Ghana       12   288 2                2 Afri… base_trea…            0
## 4 Algeria  Guinea      12   324 2                2 Afri… base_trea…            0
## 5 Algeria  Mali        12   466 2                2 Afri… base_trea…            0
## 6 Algeria  Morocco     12   504 2                2 Afri… base_trea…            0
## # … with 7 more variables: year <dbl>, entryforceyear <dbl>, language <chr>,
## #   typememb <dbl>, regioncon <fct>, wto_listed <dbl>, wto_name <chr>

Although you do not need the other variables in the data set to construct a simple network, it is useful to keep them because, as you will see later, they can be used to set the tie properties in the network.

Alternatively, you can also construct a network from an adjacency matrix as the underlying data structure.

#ignore these two lines for now, we will cover these functions later on
#the point is to show the alternative data structure available to construct a network, as shown in the output here
temp_graph <- graph_from_edgelist(cbind(head(pta_data$country1), head(pta_data$country2)), directed = F)
get.adjacency(temp_graph)
## 8 x 8 sparse Matrix of class "dgCMatrix"
##             Afghanistan India Algeria Egypt Ghana Guinea Mali Morocco
## Afghanistan           .     1       .     .     .      .    .       .
## India                 1     .       .     .     .      .    .       .
## Algeria               .     .       .     1     1      1    1       1
## Egypt                 .     .       1     .     .      .    .       .
## Ghana                 .     .       1     .     .      .    .       .
## Guinea                .     .       1     .     .      .    .       .
## Mali                  .     .       1     .     .      .    .       .
## Morocco               .     .       1     .     .      .    .       .

A. From the edgelist

The function graph_from_edgelist() in the igraph package creates a network from edgelist data:

cbind(head(pta_data$country1), head(pta_data$country2))
##      [,1]          [,2]     
## [1,] "Afghanistan" "India"  
## [2,] "Algeria"     "Egypt"  
## [3,] "Algeria"     "Ghana"  
## [4,] "Algeria"     "Guinea" 
## [5,] "Algeria"     "Mali"   
## [6,] "Algeria"     "Morocco"
edge_net <- graph_from_edgelist(cbind(head(pta_data$country1), head(pta_data$country2)),
                                directed = F) #you have created the first network from edgelist!

#see the network:
set.seed(123) #set.seed() fixes the configuration of the plot for the sake of reproducibility and comparison in this case
plot.igraph(edge_net) #don't worry about this we will cover this function in the next section

B. From the adjacency matrix

To create a network from an adjacency matrix, call the function graph_from_adjacency_matrix() from the igraph package.

get.adjacency(edge_net)
## 8 x 8 sparse Matrix of class "dgCMatrix"
##             Afghanistan India Algeria Egypt Ghana Guinea Mali Morocco
## Afghanistan           .     1       .     .     .      .    .       .
## India                 1     .       .     .     .      .    .       .
## Algeria               .     .       .     1     1      1    1       1
## Egypt                 .     .       1     .     .      .    .       .
## Ghana                 .     .       1     .     .      .    .       .
## Guinea                .     .       1     .     .      .    .       .
## Mali                  .     .       1     .     .      .    .       .
## Morocco               .     .       1     .     .      .    .       .
matrix_net <- graph_from_adjacency_matrix(get.adjacency(edge_net), mode = "undirected")
set.seed(123) #again, set.seed() here to make sure the plot looks similar
plot.igraph(matrix_net)

As you can see, the two networks look exactly the same!

C. Construct the network of countries by their inter-regional PTAs

Now, let’s construct the network of countries by their inter-regional PTAs, using the edgelist data already provided in the DESTA data set. Because the data structure is an edgelist data, we use graph_from_edgelist().

pta_intercon <- pta_data %>% filter(regioncon == "Intercontinental")
pta_intercon_net <- graph_from_edgelist(cbind(pta_intercon$country1, pta_intercon$country2), directed = F)

Now that we have successfully created the network, the next step is to assign attributes to the ties for later use. The variables useful for this workshop are name and year. The function E() returns the edge sequence of the network, which can be indexed like a vector. It can also be used to assign edge attributes, and we will use it to set the properties and the weights of the edges. Here, we call the edges of the network using E(), followed by $ and the name of the attribute we wish to create: agt_name for the PTAs’ names, and year for the year of these PTAs.

E(pta_intercon_net)$agt_name <- pta_intercon$name
E(pta_intercon_net)$year <- pta_intercon$year

Note that there might be duplicated edges between a pair of countries; this is the case when the two countries have formed more than one PTA together. Instead of keeping duplicated edges, we can assign each pair a weight equal to the number of PTAs the two countries share. To do so, we use the simplify() function from the igraph package, which removes loops and multiple edges. With the edge.attr.comb argument, simplify() can sum the weights of the duplicated edges and store the total as the weight of the remaining edge.

#set weighted ties:
E(pta_intercon_net)$weight <- 1
pta_intercon_simp <- pta_intercon_net %>% simplify(edge.attr.comb = list(weight = "sum"))

It is also important to recognize that once we simplify the edges of the network, we lose the name and year information attached to those edges. The simplified network is useful for looking at the descriptives (covered in the next section). However, if the names or the years of the PTAs are important to your analysis, you may have to go back to the full network.
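The effect of simplify() is easier to see on a small example. The sketch below uses a hypothetical three-row edgelist (the country labels A, B, and C are made up) in which one pair appears twice, and follows the same two steps used above: set every weight to 1, then sum the weights of duplicated edges.

```r
library(igraph)

# Toy edgelist (hypothetical): A and B share two agreements, A and C share one
toy_pairs <- rbind(c("A", "B"),
                   c("A", "B"),
                   c("A", "C"))
toy_net <- graph_from_edgelist(toy_pairs, directed = FALSE)

# Give every tie a weight of 1, then collapse duplicates by summing the weights
E(toy_net)$weight <- 1
toy_simp <- simplify(toy_net, edge.attr.comb = list(weight = "sum"))

ecount(toy_net)     # 3 ties before simplifying
ecount(toy_simp)    # 2 ties after simplifying
E(toy_simp)$weight  # 2 for A-B, 1 for A-C
```

The weight of the A-B edge now records that the pair shares two agreements, while the network itself has only one edge per pair.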

Similarly, you can call the nodes of the network by using the function V(), followed by $ and the name of the node attribute. There are 191 country nodes engaged in inter-regional PTAs. We will be using this function a lot more later on.

head(V(pta_intercon_net)$name) #showing only the first 6 countries
## [1] "Egypt"   "Jordan"  "Morocco" "Tunisia" "Albania" "Turkey"

You can look at the basic information about a network by calling the function summary(). In the output below, the first line starts with IGRAPH, indicating the type of the object, followed by a unique code for the graph. Next comes a code of up to four letters, here UNW-. U indicates that the network is an undirected graph, as opposed to D, a directed graph. N indicates that the vertex attribute name has been set. W indicates that the graph is weighted (it has a weight edge attribute). The fourth position shows B for bipartite graphs (those with a type vertex attribute); the dash here means that this graph is not bipartite.

After the letter code, the number of vertices (191) and the number of edges (9316) are reported, followed by two dashes. The second line lists the attributes of the graph, each with its kind (g for graph, v for vertex, and e for edge) and its type (c for character, n for numeric, l for logical, and x for other).

Calling the function print_all() shows similar information, along with the edges in the graph.

summary(pta_intercon_net) #print the number of vertices, edges,and whether the graph is directed or not
## IGRAPH 1db4168 UNW- 191 9316 -- 
## + attr: name (v/c), agt_name (e/c), year (e/n), weight (e/n)
#print_all(pta_intercon_net) #this line is commented out to save space, because it would print all of the edges

D. Construct the network of bilateral PTAs

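#keep only the bilateral PTAs (typememb == 1 according to the DESTA codebook), then repeat the steps above
pta_bilat <- pta_data %>% filter(typememb == 1)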
pta_bilat_net <- graph_from_edgelist(cbind(pta_bilat$country1, pta_bilat$country2), directed = F)

E(pta_bilat_net)$agt_name <- pta_bilat$name
E(pta_bilat_net)$year <- pta_bilat$year

E(pta_bilat_net)$weight <- 1
pta_bilat_simp <- pta_bilat_net %>% simplify(edge.attr.comb = list(weight = "sum"))

III. Descriptives of the network

After we have successfully created the network, in this section, we will learn to explore some network properties – the descriptives of the network. First, we will look at how to visualize the network. Thereafter, the basic network descriptions such as density, degree distributions, and transitivity will be covered. Next, we will go over various types of network centrality to identify key players in the network.

A. Simple visualizations

The main function for network visualization in the igraph package is plot.igraph(), which offers many options for customizing your network visuals. Examples used here are vertex.size, vertex.color, vertex.label, vertex.label.cex, vertex.frame.color, and layout. Arguments starting with vertex. set the shape, color, and size of the vertices. Similarly, arguments starting with edge. customize the edges, such as edge.color, or edge.width for the thickness of the edges. edge.width is often used for weighted graphs to show the different weights of the edges, as you shall see later on. For more options, you can look it up here, in the R documentation for plot.igraph().

Now, let’s plot two different networks here. First, we will plot the inter-regional PTA network which contains only inter-regional PTAs. Second, we will plot the bilateral PTA network which contains only bilateral PTAs, regardless of whether it is an inter-regional or a regional one.

set.seed(123)
plot.igraph(pta_intercon_net,
            vertex.size = 5,
            vertex.color = "red",
            vertex.label = V(pta_intercon_net)$name,
            vertex.label.cex = .5,
            vertex.frame.color = NA,
            layout = layout.fruchterman.reingold,
            main = "The network of inter-regional PTAs"
            )

#now try plotting the simplified bilateral PTAs, setting the ties with more than one PTA in blue and the ties with only one PTA in grey
E(pta_bilat_simp)$color <- ifelse(E(pta_bilat_simp)$weight > 1,'blue', 'grey')
plot.igraph(pta_bilat_simp,
            vertex.size = 5,
            vertex.color = "red",
            vertex.label = V(pta_bilat_simp)$name,
            vertex.label.cex = .5,
            vertex.frame.color = NA,
            layout = layout.fruchterman.reingold,
            edge.width = E(pta_bilat_simp)$weight,
            main = "The network of bilateral PTAs"
            )

Let’s zoom in on the bilateral PTA network first. Notice the blue lines, which indicate that the two countries have more than one bilateral PTA with each other. To examine these PTAs further, we can reduce the network to only the country pairs with more than one bilateral PTA, using the subgraph.edges() function.

#let's reduce the bilateral PTA network to only the country pairs with more than one bilateral PTA
pta_bilat_2 <-  subgraph.edges(pta_bilat_simp,which(E(pta_bilat_simp)$weight > 1))
plot.igraph(pta_bilat_2,
            vertex.size = 5,
            vertex.color = "red",
            vertex.label = V(pta_bilat_2)$name,
            vertex.label.cex = .5,
            vertex.frame.color = NA,
            layout = layout.fruchterman.reingold,
            edge.width = E(pta_bilat_2)$weight,
            main = "Countries with more than one bilateral PTA"
            )
[Figure: Network Visualization]

Notice that most of these countries are neighbors. They tend to form several treaties, or several versions of a treaty, with one another.

On the other hand, the inter-regional PTA network is too dense to make much sense of. However, you can spot certain clusters of countries, which will be examined further in the next sections. Often, when we work with a huge data set, it is hard to get a sense of how the network is structured. If you have a specific interest or question to pursue, you can inspect a sub-network using the induced.subgraph() and subgraph.edges() functions.
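The two subgraph functions differ in what they keep: one selects nodes, the other selects edges. The sketch below illustrates the difference on a hypothetical four-node path; the labels and weights are made up, and induced_subgraph() is the underscore spelling of induced.subgraph().

```r
library(igraph)

# Toy network (hypothetical): a path A - B - C - D with made-up weights
toy_net <- graph_from_edgelist(rbind(c("A", "B"),
                                     c("B", "C"),
                                     c("C", "D")),
                               directed = FALSE)
E(toy_net)$weight <- c(1, 2, 3)

# Keep a chosen set of nodes and every tie among them
by_nodes <- induced_subgraph(toy_net, c("A", "B", "C"))

# Keep a chosen set of ties and only the nodes they touch
by_edges <- subgraph.edges(toy_net, which(E(toy_net)$weight > 1))

V(by_nodes)$name  # "A" "B" "C"
V(by_edges)$name  # "B" "C" "D"
```

Selecting by edges is the natural choice when the filter is an edge attribute, such as the weight or the type of the underlying PTA.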

Let’s first try to separate the network into a multilateral network and a bilateral network. To do this, we use the variable typememb that comes with the original data set. According to the codebook provided by DESTA, if typememb equals 1, the PTA underlying that edge is a bilateral PTA. We therefore assign another edge attribute, called bilateral, to indicate whether the edge belongs to a bilateral or a multilateral PTA.

#Identify bilateral inter-regional PTAs (1 = bilateral, 0 = multilateral)
E(pta_intercon_net)$bilateral <- ifelse(pta_intercon$typememb == 1, 1, 0)
#store the edge attribute in a named column so that ggplot can refer to it directly
bilateral_var <- data.frame(bilateral = E(pta_intercon_net)$bilateral)
ggplot(bilateral_var, aes(x = bilateral)) +
  geom_bar(fill = "steelblue", color = "steelblue") +
  theme_minimal() +
  labs(x = "Bilateral/Multilateral Inter-Regional PTAs", y = "Count")

As the bar graph clearly shows, the ties in the inter-regional PTA network are overwhelmingly multilateral. Now, we will shift the focus to the network of bilateral inter-regional PTAs to look for insights using network analysis.

#bilateral inter-regional PTAs
pta_intercon_bilat <- subgraph.edges(pta_intercon_net,which(E(pta_intercon_net)$bilateral == 1))
plot(pta_intercon_bilat,
     vertex.size = 5,
     vertex.color = "red",
     vertex.label = V(pta_intercon_bilat)$name,
     vertex.label.cex = .5,
     vertex.frame.color = NA,
     layout = layout.fruchterman.reingold,
     main = "The network of bilateral inter-regional PTAs")

#for a cleaner visual
E(pta_intercon_bilat)$weight <- 1
pta_intercon_bilat_sim <- pta_intercon_bilat %>% simplify(edge.attr.comb = list(weight = "sum"))
plot(pta_intercon_bilat_sim,
     vertex.size = 5,
     vertex.color = "red",
     vertex.label = V(pta_intercon_bilat_sim)$name,
     vertex.label.cex = .5,
     vertex.frame.color = NA,
     edge.width = E(pta_intercon_bilat_sim)$weight,
     layout = layout.fruchterman.reingold,
     main = "The network of bilateral inter-regional PTAs (simplified)")

B. Basic descriptions of the network

#degree distribution of the bilateral inter-regional PTA network
degree_dist_bilat <- as.data.frame(sort(degree(pta_intercon_bilat), decreasing = T))
colnames(degree_dist_bilat) <- "degree"
ggplot(degree_dist_bilat, aes(x = degree))+
  geom_bar(fill = "steelblue", color = "steelblue") +
  theme_minimal()

bilat_density <- edge_density(pta_intercon_bilat)
bilat_transitivity <- transitivity(pta_intercon_bilat)

#for the sake of comparison
pta_density <- edge_density(pta_intercon_net)
pta_transitivity <- transitivity(pta_intercon_net)

bilat_density
## [1] 0.04924761
pta_density
## [1] 0.5134197
bilat_transitivity
## [1] 0.1098987
pta_transitivity
## [1] 0.3095602
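
To see what edge_density() and transitivity() measure, here is a minimal sketch on a hypothetical four-node network (a triangle with one extra node attached). For a simple undirected graph, density is the number of edges divided by the number of possible pairs, and global transitivity is three times the number of triangles divided by the number of connected triples; the manual calculations below are only there for comparison.

```r
library(igraph)

# Toy network (hypothetical): triangle A-B-C plus a pendant node D attached to C
toy_net <- graph_from_edgelist(rbind(c("A", "B"),
                                     c("B", "C"),
                                     c("A", "C"),
                                     c("C", "D")),
                               directed = FALSE)

n <- vcount(toy_net)  # 4 nodes
m <- ecount(toy_net)  # 4 edges

# Density by hand: edges divided by the number of possible pairs
manual_density <- m / (n * (n - 1) / 2)

# Transitivity by hand: 1 triangle, 5 connected triples (1 at A, 1 at B, 3 at C)
manual_transitivity <- 3 * 1 / 5

edge_density(toy_net)  # 4/6, matches manual_density
transitivity(toy_net)  # 0.6, matches manual_transitivity
```

Because density counts every edge, duplicated ties between the same pair of countries can push the value up, so it may be worth comparing the result on the simplified network as well.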

C. Network Centralities

V(pta_intercon_bilat)$degree_cen <- degree(pta_intercon_bilat, normalized = T)
V(pta_intercon_bilat)$between_cen <- betweenness(pta_intercon_bilat, directed = F, normalized = T)
V(pta_intercon_bilat)$close_cen <- closeness(pta_intercon_bilat, normalized = T)
## Warning in closeness(pta_intercon_bilat, normalized = T): At
## centrality.c:2617 :closeness centrality is not well-defined for disconnected
## graphs
V(pta_intercon_bilat)$eigen_cen <- evcent(pta_intercon_bilat)$vector
V(pta_intercon_bilat)$pgrank_cen <- page_rank(pta_intercon_bilat, directed = F)$vector

centrality_mat <- cbind(V(pta_intercon_bilat)$degree_cen, 
                        V(pta_intercon_bilat)$between_cen,
                        V(pta_intercon_bilat)$close_cen,
                        V(pta_intercon_bilat)$eigen_cen,
                        V(pta_intercon_bilat)$pgrank_cen)

colnames(centrality_mat) <- c("degree", "betweenness", "closeness", "eigenvector", "page rank")
head(sort(degree(pta_intercon_bilat, normalized = T), decreasing = T))
##         Turkey United Kingdom         Israel         Jordan      Australia 
##      0.3058824      0.2352941      0.1882353      0.1529412      0.1529412 
##          Chile 
##      0.1529412
head(sort(betweenness(pta_intercon_bilat, directed = F, normalized = T), decreasing = T))
##         Turkey United Kingdom         Israel        Ukraine         Canada 
##     0.33147120     0.27980074     0.13115823     0.11926183     0.10435374 
##         Jordan 
##     0.09027493
head(sort(closeness(pta_intercon_bilat, normalized = T), decreasing = T))
## Warning in closeness(pta_intercon_bilat, normalized = T): At
## centrality.c:2617 :closeness centrality is not well-defined for disconnected
## graphs
## United Kingdom         Turkey         Jordan         Israel         Canada 
##      0.2522255      0.2485380      0.2354571      0.2354571      0.2348066 
##      Singapore 
##      0.2328767
head(sort(evcent(pta_intercon_bilat)$vector, decreasing = T))
## United Kingdom         Turkey      Australia         Jordan          Chile 
##      1.0000000      0.8259030      0.7644349      0.7589254      0.6657446 
##          Egypt 
##      0.6303492
head(sort(page_rank(pta_intercon_bilat, directed = F)$vector, decreasing = T))
##         Turkey United Kingdom         Israel          Chile      Australia 
##     0.07298401     0.04951986     0.04039618     0.02996861     0.02751608 
##  United States 
##     0.02742435

By looking at the countries with the highest centrality scores, we can identify those that feature repeatedly across the measures: Turkey and the United Kingdom rank in the top six on all five measures, while Israel, Jordan, Chile, Canada, and Australia each appear on at least two. These results suggest that these countries have prioritized cross-regional bilateral PTAs more than the rest of the world.
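
The centrality measures are easier to compare on a network where the answer is obvious. The sketch below uses a hypothetical star network, in which one hub is tied to four otherwise unconnected nodes; the node labels are made up.

```r
library(igraph)

# Toy network (hypothetical): a star with one hub and four leaves
toy_star <- make_star(5, mode = "undirected")
V(toy_star)$name <- c("Hub", "L1", "L2", "L3", "L4")

# Collect four centrality measures in one table, one row per node
centrality_tab <- data.frame(
  degree      = degree(toy_star, normalized = TRUE),
  betweenness = betweenness(toy_star, directed = FALSE, normalized = TRUE),
  closeness   = closeness(toy_star, normalized = TRUE),
  eigenvector = eigen_centrality(toy_star)$vector,
  row.names   = V(toy_star)$name
)

centrality_tab

# The hub has the highest score on every measure
sapply(centrality_tab, function(x) rownames(centrality_tab)[which.max(x)])
```

In a real network the measures usually disagree to some extent, which is why it helps to compare the rankings side by side rather than rely on a single score.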

centrality_corr <- as.data.frame(cor(centrality_mat)) 
centrality_corr <- cbind(centrality = rownames(centrality_corr), centrality_corr)
gather_centrality <- gather(centrality_corr, key = "centrality_measure", value = "correlation", -centrality)

ggplot(gather_centrality, aes(x=centrality, y = centrality_measure, fill = correlation)) +
  geom_tile() +
  scale_fill_distiller(palette = "PuBu", trans = "reverse") +
  xlab("") +
  ylab("")

set.seed(123)
plot(pta_intercon_bilat_sim,
     vertex.size = V(pta_intercon_bilat)$eigen_cen*10,
     vertex.color = "red",
     vertex.label = V(pta_intercon_bilat_sim)$name,
     vertex.label.cex = 0.5,
     vertex.frame.color = NA,
     edge.width = E(pta_intercon_bilat_sim)$weight,
     layout = layout.fruchterman.reingold,
     main = "The network of bilateral inter-regional PTAs (simplified)")
[Figure: Revisiting Network Visualization with Centrality-Adjusted Vertex Sizes]

IV. Group and Community Detections

Community detection is one of the tools used in SNA to detect network structure, that is, how different actors tend to group together by forming ties within the same community. A community is a group of densely connected actors. There are several algorithms to detect communities within a network. Here, we will cover several approaches, including component analysis and cohesive blocking. Community detection is useful for discovering actors with common attributes or common interests and for understanding what keeps them connected.

A. Component Analysis

Component analysis relies on the decompose.graph() function. This function splits the original network into its connected components and returns them as a list of separate networks.

bilat_component_list <- decompose.graph(pta_intercon_bilat_sim)
V(bilat_component_list[[1]])$eigen_cen <- eigen_centrality(bilat_component_list[[1]])$vector
V(bilat_component_list[[2]])$eigen_cen <- eigen_centrality(bilat_component_list[[2]])$vector

plot(bilat_component_list[[1]],
     vertex.size = V(bilat_component_list[[1]])$eigen_cen*10,
     vertex.color = "red",
     vertex.label = V(bilat_component_list[[1]])$name,
     vertex.label.cex = 0.5,
     vertex.frame.color = NA,
     edge.width = E(bilat_component_list[[1]])$weight,
     layout = layout.fruchterman.reingold,
     main = "The network of bilateral inter-regional PTAs - Component 1")

plot(bilat_component_list[[2]],
     vertex.size = V(bilat_component_list[[2]])$eigen_cen*10,
     vertex.color = "red",
     vertex.label = V(bilat_component_list[[2]])$name,
     vertex.label.cex = 0.5,
     vertex.frame.color = NA,
     edge.width = E(bilat_component_list[[2]])$weight,
     layout = layout.fruchterman.reingold,
     main = "The network of bilateral inter-regional PTAs - Component 2")
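
As a minimal sketch of what component analysis returns, the example below builds a hypothetical network with two disconnected groups and splits it. It uses components() and decompose(), the latter being the underscore-era name for decompose.graph(); the node labels are made up.

```r
library(igraph)

# Toy network (hypothetical): two disconnected groups, A-B-C and D-E
toy_net <- graph_from_edgelist(rbind(c("A", "B"),
                                     c("B", "C"),
                                     c("D", "E")),
                               directed = FALSE)

# components() reports how many components there are and how large each one is
toy_comp <- components(toy_net)
toy_comp$no     # 2 components
toy_comp$csize  # sizes of the components

# decompose() returns each component as its own network
toy_parts <- decompose(toy_net)
length(toy_parts)         # 2 networks in the list
sapply(toy_parts, vcount) # number of nodes in each
```

Checking length() first is a cheap safeguard before indexing the list with [[1]] or [[2]], since the number of components depends on the data.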

B. Community Detection

Most of the time, a simple component analysis will not reveal much information. What we need is a more relaxed definition of groups: a way to detect groups even when there are some connections across them. There are many group detection algorithms available in the igraph package. Each tries to maximize a certain value when separating nodes into different groups. The details of each are beyond the scope of this workshop. However, I will try different types of community detection algorithms to show you how they work.
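
Before turning to the PTA network, a toy example may help show what a community detection function returns. The sketch below uses the Louvain method on a hypothetical network made of two fully connected groups of four nodes joined by a single bridging tie. The network is connected, so component analysis would see one group, while community detection can still separate the two dense groups.

```r
library(igraph)

# Toy network (hypothetical): two fully connected groups of four nodes each
# (nodes 1-4 and nodes 5-8), joined by a single bridge between nodes 4 and 5
toy_net <- disjoint_union(make_full_graph(4), make_full_graph(4))
toy_net <- add_edges(toy_net, c(4, 5))

count_components(toy_net)  # 1: the bridge makes the network connected

# Louvain community detection; set.seed() because the algorithm
# processes the nodes in random order
set.seed(123)
toy_comm <- cluster_louvain(toy_net)

length(toy_comm)      # number of communities found
membership(toy_comm)  # community label of each node
sizes(toy_comm)       # number of nodes in each community
modularity(toy_comm)  # quality score of the partition
```

The labels returned by membership() are arbitrary identifiers, so only the grouping matters, not the particular number a community receives.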

First, let’s try the Louvain method.

intercon_bilat_louvain <- cluster_louvain(pta_intercon_bilat_sim)
membership(intercon_bilat_louvain)
##                   Egypt                  Jordan                 Morocco 
##                       4                       4                       4 
##                 Tunisia                 Albania                  Turkey 
##                       4                       1                       1 
##                 Algeria                  France          United Kingdom 
##                       4                       4                       4 
##                Portugal                   Spain                  Sweden 
##                       1                       1                       7 
##                 Czechia                 Estonia                 Hungary 
##                       1                       3                       1 
##                  Latvia               Lithuania                  Poland 
##                       1                       1                       1 
##                Slovakia                Slovenia                Bulgaria 
##                       1                       1                       1 
##                 Romania                 Croatia                    Iraq 
##                       1                       1                       4 
##                   Syria                   Libya                 Armenia 
##                       4                       4                       3 
##                 Belarus                 Moldova                  Russia 
##                       3                       3                       3 
##                 Ukraine                   Kenya               Australia 
##                       3                       4                       2 
##               Indonesia                    Laos                Malaysia 
##                       2                       5                       2 
##               Singapore                Thailand                 Vietnam 
##                       2                       2                       5 
##             New Zealand                  Canada                   Chile 
##                       2                       1                       2 
##           United States              Azerbaijan                 Bahrain 
##                       5                       3                       5 
##              Kazakhstan              Kyrgyzstan              Tajikistan 
##                       3                       3                       3 
##    Bosnia & Herzegovina                  Israel                 Iceland 
##                       1                       1                       2 
##             Switzerland                   China                   India 
##                       2                       2                       2 
##                   Japan             South Korea                    Peru 
##                       2                       2                       2 
##                Colombia                 Georgia            Turkmenistan 
##                       1                       3                       3 
##              Uzbekistan           Côte d’Ivoire                Cameroon 
##                       3                       4                       4 
##                   Ghana               Mauritius                   Sudan 
##                       4                       1                       4 
##                Pakistan                 Lebanon                  Mexico 
##                       1                       4                       2 
##                  Kuwait                    Oman            Saudi Arabia 
##                       4                       5                       4 
##    United Arab Emirates Palestinian Territories                  Taiwan 
##                       4                       4                       6 
##                  Serbia               Sri Lanka               Guatemala 
##                       3                       2                       6 
##                Paraguay         North Macedonia              Montenegro 
##                       6                       1                       1 
##                  Panama               Greenland              Costa Rica 
##                       6                       7                       2 
##     Hong Kong SAR China                  Kosovo 
##                       2                       1
sizes(intercon_bilat_louvain)
## Community sizes
##  1  2  3  4  5  6  7 
## 23 18 14 20  5  4  2
modularity(intercon_bilat_louvain)
## [1] 0.5573765
plot(pta_intercon_bilat_sim,
     vertex.size = V(pta_intercon_bilat)$eigen_cen*10,
     vertex.color = "red",
     vertex.label = V(pta_intercon_bilat_sim)$name,
     vertex.label.cex = 0.5,
     vertex.frame.color = NA,
     edge.width = E(pta_intercon_bilat_sim)$weight,
     layout = layout_with_fr, #current name of layout.fruchterman.reingold
     mark.groups = intercon_bilat_louvain,
     main = "Communities of Inter-Regional PTA Network")
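The `membership()` output above lists countries in network order, which makes it hard to see who belongs together. As a rough sketch, again on a made-up toy network rather than the PTA data, the same information can be regrouped so that each community is printed with its members. The names `toy`, `toy_louvain`, and `members_by_group` are hypothetical.

```r
library(igraph)

# Hypothetical toy network: two triangles joined by one bridge
toy <- graph_from_literal(A-B, B-C, A-C, D-E, E-F, D-F, C-D)
toy_louvain <- cluster_louvain(toy)

# Group vertex names by community number
members_by_group <- split(V(toy)$name, as.vector(membership(toy_louvain)))
members_by_group
```

Applying the same `split()` call to `pta_intercon_bilat_sim` and `intercon_bilat_louvain` should give one list of country names per community.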

Next, we will try the walktrap method, which finds communities through random walks. It assumes that a short random walk tends to stay within the same community.

intercon_bilat_walktrap <- cluster_walktrap(pta_intercon_bilat_sim)
membership(intercon_bilat_walktrap)
##                   Egypt                  Jordan                 Morocco 
##                       1                       1                       1 
##                 Tunisia                 Albania                  Turkey 
##                       4                       6                       6 
##                 Algeria                  France          United Kingdom 
##                       1                       4                       2 
##                Portugal                   Spain                  Sweden 
##                       2                       2                       9 
##                 Czechia                 Estonia                 Hungary 
##                       6                       6                       6 
##                  Latvia               Lithuania                  Poland 
##                       6                       6                       6 
##                Slovakia                Slovenia                Bulgaria 
##                       6                       6                       6 
##                 Romania                 Croatia                    Iraq 
##                       6                       6                       1 
##                   Syria                   Libya                 Armenia 
##                       4                       1                       8 
##                 Belarus                 Moldova                  Russia 
##                       8                       8                       8 
##                 Ukraine                   Kenya               Australia 
##                       8                       2                       3 
##               Indonesia                    Laos                Malaysia 
##                       3                       2                       3 
##               Singapore                Thailand                 Vietnam 
##                       3                       3                       2 
##             New Zealand                  Canada                   Chile 
##                       3                       2                       3 
##           United States              Azerbaijan                 Bahrain 
##                       2                       8                       2 
##              Kazakhstan              Kyrgyzstan              Tajikistan 
##                       8                       8                       8 
##    Bosnia & Herzegovina                  Israel                 Iceland 
##                       6                       6                       3 
##             Switzerland                   China                   India 
##                       3                       3                       3 
##                   Japan             South Korea                    Peru 
##                       3                       3                       3 
##                Colombia                 Georgia            Turkmenistan 
##                       6                       8                       8 
##              Uzbekistan           Côte d’Ivoire                Cameroon 
##                       8                       2                       2 
##                   Ghana               Mauritius                   Sudan 
##                       2                       5                       1 
##                Pakistan                 Lebanon                  Mexico 
##                       5                       1                       2 
##                  Kuwait                    Oman            Saudi Arabia 
##                       4                       2                       1 
##    United Arab Emirates Palestinian Territories                  Taiwan 
##                       1                       1                       7 
##                  Serbia               Sri Lanka               Guatemala 
##                       6                       3                       7 
##                Paraguay         North Macedonia              Montenegro 
##                       7                       6                       6 
##                  Panama               Greenland              Costa Rica 
##                       3                       9                       3 
##     Hong Kong SAR China                  Kosovo 
##                       3                       6
sizes(intercon_bilat_walktrap)
## Community sizes
##  1  2  3  4  5  6  7  8  9 
## 11 14 18  4  2 20  3 12  2
modularity(intercon_bilat_walktrap)
## [1] 0.5404012
plot(pta_intercon_bilat_sim,
     vertex.size = V(pta_intercon_bilat)$eigen_cen*10,
     vertex.color = "red",
     vertex.label = V(pta_intercon_bilat_sim)$name,
     vertex.label.cex = 0.5,
     vertex.frame.color = NA,
     edge.width = E(pta_intercon_bilat_sim)$weight,
     layout = layout_with_fr, #current name of layout.fruchterman.reingold
     mark.groups = intercon_bilat_walktrap,
     main = "Communities of Inter-Regional PTA Network")
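The two methods above return a different number of communities, so it can help to check how far they agree. The sketch below is one possible way to do this, shown on a made-up toy network rather than the PTA data. The names `toy`, `toy_louvain`, `toy_walktrap`, `agreement`, and `crosstab` are hypothetical. It uses igraph's `compare()` with normalized mutual information (NMI) and a simple cross-tabulation.

```r
library(igraph)

# Hypothetical toy network: two triangles joined by one bridge
toy <- graph_from_literal(A-B, B-C, A-C, D-E, E-F, D-F, C-D)

toy_louvain  <- cluster_louvain(toy)
toy_walktrap <- cluster_walktrap(toy)

# NMI: values near 1 mean the two groupings largely agree, values near 0 mean they do not
agreement <- compare(toy_louvain, toy_walktrap, method = "nmi")
agreement

# Cross-tabulation: rows are Louvain groups, columns are walktrap groups
crosstab <- table(louvain  = as.vector(membership(toy_louvain)),
                  walktrap = as.vector(membership(toy_walktrap)))
crosstab
```

For the workshop data, the same calls would take `intercon_bilat_louvain` and `intercon_bilat_walktrap`. Group numbers are arbitrary labels, so read the table by which cells are filled, not by whether the numbers match.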