This report gives an introductory summary of the formulation and application of exponential random graph models (ERGMs) to the networks of collaboration between countries. In these networks, nodes are countries and an edge between two countries represents a joint paper. We use income level, cases level and coverage as node attributes.

First, let's have a look at the income level distribution across the different antigens and the different year intervals.

Since the node attributes for incidence level and coverage level have missing values, we present this report only for income level.

Number of nodes for each income level and each antigen

The figures below display a summary of the number of nodes with each income level as their node attribute.

We are interested in the effect of income level, cases level and coverage level on tie formation in the networks for different antigens in different time intervals. Therefore, we use exponential random graph models, which model networks as a function of network statistics.

ERGM

ERGMs treat the observed network as just one instantiation from a set of possible networks with similar features, that is, as the outcome of a stochastic process that is unknown and must therefore be inferred. The observed network is the network data that we created and are interested in modeling. The possible networks share its important characteristics (the same number of nodes, the same number of edges, the same number of countries with H, UM, LM or L income and …). In other words, the observed network is seen as one particular pattern of ties out of a large set of possible patterns. In general, we do not know what stochastic process generated the observed network. In simple words, the fitting procedure takes the observed network, adds and removes edges, and looks at how those changes alter the network statistics. It uses those changes to build an understanding of how the terms in the model specification interact and affect the overall network. Our goal in formulating a model is to propose a plausible and theoretically principled hypothesis for this process.

Let \(Y\) denote an \(n \times n\) sociomatrix where \(y_{ij} = 1\) if individuals \(i\) and \(j\) have a tie. Let \(X\) denote a matrix of covariates, which includes structural measures of the network as well as nodal and possibly edge-level attributes. A generic ERGM can be written as:

\[ P_{\theta, \tilde{Y}} (Y = y \mid X) = \frac{\exp (\theta ^T g(y,X))}{k(\theta , \tilde{Y})} \]

where \(\theta\) is a vector of coefficients, \(g(y,X)\) is a vector of sufficient statistics, \(\tilde{Y}\) is the space of all possible graphs, and \(k(\theta , \tilde{Y})\) is a normalizing constant, that is, the numerator summed across all possible graphs in \(\tilde{Y}\). The ERGM equation can be rewritten in terms of change statistics. The log-odds of a tie \(y_{ij}\) is:

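\[ \operatorname{logit}\big(P(Y_{i,j} = 1 \mid y_{i,j}^c)\big) = \theta^T \delta(y_{i,j}) \]

Here \(y_{i,j}^c\) denotes the rest of the network (all dyads other than \((i,j)\)) and \(\delta(y_{i,j})\) is the vector of change statistics, i.e. the change in \(g(y,X)\) when \(y_{i,j}\) is toggled from 0 to 1. We use \(Y\) because we are referring to the random variable \(Y_{i,j}\) rather than a specific realization.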
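
To make the two equations above concrete, here is a minimal sketch in plain Python (not the ergm package). It assumes a hypothetical 4-node network, two statistics (edges and triangles) and made-up coefficients, none of which come from our data. It enumerates all possible graphs to obtain the normalizing constant and then checks that the conditional log-odds of a tie equal \(\theta^T \delta\).

```python
import itertools
import math

# Hypothetical toy setup: 4 nodes, g(y) = (edges, triangles) and made-up
# coefficients theta. None of these values come from the report.
nodes = range(4)
dyads = list(itertools.combinations(nodes, 2))   # 6 possible undirected ties
theta = (-1.0, 0.5)

def stats(edge_set):
    """Return g(y) = (number of edges, number of triangles)."""
    triangles = sum(
        1 for a, b, c in itertools.combinations(nodes, 3)
        if {(a, b), (a, c), (b, c)} <= edge_set
    )
    return (len(edge_set), triangles)

def weight(edge_set):
    """Numerator of the ERGM probability: exp(theta^T g(y))."""
    return math.exp(sum(t * s for t, s in zip(theta, stats(edge_set))))

# Normalizing constant: the numerator summed over all 2^6 = 64 graphs.
all_graphs = [
    frozenset(d for d, keep in zip(dyads, mask) if keep)
    for mask in itertools.product([0, 1], repeat=len(dyads))
]
k = sum(weight(g) for g in all_graphs)

def prob(edge_set):
    """ERGM probability of one particular graph."""
    return weight(edge_set) / k

# Conditional log-odds of the tie (0, 1) given the rest of the network.
rest = frozenset({(0, 2), (1, 2), (2, 3)})
with_tie = rest | {(0, 1)}
log_odds = math.log(prob(with_tie) / prob(rest))

# Change statistic: g(y with the tie) minus g(y without the tie).
delta = tuple(a - b for a, b in zip(stats(with_tie), stats(rest)))
theta_delta = sum(t * d for t, d in zip(theta, delta))

print(delta, log_odds, theta_delta)
```

Adding the tie (0, 1) creates one edge and closes one triangle, so the two printed log-odds values agree up to floating-point error. Full enumeration is only feasible for very small networks, which is why estimation relies on pseudo-likelihood or MCMC in practice.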

All ERG models and goodness-of-fit plots in this report were generated using ergm, a cornerstone of the statnet suite of packages for statistical network analysis (Handcock et al. 2003). All models assume dyadic independence and thus can be estimated straightforwardly using pseudo-likelihood estimation.

Because these are dyad-independent models (the likelihood of a tie does not depend on any other tie), ergm solves a logistic regression instead of resorting to MCMC.

Note that, in the edges-only model, the edges term represents exactly the density of the network (in log-odds). That is, the probability of any tie (i.e. the density of the network) is the inverse-logit of the coefficient on edges.
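
The relationship between density and the edges coefficient can be checked by hand. The sketch below assumes a hypothetical undirected network of 5 countries with 3 ties (not one of our networks) and shows that the inverse-logit of the log-odds of the density returns the density itself.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

# Hypothetical undirected network: 5 countries, 3 joint-paper ties.
n_nodes = 5
edges = [("A", "B"), ("A", "C"), ("B", "C")]

n_dyads = n_nodes * (n_nodes - 1) // 2   # number of possible ties
density = len(edges) / n_dyads           # share of possible ties that exist

# In an edges-only model the estimated coefficient is the logit of the density.
edges_coef = logit(density)
recovered_density = inv_logit(edges_coef)

print(f"density={density:.3f}, edges coefficient={edges_coef:.3f}")
```

Because the density here is below 0.5, the edges coefficient is negative, which is the typical situation for sparse networks.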

Model 1: number of Edges

We start by building up from some basic terms. The first term is the edges term, a statistic that counts how many edges there are in the network (on its own this is not very informative).

Coefficients in ERGMs represent the change in the log-odds of a tie for a unit change in a predictor. To be consistent with their standard errors, we report the coefficients as log-odds. We show the coefficient values as circles: a filled circle means that the coefficient is statistically significant, and a hollow circle means that it is not.

Negative coefficients indicate that the formation of edges is less likely, while positive coefficients indicate a higher likelihood of edge formation. For the edges term in particular, a negative coefficient means that the baseline probability of a tie is below 0.5 (a log-odds of 0). It is important to note that the edges term in any ERGM is almost always negative. In the simplest terms, this means that the network is sparse: most pairs of countries do not share a tie.

However, the model with only the number of edges is rarely a good model (so far we have only learned how sparse our networks are). As terms are added, the model gains explanatory power regarding the formation of ties (this is also the reason that the edges term decreases in the following models).

Model 2: number of Edges and node factor:

Node factor: Income

Now let's also include the information about each node (i.e. each country). The idea of nodal attributes is straightforward: these are what we would call (socio-demographic) attributes (e.g., income, geographic location, vaccination coverage level, …) in more standard regression models. In an ERGM, we include this additional information in the form of a node factor:

The node factor statistic counts the number of times that nodes with a given attribute value appear as an endpoint of an edge. It captures the propensity of nodes with a specific attribute value to form ties, but it does not require both nodes in a tie to share the same attribute value.
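
As an illustration of how this statistic is counted, the following sketch uses a hypothetical five-country network with made-up country labels and income levels. Each edge contributes once for each of its two endpoints, so an H-H tie adds 2 to the H count.

```python
from collections import Counter

# Hypothetical node attribute: income level per country (labels are made up).
income = {"c1": "H", "c2": "H", "c3": "UM", "c4": "LM", "c5": "L"}

# Hypothetical undirected collaboration ties.
edges = [("c1", "c2"), ("c1", "c3"), ("c2", "c4"), ("c3", "c4"), ("c1", "c5")]

def nodefactor_counts(edges, attr):
    """Count how often each attribute level appears as an edge endpoint."""
    counts = Counter()
    for u, v in edges:
        counts[attr[u]] += 1
        counts[attr[v]] += 1
    return counts

counts = nodefactor_counts(edges, income)
print(dict(counts))
```

The counts always sum to twice the number of edges. In the fitted model one level (H in our case) is left out as the reference category, so only the remaining levels receive a coefficient.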

The node factor term is particularly useful since it allows us to compare log-odds to a reference category (in our case, High income). This means that each coefficient represents the difference in the log-odds of an edge for nodes of the specified income level compared to nodes with high income.

Focusing on the period before the Covid pandemic, for Measles, L has the lowest log-odds, indicating a lower likelihood of an edge existing in the network for L compared to LM and UM. Furthermore, the log-odds of an edge existing in the network for L, compared to H, is -2.23719. Post-pandemic, the log-odds of the different income levels are closer to each other and have increased in comparison to H, but with the same order of likelihood.
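
To give a feeling for the size of such a coefficient, the sketch below converts the reported value of -2.23719 into an odds ratio and into tie probabilities. The edges coefficient of -1.0 is a hypothetical baseline chosen only for illustration and is not an estimate from our models. Since the node factor statistic counts endpoints, the coefficient enters once for an H-L tie and twice for an L-L tie.

```python
import math

def inv_logit(x):
    """Map log-odds to a probability."""
    return 1 / (1 + math.exp(-x))

coef_low = -2.23719     # node factor coefficient for L vs. H (Measles, pre-pandemic)
edges_coef = -1.0       # hypothetical edges coefficient, for illustration only

# Odds of a tie are multiplied by this factor for each L endpoint.
odds_ratio = math.exp(coef_low)

# Tie probabilities in an edges + node factor model (H is the reference level).
p_hh = inv_logit(edges_coef)                  # both endpoints H
p_hl = inv_logit(edges_coef + coef_low)       # one endpoint L
p_ll = inv_logit(edges_coef + 2 * coef_low)   # both endpoints L

print(f"odds ratio per L endpoint: {odds_ratio:.3f}")
print(f"P(H-H)={p_hh:.3f}, P(H-L)={p_hl:.3f}, P(L-L)={p_ll:.4f}")
```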

While the likelihood of an edge existing in the network is lowest for L for both HPV and Influenza (pre-pandemic), the pattern is different for Measles and Polio.

For Polio, the differences in log-odds between H and each of L, UM and LM are the smallest (the values are close to 0), which means that an edge is about as likely to exist for L as for H.

The plot below shows all the coefficients and their confidence intervals:

Looking at Polio, the node factor coefficients are not significant, with one exception: pre-Covid, the log-odds of forming a tie are significantly lower for L than for H.

Looking at HPV, all coefficients are significant. The log-odds for UM in comparison to H are higher than for LM and also lower than for L, which shows the dominance of H income in this network.

Looking at Measles, pre-Covid the likelihood of tie formation was higher for H than for UM, higher for UM than for LM, and higher for LM than for L. This pattern changes after Covid: the likelihood of tie formation is still higher for H than for UM (but by a smaller margin), and the likelihoods for UM and LM are similar.

Looking at Influenza, LM and L are not significant. The likelihoods for H, UM and LM are similar, while the log-odds of tie formation for L are 2.5 lower than the log-odds for H.

Model 3: NodeMatch (Homophily):

Income

Of greater interest from our point of view, the node match statistic counts the number of edges whose two endpoints have the same income level. So we fit the model with number of edges and node match:

Node match is a measure of homophily: the tendency of nodes with similar attributes to form ties with each other. It assesses whether ties are more likely to occur between nodes that share the same attribute value than between nodes that do not.
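
The node match statistic can be computed by hand in the same way. The sketch below assumes a hypothetical network with made-up labels and counts the edges whose two endpoints share an income level, both overall and per level.

```python
from collections import Counter

# Hypothetical income levels and ties (not taken from our networks).
income = {"c1": "H", "c2": "H", "c3": "UM", "c4": "UM", "c5": "L"}
edges = [("c1", "c2"), ("c1", "c3"), ("c3", "c4"), ("c2", "c5"), ("c4", "c5")]

def nodematch_count(edges, attr):
    """Number of edges whose endpoints have the same attribute value."""
    return sum(1 for u, v in edges if attr[u] == attr[v])

def nodematch_by_level(edges, attr):
    """Same count, split by the shared attribute value."""
    return Counter(attr[u] for u, v in edges if attr[u] == attr[v])

total_matches = nodematch_count(edges, income)
matches_by_level = nodematch_by_level(edges, income)
print(total_matches, dict(matches_by_level))
```

A positive coefficient on this statistic means that, other things being equal, a same-income tie is more likely than a cross-income tie.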

First of all, all the coefficients are positive, indicating that having the same attribute value for both nodes in a dyad increases the likelihood of a tie for all antigens, but the degree of such homophily varies across antigens. The lowest log-odds belong to Polio, suggesting relatively more cross-income tie formation (a shift toward heterophily) or, in better words, a weaker tendency toward tie formation between countries with the same income level.

Influenza shows a decrease in its homophily log-odds, suggesting that collaborations increasingly involve countries with different income levels over time.

With the exception of Polio, Measles and Influenza show a decrease in their homophily statistic post-pandemic.

Below you can see these measures on the same scale in a single figure for better comparison.

The plot is based on the model in which we include node Match (Homophily) for different antigens in different time intervals.

The homophily coefficient is always significant and positive, apart from Polio in the first time interval. The log-odds of tie formation between countries with the same income level are above 1 in all time intervals.

Looking at Influenza, we see an increasing trend for heterophily with time, meaning opening to cross-income collaboration.

Looking at HPV, the log-odds for homophily stays around the same value (1.4) in all time periods.

Model 4: NodeMix:

Then, we fit the model with number of edges and node mix:

nodemix adds one statistic for each combination of attribute values and thereby captures the propensity of nodes with given (possibly different) attribute values to form edges. It evaluates mixing patterns between different attribute levels, similar to what we have seen in the mixing matrix.
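
The counts behind nodemix are the cells of the mixing matrix. A minimal sketch, assuming the same kind of hypothetical network as before (made-up labels and ties), counts the edges for each unordered pair of income levels.

```python
from collections import Counter

# Hypothetical income levels and undirected ties (not from our data).
income = {"c1": "H", "c2": "H", "c3": "UM", "c4": "UM", "c5": "L"}
edges = [("c1", "c2"), ("c1", "c3"), ("c3", "c4"), ("c2", "c5"), ("c4", "c5")]

def mixing_counts(edges, attr):
    """Count edges per unordered pair of attribute levels."""
    counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((attr[u], attr[v])))   # ("H", "UM") == ("UM", "H")
        counts[pair] += 1
    return counts

mix = mixing_counts(edges, income)
for pair, n in sorted(mix.items()):
    print(pair, n)
```

In the fitted model one combination serves as the reference (H-H in our plots), and every other combination gets a coefficient relative to it.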

For Polio, almost all the combinations of ties are not statistically significant, suggesting that the income node attribute is not as important as it is for other antigens. For Measles, reflecting what we have already seen in the previous plots (and also in the mixing matrix), there is a change in the pattern of tie formation between the pre- and post-Covid periods. Pre-Covid, for example, the coefficient for L-UM or L-LM is about -3 relative to H-H, and the coefficients for H-L or H-LM are about -1 relative to H-H (on the log-odds scale), while the trend changes post-pandemic.

For Influenza, we see an increase in collaboration between different combinations post-pandemic (reflecting the decreasing trend in homophily, i.e. more openness to collaboration across income levels).

For HPV, tie formation between the different combinations is always less likely than for H-H.

More Models:

nodematch("income_group") + nodefactor("income_group")

The plot is based on the model in which we include node factor and node match for different antigens in different time intervals. The plot below is for networks with self-loops:

This plot is for networks without self-loops:

nodefactor("income_group") + nodematch("income_group") + nodefactor("location") + nodematch("location"):

nodemix("income_group") + nodefactor("income_group")

Warning: Model statistics ‘nodefactor.income_group.Low income’, ‘nodefactor.income_group.Lower middle income’, and ‘nodefactor.income_group.Upper middle income’ are linear combinations of some set of preceding statistics at the current stage of the estimation. This may indicate that the model is nonidentifiable. Evaluating log-likelihood at the estimate.

NodeFactor(Income) + NodeFactor(coverage)

Coverage data for Influenza is not available.

NodeFactor(Income, cases) + NodeMatch(Income, cases)

Edgewise shared partners: GWESP

Two nodes i and j have an edgewise shared partner when they are connected to each other and both i and j are also connected to a third individual k. If i and j were also connected to node l, then i and j would have two edgewise shared partners. In other words, when nodes have edgewise shared partnerships, they form triangles!

Adding one tie has a different effect on the number of edgewise shared partnerships in the network depending on the number of triangles that the tie closes, and the existing number of edgewise shared partnerships that the nodes involved in the triangles already belong to.

If a tie being modelled would not close a triangle, then after adding the tie the nodes will still have the same number of edgewise shared partners, so the GWESP change statistic is zero.
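
The two cases described above can be sketched as follows. The code assumes a hypothetical network on the nodes i, j, k, l (plus an isolated node m) and tabulates, for every edge, how many partners its two endpoints share. It only computes the raw shared-partner counts and does not apply the geometric weighting that GWESP adds on top of them.

```python
from collections import Counter, defaultdict

def adjacency(edges):
    """Build neighbour sets for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def esp_distribution(edges):
    """Map 'number of shared partners' -> 'number of edges with that many'."""
    adj = adjacency(edges)
    return Counter(len(adj[u] & adj[v]) for u, v in edges)

# Hypothetical starting network: i and j both know k and l, but not each other.
base = [("i", "k"), ("j", "k"), ("i", "l"), ("j", "l")]

before = esp_distribution(base)
# Adding i-j closes two triangles (via k and via l).
after_closing = esp_distribution(base + [("i", "j")])
# Adding k-m closes no triangle, because m has no other ties.
after_non_closing = esp_distribution(base + [("k", "m")])

print(dict(before), dict(after_closing), dict(after_non_closing))
```

Adding i-j gives the new tie two shared partners and also raises the count of each of the four existing edges from 0 to 1, which is why the effect of one tie depends on how many triangles it closes. Adding k-m leaves every edge at zero shared partners.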

Goodness-of-Fit

When we use ordinary least-squares regression, for example, we are probably used to calculating residuals, which are the differences between the observed and the predicted values for specific values of the independent variables. While ERGMs have no simple analog to a residual in a linear model, we can ask whether our observed network is consistent with the family of networks implied by our estimated model parameters.

In problems for which maximum likelihood estimation is possible, a troubling empirical fact has emerged: when ERGM parameters are estimated and a large number of networks are simulated from the resulting model, these networks frequently bear little resemblance to the observed network. This seemingly paradoxical fact arises because even though the MLE makes the probability of the observed network as large as possible, this probability still might be extremely small relative to other networks. In such a case, the ERGM does not fit the data well.

The blue points in the plot represent the mean of the statistics in the simulated networks. The black line shows the observed statistics in the actual network.
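
The logic of this comparison can be sketched without the ergm package. The example below assumes a hypothetical 8-node observed network and the simplest possible model (edges only, so every tie occurs independently with probability equal to the density). It simulates networks from that model and compares the simulated mean of two statistics with the observed values. It is an illustration of the idea, not the procedure used to produce our plots.

```python
import itertools
import random

nodes = list(range(8))
dyads = list(itertools.combinations(nodes, 2))   # 28 possible ties

# Hypothetical observed network (not from the report's data).
observed = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4),
            (4, 5), (5, 6), (5, 7), (6, 7)}

def triangle_count(edge_set):
    """Number of closed triples in an undirected edge set."""
    return sum(
        1 for a, b, c in itertools.combinations(nodes, 3)
        if {(a, b), (a, c), (b, c)} <= edge_set
    )

# Edges-only model: each dyad is a tie with probability equal to the density.
p = len(observed) / len(dyads)

rng = random.Random(1)   # fixed seed so the sketch is reproducible
simulated = [
    {d for d in dyads if rng.random() < p}
    for _ in range(1000)
]

obs_edges, obs_triangles = len(observed), triangle_count(observed)
mean_edges = sum(len(s) for s in simulated) / len(simulated)
mean_triangles = sum(triangle_count(s) for s in simulated) / len(simulated)

print(f"edges:     observed {obs_edges}, simulated mean {mean_edges:.2f}")
print(f"triangles: observed {obs_triangles}, simulated mean {mean_triangles:.2f}")
```

A model fits well on a given statistic when the observed value lies close to the centre of the simulated distribution. A statistic that is not in the model (triangles here) is the more informative check.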

First model, with only edges

Second model: edges + nodefactor("income_group")

Third model: edges + nodematch("income_group")

Fourth model: edges + nodefactor("income_group") + nodematch("income_group")

Fifth model: edges + nodemix("income_group")

Model Selection:

Below you can see the BIC for the models mentioned above; the star in the last column marks the lowest BIC for each network.

##           Network                                 Model        BIC Sample
## 1    Measles_2010                                 edges  468.95405       
## 2    Measles_2010                     nodefactor_income  467.49214      *
## 3    Measles_2010                      nodematch_income  474.57157       
## 4    Measles_2010                        nodemix_income  486.34462       
## 5    Measles_2010           nodefactor_nodematch_income  468.88484       
## 6    Measles_2010           nodefactor_income_continent  473.35426       
## 7    Measles_2010            nodematch_income_continent  478.78833       
## 8    Measles_2010 nodefactor_nodematch_income_continent  469.85913       
## 9    Measles_2015                                 edges 1459.06366       
## 10   Measles_2015                     nodefactor_income 1449.31937       
## 11   Measles_2015                      nodematch_income 1442.60093       
## 12   Measles_2015                        nodemix_income 1472.37792       
## 13   Measles_2015           nodefactor_nodematch_income 1444.44802       
## 14   Measles_2015           nodefactor_income_continent 1472.77722       
## 15   Measles_2015            nodematch_income_continent 1431.26387      *
## 16   Measles_2015 nodefactor_nodematch_income_continent 1441.46259       
## 17   Measles_2020                                 edges 2101.49361       
## 18   Measles_2020                     nodefactor_income 1839.56656       
## 19   Measles_2020                      nodematch_income 1995.31686       
## 20   Measles_2020                        nodemix_income 1873.68046       
## 21   Measles_2020           nodefactor_nodematch_income 1847.55842       
## 22   Measles_2020           nodefactor_income_continent 1781.21872       
## 23   Measles_2020            nodematch_income_continent 1933.64020       
## 24   Measles_2020 nodefactor_nodematch_income_continent 1731.50546      *
## 25       HPV_2010                                 edges  847.44020       
## 26       HPV_2010                     nodefactor_income  837.66863       
## 27       HPV_2010                      nodematch_income  845.39814       
## 28       HPV_2010                        nodemix_income  863.51103       
## 29       HPV_2010           nodefactor_nodematch_income  845.07330       
## 30       HPV_2010           nodefactor_income_continent  847.46679       
## 31       HPV_2010            nodematch_income_continent  831.38092      *
## 32       HPV_2010 nodefactor_nodematch_income_continent  834.45116       
## 33       HPV_2015                                 edges 1413.72169       
## 34       HPV_2015                     nodefactor_income 1318.55284       
## 35       HPV_2015                      nodematch_income 1391.42989       
## 36       HPV_2015                        nodemix_income 1360.62519       
## 37       HPV_2015           nodefactor_nodematch_income 1326.26208       
## 38       HPV_2015           nodefactor_income_continent 1249.08199      *
## 39       HPV_2015            nodematch_income_continent 1399.10652       
## 40       HPV_2015 nodefactor_nodematch_income_continent 1254.51886       
## 41       HPV_2020                                 edges 2152.88442       
## 42       HPV_2020                     nodefactor_income 2054.87117       
## 43       HPV_2020                      nodematch_income 2129.68916       
## 44       HPV_2020                        nodemix_income 2094.59491       
## 45       HPV_2020           nodefactor_nodematch_income 2063.31659       
## 46       HPV_2020           nodefactor_income_continent 2055.27913       
## 47       HPV_2020            nodematch_income_continent 2113.83945       
## 48       HPV_2020 nodefactor_nodematch_income_continent 2031.12794      *
## 49     Polio_2010                                 edges   89.14123      *
## 50     Polio_2010                     nodefactor_income  102.36863       
## 51     Polio_2010                      nodematch_income   91.47411       
## 52     Polio_2010                        nodemix_income  104.61067       
## 53     Polio_2010           nodefactor_nodematch_income  103.79582       
## 54     Polio_2010           nodefactor_income_continent  106.71876       
## 55     Polio_2010            nodematch_income_continent   95.92637       
## 56     Polio_2010 nodefactor_nodematch_income_continent  112.14626       
## 57     Polio_2015                                 edges  522.23861      *
## 58     Polio_2015                     nodefactor_income  530.64007       
## 59     Polio_2015                      nodematch_income  528.62608       
## 60     Polio_2015                        nodemix_income  557.79485       
## 61     Polio_2015           nodefactor_nodematch_income  533.22460       
## 62     Polio_2015           nodefactor_income_continent  528.68449       
## 63     Polio_2015            nodematch_income_continent  534.77810       
## 64     Polio_2015 nodefactor_nodematch_income_continent  536.11681       
## 65     Polio_2020                                 edges  403.30161      *
## 66     Polio_2020                     nodefactor_income  416.62922       
## 67     Polio_2020                      nodematch_income  408.56783       
## 68     Polio_2020                        nodemix_income  444.08538       
## 69     Polio_2020           nodefactor_nodematch_income  422.62521       
## 70     Polio_2020           nodefactor_income_continent  414.93989       
## 71     Polio_2020            nodematch_income_continent  413.11540       
## 72     Polio_2020 nodefactor_nodematch_income_continent  423.52671       
## 73 Influenza_2010                                 edges  500.99522       
## 74 Influenza_2010                     nodefactor_income  484.72003       
## 75 Influenza_2010                      nodematch_income  490.90270       
## 76 Influenza_2010                        nodemix_income  482.05543       
## 77 Influenza_2010           nodefactor_nodematch_income  489.21110       
## 78 Influenza_2010           nodefactor_income_continent  472.48970      *
## 79 Influenza_2010            nodematch_income_continent  489.81906       
## 80 Influenza_2010 nodefactor_nodematch_income_continent  474.33867       
## 81 Influenza_2015                                 edges 1440.77192       
## 82 Influenza_2015                     nodefactor_income 1412.22075       
## 83 Influenza_2015                      nodematch_income 1417.50697       
## 84 Influenza_2015                        nodemix_income 1435.25546       
## 85 Influenza_2015           nodefactor_nodematch_income 1415.00153       
## 86 Influenza_2015           nodefactor_income_continent 1422.88580       
## 87 Influenza_2015            nodematch_income_continent 1406.18546      *
## 88 Influenza_2015 nodefactor_nodematch_income_continent 1413.28855       
## 89 Influenza_2020                                 edges 2872.41877       
## 90 Influenza_2020                     nodefactor_income 2853.04718       
## 91 Influenza_2020                      nodematch_income 2840.49848       
## 92 Influenza_2020                        nodemix_income 2854.74307       
## 93 Influenza_2020           nodefactor_nodematch_income 2831.27091       
## 94 Influenza_2020           nodefactor_income_continent 2835.91926       
## 95 Influenza_2020            nodematch_income_continent 2778.16795       
## 96 Influenza_2020 nodefactor_nodematch_income_continent 2727.61729      *
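
The starred rows follow a simple rule: within each network, pick the model with the smallest BIC. A minimal sketch of that rule, using the Measles_2010 and Polio_2010 values copied from the table above (the dictionary layout is our own choice for illustration):

```python
# BIC values copied from the table above for two of the networks.
bic = {
    "Measles_2010": {
        "edges": 468.95405,
        "nodefactor_income": 467.49214,
        "nodematch_income": 474.57157,
        "nodemix_income": 486.34462,
        "nodefactor_nodematch_income": 468.88484,
        "nodefactor_income_continent": 473.35426,
        "nodematch_income_continent": 478.78833,
        "nodefactor_nodematch_income_continent": 469.85913,
    },
    "Polio_2010": {
        "edges": 89.14123,
        "nodefactor_income": 102.36863,
        "nodematch_income": 91.47411,
        "nodemix_income": 104.61067,
        "nodefactor_nodematch_income": 103.79582,
        "nodefactor_income_continent": 106.71876,
        "nodematch_income_continent": 95.92637,
        "nodefactor_nodematch_income_continent": 112.14626,
    },
}

# For each network, select the model with the lowest BIC.
best = {network: min(models, key=models.get) for network, models in bic.items()}

for network, model in best.items():
    print(f"{network}: {model} (BIC {bic[network][model]:.2f})")
```

For Polio_2010 the edges-only model wins, which is consistent with the node factor coefficients for Polio being mostly non-significant.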

Models with coverage:

##          Network                                 Model        BIC Sample
## 1   Measles_2010                                 edges  468.95405       
## 2   Measles_2010                     nodefactor_income  467.49214       
## 3   Measles_2010                   nodefactor_coverage  466.77634      *
## 4   Measles_2010                      nodematch_income  474.57157       
## 5   Measles_2010                    nodematch_coverage  474.07673       
## 6   Measles_2010                        nodemix_income  486.34462       
## 7   Measles_2010           nodefactor_nodematch_income  468.88484       
## 8   Measles_2010         nodefactor_nodematch_coverage  473.56926       
## 9   Measles_2010           nodefactor_income_continent  473.35426       
## 10  Measles_2010            nodefactor_income_coverage  473.04830       
## 11  Measles_2010            nodematch_income_continent  478.78833       
## 12  Measles_2010 nodefactor_nodematch_income_continent  469.85913       
## 13  Measles_2015                                 edges 1459.06366       
## 14  Measles_2015                     nodefactor_income 1449.31937       
## 15  Measles_2015                   nodefactor_coverage 1459.03914       
## 16  Measles_2015                      nodematch_income 1442.60093       
## 17  Measles_2015                    nodematch_coverage 1462.33491       
## 18  Measles_2015                        nodemix_income 1472.37792       
## 19  Measles_2015           nodefactor_nodematch_income 1444.44802       
## 20  Measles_2015         nodefactor_nodematch_coverage 1466.93048       
## 21  Measles_2015           nodefactor_income_continent 1472.77722       
## 22  Measles_2015            nodefactor_income_coverage 1432.91565       
## 23  Measles_2015            nodematch_income_continent 1431.26387      *
## 24  Measles_2015 nodefactor_nodematch_income_continent 1441.46259       
## 25  Measles_2020                                 edges 2101.49361       
## 26  Measles_2020                     nodefactor_income 1839.56656       
## 27  Measles_2020                   nodefactor_coverage 2031.58328       
## 28  Measles_2020                      nodematch_income 1995.31686       
## 29  Measles_2020                    nodematch_coverage 2103.93878       
## 30  Measles_2020                        nodemix_income 1873.68046       
## 31  Measles_2020           nodefactor_nodematch_income 1847.55842       
## 32  Measles_2020         nodefactor_nodematch_coverage 2039.59614       
## 33  Measles_2020           nodefactor_income_continent 1781.21872       
## 34  Measles_2020            nodefactor_income_coverage 1811.05990       
## 35  Measles_2020            nodematch_income_continent 1933.64020       
## 36  Measles_2020 nodefactor_nodematch_income_continent 1731.50546      *
## 37      HPV_2010                                 edges  847.44020       
## 38      HPV_2010                     nodefactor_income  837.66863       
## 39      HPV_2010                   nodefactor_coverage  832.36665       
## 40      HPV_2010                      nodematch_income  845.39814       
## 41      HPV_2010                    nodematch_coverage  843.79129       
## 42      HPV_2010                        nodemix_income  863.51103       
## 43      HPV_2010           nodefactor_nodematch_income  845.07330       
## 44      HPV_2010         nodefactor_nodematch_coverage  839.62482       
## 45      HPV_2010           nodefactor_income_continent  847.46679       
## 46      HPV_2010            nodefactor_income_coverage  836.15216       
## 47      HPV_2010            nodematch_income_continent  831.38092      *
## 48      HPV_2010 nodefactor_nodematch_income_continent  834.45116       
## 49      HPV_2015                                 edges 1413.72169       
## 50      HPV_2015                     nodefactor_income 1318.55284       
## 51      HPV_2015                   nodefactor_coverage 1319.31303       
## 52      HPV_2015                      nodematch_income 1391.42989       
## 53      HPV_2015                    nodematch_coverage 1415.92040       
## 54      HPV_2015                        nodemix_income 1360.62519       
## 55      HPV_2015           nodefactor_nodematch_income 1326.26208       
## 56      HPV_2015         nodefactor_nodematch_coverage 1326.86922       
## 57      HPV_2015           nodefactor_income_continent 1249.08199      *
## 58      HPV_2015            nodefactor_income_coverage 1277.73870       
## 59      HPV_2015            nodematch_income_continent 1399.10652       
## 60      HPV_2015 nodefactor_nodematch_income_continent 1254.51886       
## 61      HPV_2020                                 edges 2152.88442       
## 62      HPV_2020                     nodefactor_income 2054.87117       
## 63      HPV_2020                   nodefactor_coverage 1992.71463       
## 64      HPV_2020                      nodematch_income 2129.68916       
## 65      HPV_2020                    nodematch_coverage 2161.14050       
## 66      HPV_2020                        nodemix_income 2094.59491       
## 67      HPV_2020           nodefactor_nodematch_income 2063.31659       
## 68      HPV_2020         nodefactor_nodematch_coverage 1997.40352       
## 69      HPV_2020           nodefactor_income_continent 2055.27913       
## 70      HPV_2020            nodefactor_income_coverage 1899.22119      *
## 71      HPV_2020            nodematch_income_continent 2113.83945       
## 72      HPV_2020 nodefactor_nodematch_income_continent 2031.12794       
## 73    Polio_2010                                 edges   89.14123      *
## 74    Polio_2010                     nodefactor_income  102.36863       
## 75    Polio_2010                   nodefactor_coverage   96.64291       
## 76    Polio_2010                      nodematch_income   91.47411       
## 77    Polio_2010                    nodematch_coverage   93.64494       
## 78    Polio_2010                        nodemix_income  104.61067       
## 79    Polio_2010           nodefactor_nodematch_income  103.79582       
## 80    Polio_2010         nodefactor_nodematch_coverage  101.07690       
## 81    Polio_2010           nodefactor_income_continent  106.71876       
## 82    Polio_2010            nodefactor_income_coverage  107.65410       
## 83    Polio_2010            nodematch_income_continent   95.92637       
## 84    Polio_2010 nodefactor_nodematch_income_continent  112.14626       
## 85    Polio_2015                                 edges  522.23861       
## 86    Polio_2015                     nodefactor_income  530.64007       
## 87    Polio_2015                   nodefactor_coverage  508.91765      *
## 88    Polio_2015                      nodematch_income  528.62608       
## 89    Polio_2015                    nodematch_coverage  528.21309       
## 90    Polio_2015                        nodemix_income  557.79485       
## 91    Polio_2015           nodefactor_nodematch_income  533.22460       
## 92    Polio_2015         nodefactor_nodematch_coverage  515.02584       
## 93    Polio_2015           nodefactor_income_continent  528.68449       
## 94    Polio_2015            nodefactor_income_coverage  510.37835       
## 95    Polio_2015            nodematch_income_continent  534.77810       
## 96    Polio_2015 nodefactor_nodematch_income_continent  536.11681       
## 97    Polio_2020                                 edges  403.30161      *
## 98    Polio_2020                     nodefactor_income  416.62922       
## 99    Polio_2020                   nodefactor_coverage  406.97169       
## 100   Polio_2020                      nodematch_income  408.56783       
## 101   Polio_2020                    nodematch_coverage  409.01769       
## 102   Polio_2020                        nodemix_income  444.08538       
## 103   Polio_2020           nodefactor_nodematch_income  422.62521       
## 104   Polio_2020         nodefactor_nodematch_coverage  412.49916       
## 105   Polio_2020           nodefactor_income_continent  414.93989       
## 106   Polio_2020            nodefactor_income_coverage  421.58601       
## 107   Polio_2020            nodematch_income_continent  413.11540       
## 108   Polio_2020 nodefactor_nodematch_income_continent  423.52671

Models with cases: