This report gives an introduction to the formulation and application of exponential random graph models (ERGMs) for the network of collaboration between countries. In these networks, nodes are countries and an edge between two countries represents a joint paper. We use income level, cases level and coverage as node attributes.
First, let's look at the distribution of income levels across antigens and year intervals.
Since the node attributes for incidence level and coverage level have missing values, we present this report only for income level.
The figures below summarize the number of nodes with each income level as their node attribute.
We are interested in the effect of income level, cases level and coverage level on tie formation in the networks for different antigens in different time intervals. Therefore, we use exponential random graph models, which model a network as a function of network statistics.
ERGMs treat the observed network as just one instantiation of a set of possible networks with similar features, that is, as the outcome of a stochastic process that is unknown and must therefore be inferred. The observed network is the network data that we created and are interested in modeling. It is regarded as one realization from a set of possible networks with similar important characteristics (the same number of nodes, the same number of edges, the same number of countries with high (H), upper-middle (UM), lower-middle (LM) or low (L) income, and …); in other words, it is one particular pattern of ties out of a large set of possible patterns. In general, we do not know what stochastic process generated the observed network. In simple terms, ERGM estimation takes the observed network, adds and removes edges, and sees how that changes the network statistics. It uses those changes to build an understanding of how all the terms in the model specification interact and affect the overall network. Our goal in formulating a model is to propose a plausible and theoretically principled hypothesis for this process.
Let \(Y\) denote an \(n \times n\) sociomatrix where \(y_{ij} = 1\) if nodes \(i\) and \(j\) (here, countries) have a tie and \(y_{ij} = 0\) otherwise. Let \(X\) denote a matrix of covariates, which includes structural measures of the network as well as nodal and possibly edge-level attributes. A generic ERGM can be written as:
\[ P_{\theta, \tilde{Y}} (Y = y \mid X) = \frac{\exp\left(\theta^T g(y,X)\right)}{k(\theta, \tilde{Y})} \]
where \(\theta\) is a vector of coefficients, \(g(y,X)\) is a vector of sufficient statistics, \(\tilde{Y}\) is the space of all possible graphs, and \(k(\theta , \tilde{Y})\) is a normalizing constant, namely the numerator summed across all possible graphs in \(\tilde{Y}\). The ERGM equation can be rewritten in terms of change statistics. The log-odds of a tie \(y_{ij}\) are:
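\[ \operatorname{logit}\left(Y_{i,j} = 1 \mid y_{i,j}^c\right) = \theta^T \delta(y_{i,j}) \]
Here \(y_{i,j}^c\) denotes the rest of the network (all dyads other than \((i,j)\)) and \(\delta(y_{i,j})\) is the change statistic, that is, the change in \(g(y,X)\) when \(y_{i,j}\) is toggled from 0 to 1. We write an uppercase \(Y_{i,j}\) because we mean the random variable rather than a specific realization.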
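To make the two formulas concrete, here is a minimal brute-force sketch in Python (not the ergm package used for this report). It assumes a hypothetical 4-node undirected network with made-up income labels and made-up coefficients for an edges statistic and a same-income statistic. It enumerates all 64 possible graphs to obtain the normalizing constant, then checks that the conditional log-odds of one tie equal \(\theta^T \delta\).

```python
from itertools import combinations, product
from math import exp, log

# Hypothetical toy setup: 4 countries, two income labels, made-up coefficients.
nodes = [0, 1, 2, 3]
income = {0: "H", 1: "H", 2: "L", 3: "L"}
dyads = list(combinations(nodes, 2))   # the 6 undirected dyads
theta = (-1.0, 0.8)                    # (edges, same-income) coefficients


def g(edges):
    """Sufficient statistics: (number of edges, number of same-income edges)."""
    return (len(edges), sum(income[i] == income[j] for i, j in edges))


def weight(edges):
    """Unnormalized ERGM weight exp(theta^T g(y))."""
    return exp(sum(t * s for t, s in zip(theta, g(edges))))


# Enumerate every graph on the 6 dyads (2^6 = 64 graphs).
graphs = [frozenset(d for d, on in zip(dyads, bits) if on)
          for bits in product([0, 1], repeat=len(dyads))]
k = sum(weight(y) for y in graphs)            # normalizing constant
prob = {y: weight(y) / k for y in graphs}     # P(Y = y)

# Conditional log-odds of the tie (0, 1), holding the rest of the network fixed.
dyad = (0, 1)
rest = frozenset({(1, 2), (2, 3)})
with_tie = rest | {dyad}
log_odds = log(prob[with_tie] / prob[rest])

# Change statistic: g with the tie minus g without it.
delta = tuple(a - b for a, b in zip(g(with_tie), g(rest)))
theta_delta = sum(t * d for t, d in zip(theta, delta))

print(round(log_odds, 6), round(theta_delta, 6))
```

The normalizing constant cancels in the ratio, which is why the log-odds depend only on the change statistic. Enumeration like this is feasible only for very small networks, because the number of graphs doubles with every additional dyad.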
All ERGMs and goodness-of-fit plots in this report were generated using ergm, a cornerstone of the statnet suite of packages for statistical network analysis (Handcock et al. 2003). All models assume dyadic independence and can therefore be estimated straightforwardly using pseudo-likelihood estimation.
Because these are dyad-independent models (the probability of a tie does not depend on any other tie), ergm fits them as a logistic regression instead of resorting to MCMC.
We start by building up from some basic terms. The first is the edges term, a statistic that counts how many edges there are in the network (on its own this is not very informative).
Note that the edges term represents exactly the density of the network (in log-odds). That is, in an edges-only model the probability of any tie (i.e., the density of the network) is the inverse-logit of the coefficient on edges.
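As a quick numeric check of the edges-only case, here is a minimal Python sketch. The node and edge counts are hypothetical, and we assume an undirected network without self-loops, so the number of dyads is n(n-1)/2.

```python
from math import exp, log


def logit(p):
    """Log-odds of a probability."""
    return log(p / (1 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1 / (1 + exp(-x))


# Hypothetical network: 10 countries, 9 collaboration ties.
n_nodes = 10
n_edges = 9
n_dyads = n_nodes * (n_nodes - 1) // 2   # 45 possible undirected ties
density = n_edges / n_dyads              # 0.2

# In an edges-only model, the fitted edges coefficient is the logit of the density.
theta_edges = logit(density)

print(round(theta_edges, 4), round(inv_logit(theta_edges), 4))
```

Because the density is below 0.5, the edges coefficient is negative, and applying the inverse-logit recovers the density.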
Coefficients in ERGMs represent the change in the log-odds of a tie for a unit change in a predictor. To be consistent with their standard errors, we report the coefficients as log-odds. In the plots, each coefficient is shown as a circle: a filled circle means the coefficient is statistically significant, and a hollow circle means it is not.
Negative coefficients indicate that the corresponding ties are less likely than would be expected by chance, while positive coefficients indicate a higher likelihood of edge formation. It is important to note that the edges term in any ERGM is almost always negative. In the simplest terms, this means that the network is sparse: a tie between two randomly chosen countries is more likely to be absent than present.
However, a model with only the number of edges is rarely a good model (so far we have only learned that our networks are sparse). As we add terms, the model gains explanatory power regarding the formation of ties (this is also the reason that the edges term decreases in the following models).
Now let's also include information about each node (i.e., each country). The idea of nodal attributes is straightforward: these are what we would call (socio-demographic) attributes (e.g., income, geographic location, vaccination coverage level, …) in more standard regression models. In an ERGM, we add this information in the form of a node factor term:
The node factor statistic counts the number of times that nodes with a given attribute value appear as endpoints of edges. It captures the propensity of nodes with a specific attribute to form ties, but it does not require both nodes in a tie to share the same attribute.
The node factor term is particularly useful because it lets us compare log-odds against a reference level (in our case, High income). Each coefficient therefore represents the difference in the log-odds of an edge for nodes of the specified income level compared to nodes with high income.
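The counting behind the node factor statistic can be sketched in a few lines of Python. The country labels, income assignments and edge list below are hypothetical, and the function is an illustration of the idea rather than the ergm implementation.

```python
from collections import Counter

# Hypothetical toy network: five countries with made-up income levels.
income = {"A": "H", "B": "H", "C": "UM", "D": "LM", "E": "L"}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "E")]


def nodefactor(edges, attr, reference="H"):
    """Count how often each attribute level appears as an edge endpoint.

    The reference level is dropped, mirroring the way the coefficients
    are reported relative to High income.
    """
    counts = Counter()
    for i, j in edges:
        counts[attr[i]] += 1
        counts[attr[j]] += 1
    return {level: c for level, c in counts.items() if level != reference}


print(nodefactor(edges, income))
```

An edge between two countries of the same level contributes 2 to that level's count, while a mixed edge contributes 1 to each of its two levels.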
For Measles before the Covid pandemic, L has the lowest log-odds, indicating a lower likelihood of an edge for L countries than for LM and UM countries. The log-odds of an edge for L, compared to H, is -2.23719. Post-pandemic, the log-odds for the different income levels are closer to each other and have increased relative to H, but they keep the same order of likelihood.
While L has the lowest likelihood of an edge for both HPV and Influenza (pre-pandemic), the pattern is different for Measles and Polio.
For Polio, the differences in log-odds between H and each of L, UM and LM are the smallest (the values are close to 0), which means that L and H countries have a similar likelihood of an edge.
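Log-odds differences are easier to read as odds ratios. The short Python sketch below converts the Measles pre-pandemic coefficient for L quoted above (-2.23719). It assumes a model with only edges and node factor terms and H as the reference level, so that each L endpoint adds the coefficient once to the log-odds of a tie.

```python
from math import exp

# Node factor coefficient for L relative to H (Measles, pre-pandemic), from the text.
coef_L = -2.23719

# Odds of an H-L tie relative to an H-H tie: one endpoint is L.
odds_ratio_one_L = exp(coef_L)

# Odds of an L-L tie relative to an H-H tie: both endpoints are L,
# so the coefficient enters twice (assuming no node match or node mix term).
odds_ratio_two_L = exp(2 * coef_L)

print(round(odds_ratio_one_L, 3), round(odds_ratio_two_L, 3))
```

Under these assumptions, the odds of an H-L tie are roughly a tenth of the odds of an H-H tie, and the odds of an L-L tie are roughly a hundredth.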
The plot below shows all the coefficients and their confidence intervals:
For Polio, the node factor coefficients are not significant, with one exception: pre-Covid, L countries are significantly less likely than H countries to form a tie.
For HPV, all coefficients are significant; the log-odds for UM relative to H are higher than for LM and lower than for L, which shows the dominance of high-income countries in this network.
For Measles, pre-Covid, the likelihood of tie formation was highest for H, followed by UM, then LM, then L. This pattern changes after Covid: the likelihood of tie formation for H is still higher than for UM (but by a smaller margin), and the likelihoods for UM and LM are similar.
For Influenza, the coefficients for LM and L are not significant. The likelihoods for H, UM and LM are similar, while the log-odds of tie formation for L are 2.5 lower than for H.
Of greater interest from our point of view is the node match statistic, which counts the number of edges that connect two countries with the same income level. So we fit the model with the number of edges and node match:
Node match is a measure of homophily, the tendency of nodes with similar attributes to form ties with each other. It assesses whether ties are more likely between nodes that share the same attribute value than between nodes that do not.
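The following Python sketch illustrates homophily on a hypothetical six-country network (labels, income levels and edges are made up). It counts same-income edges and, because a model with only edges and node match is dyad-independent with a single binary covariate, obtains both coefficients in closed form from the tie densities among same-income and different-income dyads.

```python
from itertools import combinations
from math import log

# Hypothetical toy network: three H and three L countries.
income = {"A": "H", "B": "H", "C": "H", "D": "L", "E": "L", "F": "L"}
edges = {frozenset(e) for e in
         [("A", "B"), ("B", "C"), ("D", "E"), ("A", "D"), ("C", "F")]}


def logit(p):
    return log(p / (1 - p))


same_total = same_tied = diff_total = diff_tied = 0
for i, j in combinations(sorted(income), 2):
    tied = frozenset((i, j)) in edges
    if income[i] == income[j]:
        same_total += 1
        same_tied += tied
    else:
        diff_total += 1
        diff_tied += tied

nodematch = same_tied                    # the node match statistic
p_same = same_tied / same_total          # tie density among same-income dyads
p_diff = diff_tied / diff_total          # tie density among mixed-income dyads

# Closed-form fit of the edges + node match logistic regression.
theta_edges = logit(p_diff)              # baseline log-odds (mixed-income dyads)
theta_match = logit(p_same) - logit(p_diff)

print(nodematch, round(theta_edges, 4), round(theta_match, 4))
```

A positive `theta_match` means that same-income dyads are tied more often than mixed-income dyads, which is the pattern described as homophily.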
First of all, all the coefficients are positive, indicating that having the same attribute value for both nodes in a dyad increases the likelihood of a tie for all antigens, although the degree of homophily varies across antigens. The lowest log-odds belong to Polio, suggesting relatively more cross-income tie formation or, put better, a weaker tendency toward tie formation between countries with the same income level.
Influenza shows a decrease in its homophily log-odds over time, suggesting growing involvement of countries with different income levels.
Apart from Polio, Measles and Influenza show a decrease in their homophily statistic post-pandemic.
Below you can see these measures on the same scale in a single figure for better comparison.
The plot is based on the model that includes node match (homophily) for different antigens in different time intervals.
Homophily is always significant and positive, apart from Polio in the first time interval. The log-odds of tie formation between same-income countries is more than 1 in all time intervals.
For Influenza, we see an increasing trend toward heterophily over time, meaning an opening to cross-income collaboration.
Looking at HPV, the log-odds for homophily stays around the same value (1.4) in all time periods.
Then, we fit the model with number of edges and node mix:
Node mix captures the propensity of nodes with each combination of attribute values to form edges. It evaluates mixing patterns between attribute levels, similar to what we have seen in the mixing matrix.
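A minimal Python sketch of the node mix counts, using a hypothetical five-country network with made-up income levels. It tallies edges by the unordered pair of income levels at their endpoints, which is the same information a mixing matrix displays.

```python
from collections import Counter

# Hypothetical toy network.
income = {"A": "H", "B": "H", "C": "UM", "D": "LM", "E": "L"}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "E")]


def nodemix(edges, attr):
    """Count edges for each unordered combination of attribute levels."""
    counts = Counter()
    for i, j in edges:
        pair = tuple(sorted((attr[i], attr[j])))  # (H, UM) and (UM, H) are the same cell
        counts[pair] += 1
    return dict(counts)


for pair, count in sorted(nodemix(edges, income).items()):
    print(pair, count)
```

In a fitted model one combination serves as the reference; the discussion in this report compares the other combinations against H-H.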
For Polio, almost none of the tie combinations are statistically significant, suggesting that the income attribute is not as important as it is for the other antigens. For Measles, reflecting what we have already seen in the previous plots (and in the mixing matrix), the pattern of tie formation changes between pre- and post-Covid. Pre-Covid, for example, the log-odds for L-UM or L-LM ties are about 3 lower than for H-H ties, and those for H-L or H-LM ties are about 1 lower than for H-H, while the trend changes post-pandemic.
For Influenza, we see an increase in collaboration between different combinations post-pandemic (reflecting the decreasing trend in homophily, that is, more openness to collaboration across income levels).
For HPV, tie formation between the other combinations is always less likely than for H-H.
The plot is based on the model that includes node factor and node mix for different antigens in different time intervals. The plot below is for networks with self-loops:
This plot is for networks without self-loops:
Warning: Model statistics ‘nodefactor.income_group.Low income’, ‘nodefactor.income_group.Lower middle income’, and ‘nodefactor.income_group.Upper middle income’ are linear combinations of some set of preceding statistics at the current stage of the estimation. This may indicate that the model is nonidentifiable. Evaluating log-likelihood at the estimate.
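The linear dependence reported in this warning can be reproduced by hand. The Python sketch below uses a hypothetical toy network to show that every node factor count is an exact linear combination of the node mix counts, which is why a model containing both sets of terms cannot identify them separately.

```python
from collections import Counter

# Hypothetical toy network.
income = {"A": "H", "B": "H", "C": "UM", "D": "LM", "E": "L"}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "E")]

# Node mix: edges per unordered pair of income levels.
mix = Counter(tuple(sorted((income[i], income[j]))) for i, j in edges)

# Node factor: how often each income level appears as an edge endpoint.
factor = Counter()
for i, j in edges:
    factor[income[i]] += 1
    factor[income[j]] += 1


def factor_from_mix(level):
    """Rebuild a node factor count from node mix counts alone.

    A same-level pair (a, a) contributes twice, a mixed pair containing
    the level contributes once, and all other pairs contribute nothing.
    """
    return sum(count * pair.count(level) for pair, count in mix.items())


for level in sorted(set(income.values())):
    print(level, factor[level], factor_from_mix(level))
```

Since the node factor statistics carry no information beyond the node mix statistics, one of the two sets has to be dropped (or the model reparameterized) for the coefficients to be interpretable.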
Coverage data for Influenza is not available.
### NodeFactor(Income, cases) + NodeMatch(Income, cases)
When we use ordinary least-squares regression, for example, we are used to calculating residuals, which are the differences between the observed and the predicted values for a specific value of the independent variable. While ERGMs have no simple analog to the residual of a linear model, we can ask whether our observed network is consistent with the family of networks implied by our estimated model parameters.
In problems where maximum likelihood estimation is used, a troubling empirical fact has emerged: when ERGM parameters are estimated and a large number of networks are simulated from the resulting model, these networks frequently bear little resemblance to the observed network. This seemingly paradoxical fact arises because, even though the MLE makes the probability of the observed network as large as possible, this probability might still be extremely small relative to other networks. In such a case, the ERGM does not fit the data well.
The blue points in the plot represent the mean of each statistic in the simulated networks. The black line shows the observed statistics in the actual network.
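The logic of such a simulation-based check can be sketched in Python. This is an illustration under strong simplifying assumptions, not the procedure used for the plots: the network is a hypothetical six-country example, the fitted model is edges-only (every dyad is tied independently with probability equal to the observed density), and the statistic compared is the number of same-income edges, which the edges-only model does not control.

```python
import random
from itertools import combinations

# Hypothetical observed network.
income = {"A": "H", "B": "H", "C": "H", "D": "L", "E": "L", "F": "L"}
observed = [("A", "B"), ("B", "C"), ("D", "E"), ("A", "D"), ("C", "F")]

dyads = list(combinations(sorted(income), 2))
p_tie = len(observed) / len(dyads)   # edges-only fit: tie probability = density


def same_income_edges(edges):
    """Statistic to check: number of edges joining same-income countries."""
    return sum(income[i] == income[j] for i, j in edges)


rng = random.Random(42)              # fixed seed so the sketch is reproducible
simulated = []
for _ in range(1000):
    # Draw one network from the edges-only model.
    net = [d for d in dyads if rng.random() < p_tie]
    simulated.append(same_income_edges(net))

obs_stat = same_income_edges(observed)
sim_mean = sum(simulated) / len(simulated)
# Share of simulated networks with at least as many same-income edges as observed.
share_at_least = sum(s >= obs_stat for s in simulated) / len(simulated)

print(obs_stat, round(sim_mean, 2), round(share_at_least, 3))
```

If the observed value sits far out in the tail of the simulated values, the model fails to reproduce that feature of the network, which suggests adding a term (here, a node match term) that captures it.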
Below you can see the BIC for the above-mentioned models; the star (*) marks the model with the lowest BIC for each network.
## Network Model BIC Sample
## 1 Measles_2010 edges 468.95405
## 2 Measles_2010 nodefactor_income 467.49214 *
## 3 Measles_2010 nodematch_income 474.57157
## 4 Measles_2010 nodemix_income 486.34462
## 5 Measles_2010 nodefactor_nodematch_income 468.88484
## 6 Measles_2010 nodefactor_income_continent 473.35426
## 7 Measles_2010 nodematch_income_continent 478.78833
## 8 Measles_2010 nodefactor_nodematch_income_continent 469.85913
## 9 Measles_2015 edges 1459.06366
## 10 Measles_2015 nodefactor_income 1449.31937
## 11 Measles_2015 nodematch_income 1442.60093
## 12 Measles_2015 nodemix_income 1472.37792
## 13 Measles_2015 nodefactor_nodematch_income 1444.44802
## 14 Measles_2015 nodefactor_income_continent 1472.77722
## 15 Measles_2015 nodematch_income_continent 1431.26387 *
## 16 Measles_2015 nodefactor_nodematch_income_continent 1441.46259
## 17 Measles_2020 edges 2101.49361
## 18 Measles_2020 nodefactor_income 1839.56656
## 19 Measles_2020 nodematch_income 1995.31686
## 20 Measles_2020 nodemix_income 1873.68046
## 21 Measles_2020 nodefactor_nodematch_income 1847.55842
## 22 Measles_2020 nodefactor_income_continent 1781.21872
## 23 Measles_2020 nodematch_income_continent 1933.64020
## 24 Measles_2020 nodefactor_nodematch_income_continent 1731.50546 *
## 25 HPV_2010 edges 847.44020
## 26 HPV_2010 nodefactor_income 837.66863
## 27 HPV_2010 nodematch_income 845.39814
## 28 HPV_2010 nodemix_income 863.51103
## 29 HPV_2010 nodefactor_nodematch_income 845.07330
## 30 HPV_2010 nodefactor_income_continent 847.46679
## 31 HPV_2010 nodematch_income_continent 831.38092 *
## 32 HPV_2010 nodefactor_nodematch_income_continent 834.45116
## 33 HPV_2015 edges 1413.72169
## 34 HPV_2015 nodefactor_income 1318.55284
## 35 HPV_2015 nodematch_income 1391.42989
## 36 HPV_2015 nodemix_income 1360.62519
## 37 HPV_2015 nodefactor_nodematch_income 1326.26208
## 38 HPV_2015 nodefactor_income_continent 1249.08199 *
## 39 HPV_2015 nodematch_income_continent 1399.10652
## 40 HPV_2015 nodefactor_nodematch_income_continent 1254.51886
## 41 HPV_2020 edges 2152.88442
## 42 HPV_2020 nodefactor_income 2054.87117
## 43 HPV_2020 nodematch_income 2129.68916
## 44 HPV_2020 nodemix_income 2094.59491
## 45 HPV_2020 nodefactor_nodematch_income 2063.31659
## 46 HPV_2020 nodefactor_income_continent 2055.27913
## 47 HPV_2020 nodematch_income_continent 2113.83945
## 48 HPV_2020 nodefactor_nodematch_income_continent 2031.12794 *
## 49 Polio_2010 edges 89.14123 *
## 50 Polio_2010 nodefactor_income 102.36863
## 51 Polio_2010 nodematch_income 91.47411
## 52 Polio_2010 nodemix_income 104.61067
## 53 Polio_2010 nodefactor_nodematch_income 103.79582
## 54 Polio_2010 nodefactor_income_continent 106.71876
## 55 Polio_2010 nodematch_income_continent 95.92637
## 56 Polio_2010 nodefactor_nodematch_income_continent 112.14626
## 57 Polio_2015 edges 522.23861 *
## 58 Polio_2015 nodefactor_income 530.64007
## 59 Polio_2015 nodematch_income 528.62608
## 60 Polio_2015 nodemix_income 557.79485
## 61 Polio_2015 nodefactor_nodematch_income 533.22460
## 62 Polio_2015 nodefactor_income_continent 528.68449
## 63 Polio_2015 nodematch_income_continent 534.77810
## 64 Polio_2015 nodefactor_nodematch_income_continent 536.11681
## 65 Polio_2020 edges 403.30161 *
## 66 Polio_2020 nodefactor_income 416.62922
## 67 Polio_2020 nodematch_income 408.56783
## 68 Polio_2020 nodemix_income 444.08538
## 69 Polio_2020 nodefactor_nodematch_income 422.62521
## 70 Polio_2020 nodefactor_income_continent 414.93989
## 71 Polio_2020 nodematch_income_continent 413.11540
## 72 Polio_2020 nodefactor_nodematch_income_continent 423.52671
## 73 Influenza_2010 edges 500.99522
## 74 Influenza_2010 nodefactor_income 484.72003
## 75 Influenza_2010 nodematch_income 490.90270
## 76 Influenza_2010 nodemix_income 482.05543
## 77 Influenza_2010 nodefactor_nodematch_income 489.21110
## 78 Influenza_2010 nodefactor_income_continent 472.48970 *
## 79 Influenza_2010 nodematch_income_continent 489.81906
## 80 Influenza_2010 nodefactor_nodematch_income_continent 474.33867
## 81 Influenza_2015 edges 1440.77192
## 82 Influenza_2015 nodefactor_income 1412.22075
## 83 Influenza_2015 nodematch_income 1417.50697
## 84 Influenza_2015 nodemix_income 1435.25546
## 85 Influenza_2015 nodefactor_nodematch_income 1415.00153
## 86 Influenza_2015 nodefactor_income_continent 1422.88580
## 87 Influenza_2015 nodematch_income_continent 1406.18546 *
## 88 Influenza_2015 nodefactor_nodematch_income_continent 1413.28855
## 89 Influenza_2020 edges 2872.41877
## 90 Influenza_2020 nodefactor_income 2853.04718
## 91 Influenza_2020 nodematch_income 2840.49848
## 92 Influenza_2020 nodemix_income 2854.74307
## 93 Influenza_2020 nodefactor_nodematch_income 2831.27091
## 94 Influenza_2020 nodefactor_income_continent 2835.91926
## 95 Influenza_2020 nodematch_income_continent 2778.16795
## 96 Influenza_2020 nodefactor_nodematch_income_continent 2727.61729 *
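For reference, the selection rule behind the star can be written out in Python. The BIC values are copied from the Polio_2010 rows of the table above. The `bic` helper shows the standard definition; which sample size the fitting software plugs in (for example, the number of dyads) is an assumption we do not verify here.

```python
from math import log


def bic(log_lik, n_params, n_obs):
    """Standard BIC: -2 * log-likelihood + (number of parameters) * log(sample size)."""
    return -2 * log_lik + n_params * log(n_obs)


# BIC values for Polio_2010, taken from the table above.
polio_2010 = {
    "edges": 89.14123,
    "nodefactor_income": 102.36863,
    "nodematch_income": 91.47411,
    "nodemix_income": 104.61067,
    "nodefactor_nodematch_income": 103.79582,
    "nodefactor_income_continent": 106.71876,
    "nodematch_income_continent": 95.92637,
    "nodefactor_nodematch_income_continent": 112.14626,
}

# The starred model is the one with the lowest BIC.
best = min(polio_2010, key=polio_2010.get)

# Differences from the best model show how much worse each alternative is.
delta_bic = {model: round(value - polio_2010[best], 5)
             for model, value in polio_2010.items()}

print(best)
print(delta_bic["nodematch_income"])
```

For Polio_2010 the edges-only model has the lowest BIC, so the extra income and continent terms do not improve the fit enough to offset the penalty for additional parameters.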
Models with coverage:
## Network Model BIC Sample
## 1 Measles_2010 edges 468.95405
## 2 Measles_2010 nodefactor_income 467.49214
## 3 Measles_2010 nodefactor_coverage 466.77634 *
## 4 Measles_2010 nodematch_income 474.57157
## 5 Measles_2010 nodematch_coverage 474.07673
## 6 Measles_2010 nodemix_income 486.34462
## 7 Measles_2010 nodefactor_nodematch_income 468.88484
## 8 Measles_2010 nodefactor_nodematch_coverage 473.56926
## 9 Measles_2010 nodefactor_income_continent 473.35426
## 10 Measles_2010 nodefactor_income_coverage 473.04830
## 11 Measles_2010 nodematch_income_continent 478.78833
## 12 Measles_2010 nodefactor_nodematch_income_continent 469.85913
## 13 Measles_2015 edges 1459.06366
## 14 Measles_2015 nodefactor_income 1449.31937
## 15 Measles_2015 nodefactor_coverage 1459.03914
## 16 Measles_2015 nodematch_income 1442.60093
## 17 Measles_2015 nodematch_coverage 1462.33491
## 18 Measles_2015 nodemix_income 1472.37792
## 19 Measles_2015 nodefactor_nodematch_income 1444.44802
## 20 Measles_2015 nodefactor_nodematch_coverage 1466.93048
## 21 Measles_2015 nodefactor_income_continent 1472.77722
## 22 Measles_2015 nodefactor_income_coverage 1432.91565
## 23 Measles_2015 nodematch_income_continent 1431.26387 *
## 24 Measles_2015 nodefactor_nodematch_income_continent 1441.46259
## 25 Measles_2020 edges 2101.49361
## 26 Measles_2020 nodefactor_income 1839.56656
## 27 Measles_2020 nodefactor_coverage 2031.58328
## 28 Measles_2020 nodematch_income 1995.31686
## 29 Measles_2020 nodematch_coverage 2103.93878
## 30 Measles_2020 nodemix_income 1873.68046
## 31 Measles_2020 nodefactor_nodematch_income 1847.55842
## 32 Measles_2020 nodefactor_nodematch_coverage 2039.59614
## 33 Measles_2020 nodefactor_income_continent 1781.21872
## 34 Measles_2020 nodefactor_income_coverage 1811.05990
## 35 Measles_2020 nodematch_income_continent 1933.64020
## 36 Measles_2020 nodefactor_nodematch_income_continent 1731.50546 *
## 37 HPV_2010 edges 847.44020
## 38 HPV_2010 nodefactor_income 837.66863
## 39 HPV_2010 nodefactor_coverage 832.36665
## 40 HPV_2010 nodematch_income 845.39814
## 41 HPV_2010 nodematch_coverage 843.79129
## 42 HPV_2010 nodemix_income 863.51103
## 43 HPV_2010 nodefactor_nodematch_income 845.07330
## 44 HPV_2010 nodefactor_nodematch_coverage 839.62482
## 45 HPV_2010 nodefactor_income_continent 847.46679
## 46 HPV_2010 nodefactor_income_coverage 836.15216
## 47 HPV_2010 nodematch_income_continent 831.38092 *
## 48 HPV_2010 nodefactor_nodematch_income_continent 834.45116
## 49 HPV_2015 edges 1413.72169
## 50 HPV_2015 nodefactor_income 1318.55284
## 51 HPV_2015 nodefactor_coverage 1319.31303
## 52 HPV_2015 nodematch_income 1391.42989
## 53 HPV_2015 nodematch_coverage 1415.92040
## 54 HPV_2015 nodemix_income 1360.62519
## 55 HPV_2015 nodefactor_nodematch_income 1326.26208
## 56 HPV_2015 nodefactor_nodematch_coverage 1326.86922
## 57 HPV_2015 nodefactor_income_continent 1249.08199 *
## 58 HPV_2015 nodefactor_income_coverage 1277.73870
## 59 HPV_2015 nodematch_income_continent 1399.10652
## 60 HPV_2015 nodefactor_nodematch_income_continent 1254.51886
## 61 HPV_2020 edges 2152.88442
## 62 HPV_2020 nodefactor_income 2054.87117
## 63 HPV_2020 nodefactor_coverage 1992.71463
## 64 HPV_2020 nodematch_income 2129.68916
## 65 HPV_2020 nodematch_coverage 2161.14050
## 66 HPV_2020 nodemix_income 2094.59491
## 67 HPV_2020 nodefactor_nodematch_income 2063.31659
## 68 HPV_2020 nodefactor_nodematch_coverage 1997.40352
## 69 HPV_2020 nodefactor_income_continent 2055.27913
## 70 HPV_2020 nodefactor_income_coverage 1899.22119 *
## 71 HPV_2020 nodematch_income_continent 2113.83945
## 72 HPV_2020 nodefactor_nodematch_income_continent 2031.12794
## 73 Polio_2010 edges 89.14123 *
## 74 Polio_2010 nodefactor_income 102.36863
## 75 Polio_2010 nodefactor_coverage 96.64291
## 76 Polio_2010 nodematch_income 91.47411
## 77 Polio_2010 nodematch_coverage 93.64494
## 78 Polio_2010 nodemix_income 104.61067
## 79 Polio_2010 nodefactor_nodematch_income 103.79582
## 80 Polio_2010 nodefactor_nodematch_coverage 101.07690
## 81 Polio_2010 nodefactor_income_continent 106.71876
## 82 Polio_2010 nodefactor_income_coverage 107.65410
## 83 Polio_2010 nodematch_income_continent 95.92637
## 84 Polio_2010 nodefactor_nodematch_income_continent 112.14626
## 85 Polio_2015 edges 522.23861
## 86 Polio_2015 nodefactor_income 530.64007
## 87 Polio_2015 nodefactor_coverage 508.91765 *
## 88 Polio_2015 nodematch_income 528.62608
## 89 Polio_2015 nodematch_coverage 528.21309
## 90 Polio_2015 nodemix_income 557.79485
## 91 Polio_2015 nodefactor_nodematch_income 533.22460
## 92 Polio_2015 nodefactor_nodematch_coverage 515.02584
## 93 Polio_2015 nodefactor_income_continent 528.68449
## 94 Polio_2015 nodefactor_income_coverage 510.37835
## 95 Polio_2015 nodematch_income_continent 534.77810
## 96 Polio_2015 nodefactor_nodematch_income_continent 536.11681
## 97 Polio_2020 edges 403.30161 *
## 98 Polio_2020 nodefactor_income 416.62922
## 99 Polio_2020 nodefactor_coverage 406.97169
## 100 Polio_2020 nodematch_income 408.56783
## 101 Polio_2020 nodematch_coverage 409.01769
## 102 Polio_2020 nodemix_income 444.08538
## 103 Polio_2020 nodefactor_nodematch_income 422.62521
## 104 Polio_2020 nodefactor_nodematch_coverage 412.49916
## 105 Polio_2020 nodefactor_income_continent 414.93989
## 106 Polio_2020 nodefactor_income_coverage 421.58601
## 107 Polio_2020 nodematch_income_continent 413.11540
## 108 Polio_2020 nodefactor_nodematch_income_continent 423.52671
Models with cases: