Lab 4

Prepare the environment, load the data, and explore

install.packages("statnet")

Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
(as 'lib' is unspecified)

library(statnet)

Loading required package: tergm

Loading required package: ergm

Loading required package: network


'network' 1.18.2 (2023-12-04), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information


'ergm' 4.6.0 (2023-12-17), part of the Statnet Project
* 'news(package="ergm")' for changes since last version
* 'citation("ergm")' for citation information
* 'https://statnet.org' for help, support, and other information

'ergm' 4 is a major update that introduces some backwards-incompatible
changes. Please type 'news(package="ergm")' for a list of major
changes.

Loading required package: networkDynamic


'networkDynamic' 0.11.4 (2023-12-10?), part of the Statnet Project
* 'news(package="networkDynamic")' for changes since last version
* 'citation("networkDynamic")' for citation information
* 'https://statnet.org' for help, support, and other information

Registered S3 method overwritten by 'tergm':
  method                   from
  simulate_formula.network ergm


'tergm' 4.2.0 (2023-05-30), part of the Statnet Project
* 'news(package="tergm")' for changes since last version
* 'citation("tergm")' for citation information
* 'https://statnet.org' for help, support, and other information


Attaching package: 'tergm'

The following object is masked from 'package:ergm':

    snctrl

Loading required package: ergm.count


'ergm.count' 4.1.1 (2022-05-24), part of the Statnet Project
* 'news(package="ergm.count")' for changes since last version
* 'citation("ergm.count")' for citation information
* 'https://statnet.org' for help, support, and other information

Loading required package: sna

Loading required package: statnet.common


Attaching package: 'statnet.common'

The following object is masked from 'package:ergm':

    snctrl

The following objects are masked from 'package:base':

    attr, order

sna: Tools for Social Network Analysis
Version 2.7-2 created on 2023-12-05.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

Loading required package: tsna


'statnet' 2019.6 (2019-06-13), part of the Statnet Project
* 'news(package="statnet")' for changes since last version
* 'citation("statnet")' for citation information
* 'https://statnet.org' for help, support, and other information

edges <- read.csv("/cloud/project/edgelist_retrieve (1).csv")
edges

attributes <- read.csv("/cloud/project/att_expertise (1).csv")
attributes

   id  expertise
1   1 0.17647059
2   2 0.17647059
3   3 0.17647059
4   4 0.52941176
5   5 0.52941176
6   6 0.58823529
7   7 0.05882353
8   8 0.29411765
9   9 0.58823529
10 10 0.05882353
11 11 0.17647059
12 12 0.05882353
13 13 0.05882353
14 14 0.35294118
15 15 0.23529412
16 16 0.58823529
17 17 0.41176471

# Note: vertex.attr assigns values by vertex position, not by matching the id column.
retrieve_net <- network(x = edges,
                        directed = TRUE,
                        vertex.attr = attributes)

plot(retrieve_net)
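Because vertex.attr values are attached to vertices by position, it is worth confirming that every id in the attribute table also appears in the edge list before relying on the expertise covariate. The model summary further down reports 210 dyads, which corresponds to 15 x 14 ordered pairs, while the attribute table lists 17 members, so a check along these lines may be useful here. The sketch below uses base R only; toy_edges, toy_attributes, and the column names "from" and "to" are hypothetical stand-ins, not the lab's files.

```r
# Hypothetical toy data; the column names "from" and "to" are assumptions.
toy_edges <- data.frame(from = c(1, 2, 4), to = c(2, 4, 1))
toy_attributes <- data.frame(id = 1:5,
                             expertise = c(0.2, 0.5, 0.1, 0.6, 0.3))

# Ids that appear as sender or receiver in at least one tie.
ids_in_edges <- sort(unique(c(toy_edges$from, toy_edges$to)))

# Members listed in the attribute table but absent from the edge list (isolates).
missing_ids <- setdiff(toy_attributes$id, ids_in_edges)
print(missing_ids)

# Attribute rows restricted to, and ordered like, the ids found in the edge list.
aligned <- toy_attributes[match(ids_in_edges, toy_attributes$id), ]
print(aligned)
```

If missing_ids is not empty, one option is to pass the attribute table through the vertices argument of network() instead (first column holding the vertex id), which should keep the isolates and match attributes by id rather than by position.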

Estimation

Build Model

set.seed(123)
model.retrieve <- ergm(retrieve_net ~ edges + mutual + nodeicov("expertise") + nodeocov("expertise"))

Starting maximum pseudolikelihood estimation (MPLE):

Obtaining the responsible dyads.

Evaluating the predictor and response matrix.

Maximizing the pseudolikelihood.

Finished MPLE.

Starting Monte Carlo maximum likelihood estimation (MCMLE):

Iteration 1 of at most 60:

Warning: 'glpk' selected as the solver, but package 'Rglpk' is not available;
falling back to 'lpSolveAPI'. This should be fine unless the sample size and/or
the number of parameters is very big.

Optimizing with step length 1.0000.

The log-likelihood improved by 0.0137.

Convergence test p-value: 0.0002. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.

This model was fit using MCMC.  To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.

Model Summary

summary(model.retrieve)

Call:
ergm(formula = retrieve_net ~ edges + mutual + nodeicov("expertise") + 
    nodeocov("expertise"))

Monte Carlo Maximum Likelihood Results:

                   Estimate Std. Error MCMC % z value Pr(>|z|)    
edges               -2.2479     0.4456      0  -5.045   <1e-04 ***
mutual              -0.1087     0.6071      0  -0.179   0.8579    
nodeicov.expertise   2.0333     0.9063      0   2.244   0.0249 *  
nodeocov.expertise   0.8309     0.8934      0   0.930   0.3524    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 291.1  on 210  degrees of freedom
 Residual Deviance: 202.0  on 206  degrees of freedom
 
AIC: 210  BIC: 223.4  (Smaller is better. MC Std. Err. = 0.0144)
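The fit statistics in the summary can be reproduced by hand, which helps when reading them. The sketch below uses only numbers copied from the output above; the variable names are illustrative. It assumes the null deviance refers to a model in which every tie has probability 0.5, which is consistent with the reported value.

```r
# Values copied from the summary output above.
residual_deviance <- 202.0
n_params <- 4      # edges, mutual, nodeicov.expertise, nodeocov.expertise
n_dyads <- 210     # degrees of freedom of the null deviance

# Null deviance if every possible tie has probability 0.5.
null_deviance <- 2 * n_dyads * log(2)

aic <- residual_deviance + 2 * n_params
bic <- residual_deviance + log(n_dyads) * n_params

# Reduction in deviance achieved by the four model terms.
deviance_drop <- 291.1 - residual_deviance

print(round(c(null = null_deviance, AIC = aic, BIC = bic,
              drop = deviance_drop), 1))
```

The recomputed AIC and BIC agree with the reported 210 and 223.4 up to rounding.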

Model Diagnostics and Goodness of Fit

# MCMC convergence: write the diagnostic plots to a PDF file
pdf("modeldiagnostics.pdf")
mcmc.diagnostics(model.retrieve)

Sample statistics summary:

Iterations = 14336:262144
Thinning interval = 1024 
Number of chains = 1 
Sample size per chain = 243 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                     Mean    SD Naive SE Time-series SE
edges              0.8477 5.613   0.3601         0.2854
mutual             0.1523 2.034   0.1305         0.1305
nodeicov.expertise 0.2225 2.109   0.1353         0.1221
nodeocov.expertise 0.1784 1.911   0.1226         0.1077

2. Quantiles for each variable:

                     2.5%    25%        50%   75%  97.5%
edges              -9.000 -3.000  1.000e+00 5.000 11.950
mutual             -3.000 -1.000  0.000e+00 1.000  5.000
nodeicov.expertise -3.353 -1.294  1.176e-01 1.735  4.526
nodeocov.expertise -3.053 -1.147 -2.000e-09 1.529  4.232


Are sample statistics significantly different from observed?
                 edges    mutual nodeicov.expertise nodeocov.expertise
diff.      0.847736626 0.1522634         0.22246430         0.17840717
test stat. 2.970622021 1.1668359         1.82268154         1.65612493
P-val.     0.002971973 0.2432766         0.06835164         0.09769652
              (Omni)
diff.             NA
test stat. 6.5712426
P-val.     0.1693739

Sample statistics cross-correlations:
                       edges    mutual nodeicov.expertise nodeocov.expertise
edges              1.0000000 0.5723729          0.8466874          0.8014552
mutual             0.5723729 1.0000000          0.5104470          0.5103168
nodeicov.expertise 0.8466874 0.5104470          1.0000000          0.6340906
nodeocov.expertise 0.8014552 0.5103168          0.6340906          1.0000000

Sample statistics auto-correlation:
Chain 1 
               edges      mutual nodeicov.expertise nodeocov.expertise
Lag 0     1.00000000  1.00000000         1.00000000        1.000000000
Lag 1024 -0.11954449  0.06809777        -0.10466554       -0.130413824
Lag 2048 -0.10012999 -0.06374528        -0.06913897       -0.041351693
Lag 3072  0.04567336  0.03210058         0.07169223        0.079082995
Lag 4096  0.07151648 -0.08545746         0.04886728       -0.051960509
Lag 5120 -0.06048858 -0.08493815        -0.09172259        0.001945236

Sample statistics burn-in diagnostic (Geweke):
Chain 1 

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

             edges             mutual nodeicov.expertise nodeocov.expertise 
        -1.0369435         -0.1906664         -1.8878406          0.4235794 

Individual P-values (lower = worse):
             edges             mutual nodeicov.expertise nodeocov.expertise 
        0.29976217         0.84878693         0.05904736         0.67187262 
Joint P-value (lower = worse):  0.5354552 

Note: MCMC diagnostics shown here are from the last round of
  simulation, prior to computation of final parameter estimates.
  Because the final estimates are refinements of those used for this
  simulation run, these diagnostics may understate model performance.
  To directly assess the performance of the final model on in-model
  statistics, please use the GOF command: gof(ergmFitObject,
  GOF=~model).

dev.off()

png 
  2
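The "Are sample statistics significantly different from observed?" table can be read as a set of z-tests on the sampled statistics, which are reported as deviations from their observed values. The reported test statistics appear to equal the mean deviation divided by its time-series standard error; the sketch below checks this with numbers copied from the output above (the variable names are illustrative).

```r
# Means and time-series standard errors copied from the diagnostics output.
stat_mean <- c(edges = 0.8477, mutual = 0.1523,
               nodeicov.expertise = 0.2225, nodeocov.expertise = 0.1784)
ts_se <- c(edges = 0.2854, mutual = 0.1305,
           nodeicov.expertise = 0.1221, nodeocov.expertise = 0.1077)

# z statistic and two-sided normal p-value for each model statistic.
z <- stat_mean / ts_se
p <- 2 * pnorm(-abs(z))
print(round(rbind(z = z, p = p), 4))
```

Only the edges statistic differs from its observed value at the 0.05 level, and the shift (about 0.85 edges) is small relative to its standard deviation of 5.6. The omnibus test (p = 0.169) and the joint Geweke test (p = 0.535) do not indicate a convergence problem.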

#Goodness of Fit
model.retrieve.gof <- gof(model.retrieve)
print(model.retrieve.gof)


Goodness-of-fit for in-degree 

          obs min mean max MC p-value
idegree0    7   0 0.89   4       0.00
idegree1    3   0 2.76   6       1.00
idegree2    0   0 4.00  10       0.08
idegree3    0   0 3.23   7       0.06
idegree4    0   0 1.98   6       0.16
idegree5    2   0 1.20   4       0.72
idegree6    0   0 0.57   3       1.00
idegree7    0   0 0.22   2       1.00
idegree8    1   0 0.10   3       0.16
idegree9    1   0 0.05   1       0.10
idegree11   1   0 0.00   0       0.00

Goodness-of-fit for out-degree 

         obs min mean max MC p-value
odegree0   2   0 0.69   3       0.34
odegree1   1   0 2.53   6       0.48
odegree2   3   0 3.86   7       0.82
odegree3   4   0 3.84   7       1.00
odegree4   3   0 2.41   7       0.82
odegree5   2   0 1.04   5       0.56
odegree6   0   0 0.46   3       1.00
odegree7   0   0 0.15   1       1.00
odegree8   0   0 0.02   1       1.00

Goodness-of-fit for edgewise shared partner 

         obs min  mean max MC p-value
esp.OTP0  10  14 23.51  31       0.00
esp.OTP1  13   2 12.86  24       1.00
esp.OTP2  11   0  3.44  16       0.10
esp.OTP3   5   0  0.72   6       0.06
esp.OTP4   2   0  0.04   1       0.00
esp.OTP5   0   0  0.01   1       1.00

Goodness-of-fit for minimum geodesic distance 

    obs min  mean max MC p-value
1    41  29 40.58  54       0.96
2    19  36 65.99  96       0.00
3     4  19 48.38  71       0.00
4     0   1 20.11  45       0.00
5     0   0  6.25  22       0.32
6     0   0  1.76  11       1.00
7     0   0  0.43   7       1.00
8     0   0  0.06   4       1.00
9     0   0  0.01   1       1.00
Inf 146   0 26.43 116       0.00

Goodness-of-fit for model statistics 

                        obs       min     mean      max MC p-value
edges              41.00000 29.000000 40.58000 54.00000       0.96
mutual              4.00000  0.000000  3.81000  9.00000       1.00
nodeicov.expertise 13.52941  9.235294 13.49529 20.47059       0.98
nodeocov.expertise 11.82353  6.588235 11.73647 17.23529       0.90

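# Plot goodness of fit: five panels (in-degree, out-degree, edgewise shared
# partners, geodesic distance, model statistics), so use a 3 x 2 grid
par(mfrow = c(3, 2))
plot(model.retrieve.gof)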
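In the goodness-of-fit tables, a small MC p-value means the observed count lies in the tails of the counts simulated from the fitted model, so that statistic is reproduced poorly. A quick way to pick out such rows is sketched below using the in-degree values copied from the output above; the data frame and its column names are illustrative.

```r
# In-degree rows copied from the goodness-of-fit output above.
gof_idegree <- data.frame(
  stat = c("idegree0", "idegree1", "idegree2", "idegree3", "idegree4",
           "idegree5", "idegree6", "idegree7", "idegree8", "idegree9",
           "idegree11"),
  obs  = c(7, 3, 0, 0, 0, 2, 0, 0, 1, 1, 1),
  mean = c(0.89, 2.76, 4.00, 3.23, 1.98, 1.20, 0.57, 0.22, 0.10, 0.05, 0.00),
  p    = c(0.00, 1.00, 0.08, 0.06, 0.16, 0.72, 1.00, 1.00, 0.16, 0.10, 0.00),
  stringsAsFactors = FALSE
)

# Rows where the observed count is extreme relative to the simulations.
poor_fit <- gof_idegree[gof_idegree$p < 0.05, ]
print(poor_fit)
```

The same filter applied to the other tables would flag esp.OTP0, esp.OTP4, and geodesic distances 2, 3, 4, and Inf, while the model statistics themselves are reproduced well. This may suggest that the model does not capture the in-degree heterogeneity and clustering of the observed network.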

Interpreting Results

H1: Members tend to reciprocate information retrieval ties.

The mutual parameter, which captures the tendency toward reciprocated ties, has an estimate of -0.1087 (SE = 0.6071, z = -0.179, p = 0.8579). Because the p-value is well above 0.05, we fail to reject the null hypothesis of no reciprocity effect. The estimate is also slightly negative, so the data provide no evidence that members tend to reciprocate information retrieval ties.

H2: Members tend to retrieve information from other members with high expertise.

The nodeicov.expertise parameter, which captures how the expertise of the receiving member affects the log-odds of a tie, has an estimate of 2.0333 (SE = 0.9063, z = 2.244, p = 0.0249). Because the p-value is below 0.05, we reject the null hypothesis of no effect. The positive estimate indicates that members tend to retrieve information from other members with high expertise.

H3: High expertise members tend to have more outgoing ties.

The nodeocov.expertise parameter, which captures how the expertise of the sending member affects the log-odds of a tie, has an estimate of 0.8309 (SE = 0.8934, z = 0.930, p = 0.3524). Because the p-value is above 0.05, we fail to reject the null hypothesis. Although the estimate is positive, we do not have sufficient evidence to conclude that high expertise members tend to have more outgoing ties.

In summary, the results support Hypothesis 2 but do not provide sufficient evidence for Hypotheses 1 and 3.
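The z-values and p-values quoted for the three hypotheses can be reproduced from the estimates and standard errors, and the coefficients can be put on a more interpretable scale. The sketch below uses numbers copied from summary(model.retrieve); the variable names are illustrative, and the 0.1 difference in expertise is an arbitrary choice made because expertise ranges only from about 0.06 to 0.59 in this data set.

```r
# Estimates and standard errors copied from summary(model.retrieve).
est <- c(edges = -2.2479, mutual = -0.1087,
         nodeicov.expertise = 2.0333, nodeocov.expertise = 0.8309)
se  <- c(edges = 0.4456, mutual = 0.6071,
         nodeicov.expertise = 0.9063, nodeocov.expertise = 0.8934)

# Wald z statistics and two-sided p-values.
z <- est / se
p <- 2 * pnorm(-abs(z))
print(round(cbind(z = z, p = p), 4))

# Tie probability implied by the edges term alone
# (all other change statistics equal to zero).
baseline_prob <- plogis(est[["edges"]])

# Odds multiplier for a receiver whose expertise is 0.1 higher,
# holding the other change statistics fixed.
or_receiver_tenth <- exp(0.1 * est[["nodeicov.expertise"]])
print(round(c(baseline_prob = baseline_prob,
              or_receiver_tenth = or_receiver_tenth), 3))
```

Under these assumptions, a receiver with expertise 0.1 higher has roughly 1.2 times the odds of being the target of a retrieval tie, other things equal.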