Assignment 4 - Lab 3

Author

Delaney Burns

Assignment 4: Social Network Analysis in Education (ERGM)

1. Method

Participants & Context
This study analyzes the social dynamics within a third-grade classroom. Students were asked to identify their three closest friends in class, resulting in a directed social network based on peer nominations.

Data Collection

Friendship nominations were collected via a class survey.
Student attributes (gender and academic performance) were included to explore patterns in social connectivity.

Data Preparation

The data was anonymized using unique identifiers (Student_1, Student_2, etc.).
The network was constructed from an edgelist where a directed edge indicates one student nominated another as a friend.
Node attributes include: academic performance and gender.
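
As an illustration of the preparation steps above, the following is a minimal sketch of building a directed network from an edgelist while attaching attributes by student ID rather than by row position. The toy data and the column names (from, to, id, grade, gender) are assumptions made for this example; they are not the actual survey files.

```r
library(network)

# Hypothetical toy data; the column names are assumptions for illustration.
edges_df <- data.frame(
  from = c("Student_1", "Student_1", "Student_2", "Student_3"),
  to   = c("Student_2", "Student_3", "Student_1", "Student_1")
)
vertices_df <- data.frame(
  id     = c("Student_3", "Student_1", "Student_2", "Student_4"),  # deliberately unsorted
  grade  = c("Low", "High", "Average", "High"),
  gender = c("F", "M", "F", "M")
)

# Passing the attribute table as `vertices` matches attributes to students by ID
# (first column) and keeps students with no ties (Student_4) as isolates.
toy_net <- as.network(edges_df, directed = TRUE, vertices = vertices_df)

# Check the alignment: names and attributes are listed in the same vertex order.
data.frame(
  name  = network.vertex.names(toy_net),
  grade = get.vertex.attribute(toy_net, "grade")
)
```

Matching by ID avoids relying on the attribute file being sorted in the same order as the vertices, which is easy to get wrong when some students do not appear in the edgelist.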

Analysis
An Exponential Random Graph Model (ERGM) was used to test three hypotheses:

H1: Students tend to reciprocate friendship nominations (mutual ties).

H2: Students tend to nominate peers with high academic performance (in-degree and grade).

H3: Students with higher academic performance nominate more peers (out-degree and grade).

The ERGM model included the following network terms:

edges – total number of ties

mutual – reciprocal ties

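nodeicov("grade_numeric") – association between a student's numeric grade code and the number of nominations received (in-degree)

nodeocov("grade_numeric") – association between a student's numeric grade code and the number of nominations sent (out-degree)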

The model was estimated in R using the ergm package from the statnet suite.
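
For reference, the model described above can be written in the standard ERGM form. The notation is introduced here for illustration and does not appear in the analysis output: y is the directed adjacency matrix (y_ij = 1 if student i nominates student j), x_i is the numeric grade code of student i, the thetas are the coefficients to be estimated, and kappa is the normalizing constant.

```latex
P(Y = y \mid x) =
  \frac{\exp\left\{ \theta_{1}\, g_{\text{edges}}(y)
                  + \theta_{2}\, g_{\text{mutual}}(y)
                  + \theta_{3}\, g_{\text{nodeicov}}(y, x)
                  + \theta_{4}\, g_{\text{nodeocov}}(y, x) \right\}}
       {\kappa(\theta)}

g_{\text{edges}}(y)       = \sum_{i \neq j} y_{ij}
g_{\text{mutual}}(y)      = \sum_{i < j} y_{ij}\, y_{ji}
g_{\text{nodeicov}}(y, x) = \sum_{i \neq j} y_{ij}\, x_{j}
g_{\text{nodeocov}}(y, x) = \sum_{i \neq j} y_{ij}\, x_{i}
```

Each coefficient is the change in the log-odds of a tie when adding that tie raises the corresponding statistic by one, holding the rest of the network fixed.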

2. Results

1. Prepare the Environment

#Install necessary packages (only needed once)
install.packages("statnet")

Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.4'
(as 'lib' is unspecified)

#Load required packages for ERGM modeling
library(statnet)

Loading required package: tergm

Loading required package: ergm

Loading required package: network


'network' 1.19.0 (2024-12-08), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information


'ergm' 4.8.1 (2025-01-20), part of the Statnet Project
* 'news(package="ergm")' for changes since last version
* 'citation("ergm")' for citation information
* 'https://statnet.org' for help, support, and other information

'ergm' 4 is a major update that introduces some backwards-incompatible
changes. Please type 'news(package="ergm")' for a list of major
changes.

Loading required package: networkDynamic


'networkDynamic' 0.11.5 (2024-11-21), part of the Statnet Project
* 'news(package="networkDynamic")' for changes since last version
* 'citation("networkDynamic")' for citation information
* 'https://statnet.org' for help, support, and other information

Registered S3 method overwritten by 'tergm':
  method                   from
  simulate_formula.network ergm


'tergm' 4.2.1 (2024-10-08), part of the Statnet Project
* 'news(package="tergm")' for changes since last version
* 'citation("tergm")' for citation information
* 'https://statnet.org' for help, support, and other information


Attaching package: 'tergm'

The following object is masked from 'package:ergm':

    snctrl

Loading required package: ergm.count


'ergm.count' 4.1.2 (2024-06-15), part of the Statnet Project
* 'news(package="ergm.count")' for changes since last version
* 'citation("ergm.count")' for citation information
* 'https://statnet.org' for help, support, and other information

Loading required package: sna

Loading required package: statnet.common


Attaching package: 'statnet.common'

The following object is masked from 'package:ergm':

    snctrl

The following objects are masked from 'package:base':

    attr, order

sna: Tools for Social Network Analysis
Version 2.8 created on 2024-09-07.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

Loading required package: tsna


'statnet' 2019.6 (2019-06-13), part of the Statnet Project
* 'news(package="statnet")' for changes since last version
* 'citation("statnet")' for citation information
* 'https://statnet.org' for help, support, and other information

#Load edgelist
edgelist <- read.csv("anonymized_edgelist.csv")
class_net <- network(edgelist, directed = TRUE)

#Load attributes (rows are assumed to be in the same order as the network's vertices)
attributes <- read.csv("anonymized_attributes.csv")
set.vertex.attribute(class_net, "grade", attributes$Grade)
set.vertex.attribute(class_net, "gender", attributes$Gender)

#Convert categorical grade to numeric (e.g., High = 3, Average = 2, Low = 1)
attributes$grade_numeric <- ifelse(attributes$Grade == "High", 3,
                             ifelse(attributes$Grade == "Average", 2, 1))

#Assign it to the network
set.vertex.attribute(class_net, "grade_numeric", attributes$grade_numeric)

#Plot network
plot(class_net)

2. Build the ERGM

#Build ERGM model with edge, mutual, and grade-related terms
set.seed(123)
model <- ergm(class_net ~
  edges + # base tendency to form a tie
  mutual + #H1:Reciprocity
  nodeicov("grade_numeric") +  #H2:In-degree vs. Grade
  nodeocov("grade_numeric")    #H3:Out-degree vs. Grade
)

Starting maximum pseudolikelihood estimation (MPLE):

Obtaining the responsible dyads.

Evaluating the predictor and response matrix.

Maximizing the pseudolikelihood.

Finished MPLE.

Starting Monte Carlo maximum likelihood estimation (MCMLE):

Iteration 1 of at most 60:

Warning: 'glpk' selected as the solver, but package 'Rglpk' is not available;
falling back to 'lpSolveAPI'. This should be fine unless the sample size and/or
the number of parameters is very big.

Optimizing with step length 1.0000.

The log-likelihood improved by 0.0103.

Convergence test p-value: 0.0002. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.

This model was fit using MCMC.  To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.

# Summarize the model results
summary(model)

Call:
ergm(formula = class_net ~ edges + mutual + nodeicov("grade_numeric") + 
    nodeocov("grade_numeric"))

Monte Carlo Maximum Likelihood Results:

                       Estimate Std. Error MCMC % z value Pr(>|z|)  
edges                   -1.4688     0.5805      0  -2.530   0.0114 *
mutual                   1.2300     0.5707      0   2.155   0.0311 *
nodeicov.grade_numeric  -0.1962     0.2024      0  -0.969   0.3323  
nodeocov.grade_numeric  -0.1827     0.1970      0  -0.927   0.3538  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 474.1  on 342  degrees of freedom
 Residual Deviance: 247.9  on 338  degrees of freedom
 
AIC: 255.9  BIC: 271.2  (Smaller is better. MC Std. Err. = 0.1452)
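
ERGM coefficients are on the log-odds scale, so exponentiating them gives conditional odds ratios that are easier to read. The sketch below copies the estimates and standard errors from the summary output above into plain vectors so that it runs on its own; the Wald-style 95% intervals are a rough approximation, not something the ergm summary reports.

```r
# Estimates and standard errors copied from the summary(model) output
est <- c(edges = -1.4688, mutual = 1.2300,
         nodeicov.grade_numeric = -0.1962, nodeocov.grade_numeric = -0.1827)
se  <- c(edges = 0.5805, mutual = 0.5707,
         nodeicov.grade_numeric = 0.2024, nodeocov.grade_numeric = 0.1970)

# Convert log-odds to odds ratios with approximate 95% Wald intervals
z <- qnorm(0.975)
odds_table <- data.frame(
  odds_ratio = exp(est),
  ci_lower   = exp(est - z * se),
  ci_upper   = exp(est + z * se)
)
round(odds_table, 2)
```

Read this way, the mutual estimate of 1.23 corresponds to an odds ratio of about 3.4: the odds of a nomination are roughly 3.4 times higher when it reciprocates an existing nomination, holding the other terms fixed. The intervals for both grade terms include 1, in line with their non-significant p-values.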

3. Model Diagnostics and Goodness of Fit

MCMC Convergence

pdf("modeldiagnostics.pdf") 
mcmc.diagnostics(model)

Sample statistics summary:

Iterations = 14336:262144
Thinning interval = 1024 
Number of chains = 1 
Sample size per chain = 243 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                          Mean     SD Naive SE Time-series SE
edges                  0.47737  5.919   0.3797         0.3868
mutual                 0.07407  2.378   0.1526         0.1357
nodeicov.grade_numeric 0.75309 12.210   0.7833         0.8142
nodeocov.grade_numeric 0.34156 12.305   0.7894         0.8009

2. Quantiles for each variable:

                       2.5% 25% 50% 75% 97.5%
edges                   -10  -4   1 4.5 12.00
mutual                   -4  -2   0 2.0  5.00
nodeicov.grade_numeric  -21  -8   1 9.0 23.95
nodeocov.grade_numeric  -20  -9   0 7.5 27.00


Are sample statistics significantly different from observed?
               edges     mutual nodeicov.grade_numeric nodeocov.grade_numeric
diff.      0.4773663 0.07407407              0.7530864              0.3415638
test stat. 1.2342711 0.54589807              0.9249859              0.4264785
P-val.     0.2171019 0.58513600              0.3549732              0.6697592
             (Omni)
diff.            NA
test stat. 4.940629
P-val.     0.303118

Sample statistics cross-correlations:
                           edges    mutual nodeicov.grade_numeric
edges                  1.0000000 0.6659007              0.9121233
mutual                 0.6659007 1.0000000              0.5728602
nodeicov.grade_numeric 0.9121233 0.5728602              1.0000000
nodeocov.grade_numeric 0.9093549 0.5841544              0.8417626
                       nodeocov.grade_numeric
edges                               0.9093549
mutual                              0.5841544
nodeicov.grade_numeric              0.8417626
nodeocov.grade_numeric              1.0000000

Sample statistics auto-correlation:
Chain 1 
               edges       mutual nodeicov.grade_numeric nodeocov.grade_numeric
Lag 0     1.00000000  1.000000000            1.000000000            1.000000000
Lag 1024 -0.01783782  0.019046703           -0.015624230           -0.016015131
Lag 2048 -0.16740656 -0.134391305           -0.179117296           -0.161014633
Lag 3072  0.01670086  0.006834323           -0.056795860            0.007797919
Lag 4096 -0.03595024  0.071180587            0.004821366           -0.023540155
Lag 5120  0.01843519 -0.091647474            0.062516275           -0.023106689

Sample statistics burn-in diagnostic (Geweke):
Chain 1 

Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5 

                 edges                 mutual nodeicov.grade_numeric 
             0.1809539             -0.6275267             -0.1612883 
nodeocov.grade_numeric 
             0.1191309 

Individual P-values (lower = worse):
                 edges                 mutual nodeicov.grade_numeric 
             0.8564038              0.5303140              0.8718663 
nodeocov.grade_numeric 
             0.9051717 
Joint P-value (lower = worse):  0.863617 

Note: MCMC diagnostics shown here are from the last round of
  simulation, prior to computation of final parameter estimates.
  Because the final estimates are refinements of those used for this
  simulation run, these diagnostics may understate model performance.
  To directly assess the performance of the final model on in-model
  statistics, please use the GOF command: gof(ergmFitObject,
  GOF=~model).

dev.off()

png 
  2

To assess convergence of the ERGM, MCMC diagnostics were examined for each model statistic: edges, mutual, nodeicov.grade_numeric, and nodeocov.grade_numeric. The trace plots (written to modeldiagnostics.pdf by the code above) fluctuate around a stable mean with no upward or downward trend, which suggests the chain reached a stationary distribution. The density plots are approximately symmetric and bell-shaped for each statistic, which is consistent with good mixing. The numerical output agrees: the simulated statistics do not differ significantly from the observed values (omnibus p = 0.30), the autocorrelations at lag 1024 and beyond are small (absolute values of at most about 0.18), and the Geweke burn-in diagnostic shows no sign of drift (joint p = 0.86). Together, these results indicate that the estimation converged, so the estimates can be interpreted.

GOF

#Run goodness-of-fit diagnostics for the final model
model.gof <- gof(model)
print(model.gof) #print results


Goodness-of-fit for in-degree 

         obs min mean max MC p-value
idegree0   1   0 2.09   7       0.72
idegree1   4   0 4.53  10       1.00
idegree2   7   0 5.58  11       0.64
idegree3   5   0 3.91   9       0.66
idegree4   1   0 1.70   6       1.00
idegree5   1   0 0.82   4       1.00
idegree6   0   0 0.33   3       1.00
idegree7   0   0 0.03   1       1.00
idegree8   0   0 0.01   1       1.00

Goodness-of-fit for out-degree 

         obs min mean max MC p-value
odegree0   5   0 1.98   7       0.16
odegree1   0   1 4.91  10       0.00
odegree2   0   1 5.26   9       0.00
odegree3  14   0 3.78  11       0.00
odegree4   0   0 1.95   6       0.26
odegree5   0   0 0.78   4       0.92
odegree6   0   0 0.27   2       1.00
odegree7   0   0 0.06   2       1.00
odegree8   0   0 0.01   1       1.00

Goodness-of-fit for edgewise shared partner 

         obs min  mean max MC p-value
esp.OTP0  26  22 31.96  42       0.20
esp.OTP1  14   1  7.61  20       0.24
esp.OTP2   2   0  0.98   8       0.48
esp.OTP3   0   0  0.04   1       1.00

Goodness-of-fit for minimum geodesic distance 

    obs min  mean max MC p-value
1    42  25 40.59  55       0.94
2    54  26 65.29 108       0.62
3    32  20 64.52 105       0.14
4    17   9 41.20  71       0.14
5     4   0 19.24  43       0.10
6     0   0  7.85  26       0.20
7     0   0  2.78  16       0.82
8     0   0  0.79   9       1.00
9     0   0  0.17   4       1.00
10    0   0  0.04   4       1.00
Inf 193   0 99.53 252       0.16

Goodness-of-fit for model statistics 

                       obs min  mean max MC p-value
edges                   42  25 40.59  55       0.94
mutual                   6   0  5.71  12       0.94
nodeicov.grade_numeric  78  46 75.24 103       0.90
nodeocov.grade_numeric  78  46 75.29 107       0.88

par(mfrow=c(1,2)) #arrange the plots in 1 row and 2 columns
plot(model.gof) #plots the graph

Model Statistics

The observed model statistics fall close to the means of the simulated values (MC p-values between 0.88 and 0.94), suggesting that the model reproduces the structural statistics it was fit on reasonably well.

Out Degree Distribution

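There is a notable misfit here. The model substantially underestimates the number of students nominating exactly 3 friends (14 observed versus a simulated mean of 3.78) and overestimates the number nominating 1 or 2 friends (none observed versus simulated means of about 5 each). This is expected given the survey design, which asked each student to name three friends: 14 students named exactly three, and the remaining 5 have no outgoing nominations. The model as specified places no constraint on out-degree, so it can’t fully capture this fixed out-degree pattern.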

In-Degree Distribution

The graph shows that the model fits the observed in-degree distribution well (all MC p-values are 0.64 or higher), suggesting that patterns of peer nominations (popularity) are captured adequately.

Edgewise Shared Partners

The graph indicates an acceptable fit for edgewise shared partners (MC p-values from 0.20 to 1.00), suggesting that the model reproduces local clustering behavior reasonably well.

Minimum Geodesic Distance

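The model reproduces the number of dyads at distances 1 and 2 fairly well. At distances 3 to 6, however, the simulated networks contain more dyads than the observed network (for example, a simulated mean of 64.52 versus 32 observed at distance 3), and they contain fewer unreachable dyads (a simulated mean of 99.53 versus 193 observed; shown as Inf in the table and NR in the plot). None of these differences is significant at the 0.05 level (MC p-values of 0.10 to 0.20), but the direction is consistent: the model simulates local connectivity well while overestimating how far ties reach across the classroom.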

3. Discussion

This study examined the structure of a third-grade classroom friendship network using Exponential Random Graph Modeling (ERGM). The goal was to explore whether students tend to reciprocate friendship nominations, and whether academic performance (coded as high, average, or low) influences who students choose as friends or how many friends they select.

The results partially supported the proposed hypotheses. Consistent with H1, the mutual term in the ERGM was positive and significant (estimate = 1.23, SE = 0.57, p = 0.031), indicating that students were more likely to nominate peers who also nominated them. This tendency toward reciprocity suggests that friendships in this classroom tend to be mutual rather than one-sided.

In contrast, H2 and H3 were not supported. Neither the in-degree covariate (estimate = -0.20, p = 0.332) nor the out-degree covariate (estimate = -0.18, p = 0.354) for academic performance was statistically significant. This indicates that a student’s academic performance level did not significantly predict how many friendship nominations they received or gave. These findings suggest that academic performance does not play a major role in shaping the social structure of this classroom, although a network of only 19 students gives the model limited power to detect small effects. This aligns with my assumptions and observations as a third-grade teacher. Although I was curious whether students with higher academic success would have more ties, I had not noticed this pattern in this classroom.

Model diagnostics supported the overall validity of the ERGM. MCMC convergence diagnostics were strong, and the simulated networks closely resembled the observed one in terms of edge count, reciprocity, and shared partner statistics. The goodness-of-fit tests showed that the model fit the in-degree distribution and clustering patterns well, suggesting it captured who was nominated and how friendships cluster locally. However, the model struggled to replicate the out-degree distribution, with significant discrepancies for students nominating one, two, or three peers.

This discrepancy is likely explained by the data collection constraint. Each student was asked to select exactly three friends, which fixed the out-degree at three for every student who made nominations. The model as specified places no restriction on how many peers a student can nominate, so it could not reproduce this rigidity in out-degree values, which highlights a methodological limitation. Future research should consider allowing students to nominate a variable number of peers, which would enable more flexible and realistic modeling of outgoing ties.
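
One way to account for a capped number of nominations within the model itself is a degree-bound constraint. The following is a minimal sketch and was not part of this analysis: it builds a hypothetical 19-student network in which 14 students each nominate three peers and 5 nominate none, then fits a model with the ergm package's bd() constraint so that simulated networks never exceed three outgoing ties per student. The nomination pattern and the object names (toy_net, fit_bd) are invented for illustration.

```r
library(ergm)  # also attaches the network package

# Hypothetical network: students 1-14 each nominate three peers, 15-19 nominate none
n <- 19
adj <- matrix(0, n, n)
for (i in 1:14) {
  # invented pattern: nominate the next two students and the previous one
  targets <- c(i + 1, i + 2, i - 1)
  targets <- ((targets - 1) %% n) + 1   # wrap around to stay within 1..n
  adj[i, targets] <- 1
}
toy_net <- network(adj, directed = TRUE)

# Fit with an upper bound of three outgoing ties per student
set.seed(123)
fit_bd <- ergm(toy_net ~ edges + mutual,
               constraints = ~ bd(maxout = 3))
summary(fit_bd)
```

Under this constraint the coefficients describe tie formation among networks that respect the three-nomination cap, so they are not directly comparable to the unconstrained estimates reported in the Results.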

In summary, this study highlights the utility of ERGMs in uncovering patterns of reciprocity and influence in educational social networks. While academic performance did not appear to structure the network, the significant reciprocity effect suggests that fostering mutual friendships may be a key lever for promoting positive peer relationships in elementary classrooms.