Unit 4 Analysis: Tie Prediction

Data source. This independent case study analyzed the Social Network Analysis and Education Year 3 Collaboration data set, which includes 43 school leaders within one school district. The data come from two files: 1) a node file with all school leaders’ demographic data, such as position level; 2) a file with the collaboration frequencies among all school leaders.
Guiding questions. In this independent analysis, I explore two questions about how network characteristics predict tie formation:
1. Which of the school leaders’ position levels (district level vs. school level) is more likely to influence whether school leaders forge collaborations?
2. Which of the school leaders’ position levels (district level vs. school level) is more likely to send or receive collaboration-related information?

Data analysis. This section contains five sub-sections. Each sub-section provides a description and the matching code blocks to present the details of each step of the analysis.

Load libraries: these packages are needed for this analysis.

library(statnet)

## Loading required package: tergm

## Loading required package: ergm

## Loading required package: network

## 
## 'network' 1.17.1 (2021-06-12), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 
## 'ergm' 4.1.2 (2021-07-26), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.

## Loading required package: networkDynamic

## 
## 'networkDynamic' 0.11.0 (2021-06-12), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information

## Registered S3 method overwritten by 'tergm':
##   method                   from
##   simulate_formula.network ergm

## 
## 'tergm' 4.0.2 (2021-07-28), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 
## Attaching package: 'tergm'

## The following object is masked from 'package:ergm':
## 
##     snctrl

## Loading required package: ergm.count

## 
## 'ergm.count' 4.0.2 (2021-06-18), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information

## Loading required package: sna

## Loading required package: statnet.common

## 
## Attaching package: 'statnet.common'

## The following object is masked from 'package:ergm':
## 
##     snctrl

## The following objects are masked from 'package:base':
## 
##     attr, order

## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.

## Loading required package: tsna

## 
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readxl)
library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union

## The following objects are masked from 'package:purrr':
## 
##     compose, simplify

## The following object is masked from 'package:tidyr':
## 
##     crossing

## The following object is masked from 'package:tibble':
## 
##     as_data_frame

## The following objects are masked from 'package:sna':
## 
##     betweenness, bonpow, closeness, components, degree, dyad.census,
##     evcent, hierarchy, is.connected, neighborhood, triad.census

## The following objects are masked from 'package:network':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
##     get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
##     is.directed, list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

library(tidygraph)

## 
## Attaching package: 'tidygraph'

## The following object is masked from 'package:igraph':
## 
##     groups

## The following object is masked from 'package:stats':
## 
##     filter

library(ggraph)
library(skimr)
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(btergm)

## Registered S3 methods overwritten by 'btergm':
##   method    from
##   plot.gof  ergm
##   print.gof ergm

## Package:  btergm
## Version:  1.10.3
## Date:     2021-06-24
## Authors:  Philip Leifeld (University of Essex)
##           Skyler J. Cranmer (The Ohio State University)
##           Bruce A. Desmarais (Pennsylvania State University)

## 
## Attaching package: 'btergm'

## The following object is masked from 'package:ergm':
## 
##     gof

Import dataset: there are two data files used in this analysis, the edge and the node list for Year 3. The node list contains school leaders’ demographic information while the edge list contains the collaboration frequencies among these school leaders.

#nodes
Y3_leader_nodes <- read_excel("data/School Leaders Data Chapter 9_e.xlsx", 
                           col_types = c("text", "numeric", "numeric", "numeric", "numeric")) |>
  clean_names()
#edges
Y3_leader_matrix <- read_excel("data/year_3_collaboration.xlsx", 
                            col_names = FALSE)

## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...

#convert to matrix
Y3_leader_matrix <- Y3_leader_matrix |>
  as.matrix()
#dichotomize matrix
Y3_leader_matrix[Y3_leader_matrix <= 2] <- 0
Y3_leader_matrix[Y3_leader_matrix >= 3] <- 1
#add rows and column names
rownames(Y3_leader_matrix) <- Y3_leader_nodes$id
colnames(Y3_leader_matrix) <- Y3_leader_nodes$id
#create an edge list
Y3_adjacency_matrix <- graph.adjacency(Y3_leader_matrix,
                                    diag = FALSE)
Y3_leader_edges <- get.data.frame(Y3_adjacency_matrix) |>
  mutate(from = as.character(from)) |>
  mutate(to = as.character(to))
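The dichotomization and edge-list steps above can be illustrated on a toy matrix. This is a minimal base-R sketch, assuming a hypothetical 3 × 3 frequency matrix `toy` (its values are made up, not taken from the data set) and the same cut-off, where a frequency of 3 or more counts as a tie.

```r
# Hypothetical 3 x 3 collaboration-frequency matrix (illustrative values only)
toy <- matrix(c(0, 4, 1,
                3, 0, 2,
                2, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Dichotomize: frequencies of 3 or more become a tie (1), everything else 0
toy_binary <- (toy >= 3) * 1

# Drop self-ties, mirroring diag = FALSE in the analysis code
diag(toy_binary) <- 0

# Convert the adjacency matrix to a directed edge list
# (row = sender, column = receiver)
idx <- which(toy_binary == 1, arr.ind = TRUE)
toy_edges <- data.frame(from = rownames(toy_binary)[idx[, "row"]],
                        to   = colnames(toy_binary)[idx[, "col"]])
toy_edges
```

Only the a-to-b and b-to-a ties survive the cut-off in this toy case; every other pair falls below 3 and is treated as no tie.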

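Network overview: as the code blocks below show, this network contains 43 nodes and 362 edges and consists of 5 distinct strong components. One strong component has 38 members; of the other four, one has two members and three contain a single node each. Because every leader has an in-degree of at least 2 (see the degree summary below), these single-node components are not isolates; they are leaders who are not mutually reachable with the rest of the network.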

#Year 3 graph object
Y3_leader_graph <- tbl_graph(edges = Y3_leader_edges,
                          nodes = Y3_leader_nodes,
                          directed = TRUE)

#component
components(Y3_leader_graph, mode = c("strong"))

## $membership
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 1 1 1 1 1 1 1 1 1 1 2 5 1 1 1 1 1 1 1
## [39] 4 3 1 1 1
## 
## $csize
## [1] 38  1  1  1  2
## 
## $no
## [1] 5

#label each node with its strong component
Y3_network <- Y3_leader_graph  %>% 
  activate(nodes)  %>% 
  mutate(strong_component = group_components(type = "strong"))
as_tibble(Y3_network)

## # A tibble: 43 × 6
##    id    efficacy trust district_site  male strong_component
##    <chr>    <dbl> <dbl>         <dbl> <dbl>            <int>
##  1 1         6.06  4                1     0                1
##  2 2         6.56  5.63             1     0                1
##  3 3         7.39  4.63             1     0                1
##  4 4         4.89  4                1     0                1
##  5 5         6.06  5.75             0     1                1
##  6 6         7.39  4.38             0     0                1
##  7 7         5.56  3.63             0     1                1
##  8 8         7.5   5.63             1     1                1
##  9 9         7.67  5.25             0     0                1
## 10 10        6.64  4.78             0     0                1
## # … with 33 more rows

Y3_network  %>% 
  as_tibble()  %>% 
  group_by(strong_component)  %>% 
  summarise(count = n())  %>% 
  arrange(desc(count))

## # A tibble: 5 × 2
##   strong_component count
##              <int> <int>
## 1                1    38
## 2                2     2
## 3                3     1
## 4                4     1
## 5                5     1

Network measure (degree): I looked at the in-degree and the out-degree of the Year 3 collaboration network. The mean (M) in-degree and the mean out-degree are the same, 8.419, because every directed tie adds one to both totals.

#Explore
#calculate node degree
#in- and out-degree for Year 3
Y3_leader_measures <- Y3_leader_graph |>
  activate(nodes) |>
  mutate(in_degree = centrality_degree(mode = "in")) |>
  mutate(out_degree = centrality_degree(mode = "out"))

Y3_leader_measures

## # A tbl_graph: 43 nodes and 362 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 43 × 7 (active)
##   id    efficacy trust district_site  male in_degree out_degree
##   <chr>    <dbl> <dbl>         <dbl> <dbl>     <dbl>      <dbl>
## 1 1         6.06  4                1     0        12          6
## 2 2         6.56  5.63             1     0         2          1
## 3 3         7.39  4.63             1     0        13         10
## 4 4         4.89  4                1     0         5          7
## 5 5         6.06  5.75             0     1         9          1
## 6 6         7.39  4.38             0     0         6          1
## # … with 37 more rows
## #
## # Edge Data: 362 × 2
##    from    to
##   <int> <int>
## 1     1     3
## 2     1     4
## 3     1     8
## # … with 359 more rows

Y3_node_measures <- Y3_leader_measures |> 
  activate(nodes) |>
  as_tibble()

Y3_node_measures

## # A tibble: 43 × 7
##    id    efficacy trust district_site  male in_degree out_degree
##    <chr>    <dbl> <dbl>         <dbl> <dbl>     <dbl>      <dbl>
##  1 1         6.06  4                1     0        12          6
##  2 2         6.56  5.63             1     0         2          1
##  3 3         7.39  4.63             1     0        13         10
##  4 4         4.89  4                1     0         5          7
##  5 5         6.06  5.75             0     1         9          1
##  6 6         7.39  4.38             0     0         6          1
##  7 7         5.56  3.63             0     1         6          2
##  8 8         7.5   5.63             1     1        20         24
##  9 9         7.67  5.25             0     0        10          5
## 10 10        6.64  4.78             0     0         9         17
## # … with 33 more rows

summary(Y3_node_measures)

##       id               efficacy         trust       district_site   
##  Length:43          Min.   :4.610   Min.   :3.630   Min.   :0.0000  
##  Class :character   1st Qu.:5.670   1st Qu.:4.130   1st Qu.:0.0000  
##  Mode  :character   Median :6.780   Median :4.780   Median :0.0000  
##                     Mean   :6.649   Mean   :4.783   Mean   :0.4186  
##                     3rd Qu.:7.470   3rd Qu.:5.440   3rd Qu.:1.0000  
##                     Max.   :8.500   Max.   :5.880   Max.   :1.0000  
##       male          in_degree        out_degree    
##  Min.   :0.0000   Min.   : 2.000   Min.   : 0.000  
##  1st Qu.:0.0000   1st Qu.: 6.000   1st Qu.: 1.500  
##  Median :0.0000   Median : 8.000   Median : 6.000  
##  Mean   :0.4419   Mean   : 8.419   Mean   : 8.419  
##  3rd Qu.:1.0000   3rd Qu.: 9.500   3rd Qu.:11.000  
##  Max.   :1.0000   Max.   :20.000   Max.   :42.000
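The matching means are not a coincidence: each directed tie is counted once as an outgoing tie and once as an incoming tie, so both means equal the number of ties divided by the number of nodes. The short base-R sketch below checks this on a hypothetical 4-node adjacency matrix (`adj` is made up for illustration) and then repeats the arithmetic with the Year 3 counts reported above.

```r
# Hypothetical directed adjacency matrix: rows send ties, columns receive them
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 0,
                1, 0, 0, 0,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE)

out_degree <- rowSums(adj)  # ties sent by each node
in_degree  <- colSums(adj)  # ties received by each node

# Both means equal (number of ties) / (number of nodes)
mean(out_degree)
mean(in_degree)
sum(adj) / nrow(adj)

# The same arithmetic for the Year 3 network: 362 ties among 43 leaders
year3_mean_degree <- 362 / 43
round(year3_mean_degree, 3)
```

The distributions still differ (for example, the maximum out-degree is 42 while the maximum in-degree is 20), which is why the medians and quartiles in the summary are not the same.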

Tie Prediction: Based on the descriptive statistics (summarized in the tables below), the district-level leaders have a higher mean and standard deviation for both the in-degree and the out-degree than the school-level leaders. A series of Exponential Random Graph Models (ERGMs) was used to further explore how leaders at the school and district levels formed collaborations.

Y3_node_measures |>
  group_by(district_site) |>
  summarise(n = n(),
            mean = mean(in_degree), 
            sd = sd(in_degree)
  ) #school: M = 7.52, SD = 1.66; district: M = 9.67, SD = 4.23

## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  7.52  1.66
## 2             1    18  9.67  4.23

Y3_node_measures |>
  group_by(district_site) |>
  summarise(n = n(),
            mean = mean(out_degree), 
            sd = sd(out_degree)
  ) #school: M = 5.40, SD = 5.99; district: M = 12.61, SD = 11.71

## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25   5.4  5.99
## 2             1    18  12.6 11.7

Y3_leader_network <- as.network(Y3_leader_edges,
                             vertices = Y3_leader_nodes)

summary(Y3_leader_network ~ edges + mutual)

##  edges mutual 
##    362     91
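The two counts above are enough to derive the network density and the share of ties that are reciprocated. The sketch below is plain arithmetic on the reported numbers; the variable names are my own and nothing here is re-estimated from the data.

```r
n_nodes  <- 43   # school leaders in the Year 3 network
n_edges  <- 362  # directed ties (the edges statistic)
n_mutual <- 91   # reciprocated dyads (the mutual statistic)

# Number of possible directed ties, excluding self-ties
possible_ties <- n_nodes * (n_nodes - 1)

# Density: share of possible ties that are present
density <- n_edges / possible_ties

# Reciprocity: each mutual dyad accounts for two directed ties
reciprocity <- 2 * n_mutual / n_edges

round(c(possible_ties = possible_ties,
        density = density,
        reciprocity = reciprocity), 3)
```

The 1806 possible ties match the degrees of freedom reported for the null deviance in the model summaries. Roughly one in five possible ties is present, and about half of the observed ties are reciprocated.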

#model
set.seed(589)
Y3_ergm_4 <- as.formula(Y3_leader_network ~ # The network
                     mutual('district_site', diff = TRUE) + # How many ties are reciprocated?
                     edges)
fit1 <- ergm(Y3_ergm_4)

## Starting maximum pseudolikelihood estimation (MPLE):

## Evaluating the predictor and response matrix.

## Maximizing the pseudolikelihood.

## Finished MPLE.

## Starting Monte Carlo maximum likelihood estimation (MCMLE):

## Iteration 1 of at most 60:

## Optimizing with step length 0.9045.

## The log-likelihood improved by 2.3757.

## Estimating equations are not within tolerance region.

## Iteration 2 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.4923.

## Estimating equations are not within tolerance region.

## Iteration 3 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.0064.

## Convergence test p-value: 0.0406. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0132.
## Convergence test p-value: 0.1368. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0201.
## Convergence test p-value: 0.0862. Not converged with 99% confidence; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0093.
## Convergence test p-value: 0.0194. Not converged with 99% confidence; increasing sample size.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0219.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

summary(fit1)

## Call:
## ergm(formula = Y3_ergm_4)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                             Estimate Std. Error MCMC % z value Pr(>|z|)    
## mutual.same.district_site.0  1.44022    0.25774      0   5.588   <1e-04 ***
## mutual.same.district_site.1  3.17053    0.21931      0  14.457   <1e-04 ***
## edges                       -1.82209    0.07417      0 -24.565   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2504  on 1806  degrees of freedom
##  Residual Deviance: 1633  on 1803  degrees of freedom
##  
## AIC: 1639  BIC: 1655  (Smaller is better. MC Std. Err. = 0.9127)

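#both mutual terms are positive and significant; the larger district-level estimate (3.17 vs. 1.44) suggests that leaders at the district level have a higher propensity to reciprocate ties with one another.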
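One way to read the fit1 estimates is to convert them from log-odds to conditional probabilities. The sketch below is only illustrative: it copies the three coefficients from the summary(fit1) output above and applies the inverse logit (`plogis` in base R). The variable names are my own, and the reading assumes the usual ERGM interpretation, in which a coefficient is the change in the conditional log-odds of a tie for a one-unit change in its statistic.

```r
# Coefficients copied from summary(fit1)
b_edges    <- -1.82209
b_mutual_0 <-  1.44022  # both leaders at the school level (district_site = 0)
b_mutual_1 <-  3.17053  # both leaders at the district level (district_site = 1)

# Conditional probability of a tie that does not complete a same-level mutual dyad
p_baseline <- plogis(b_edges)

# Conditional probability of a tie that reciprocates an existing tie
# between two school-level leaders
p_recip_school <- plogis(b_edges + b_mutual_0)

# Same, between two district-level leaders
p_recip_district <- plogis(b_edges + b_mutual_1)

round(c(baseline = p_baseline,
        reciprocate_school = p_recip_school,
        reciprocate_district = p_recip_district), 3)
```

Under these assumptions, a tie that returns an existing tie is far more likely than a baseline tie at both levels, and the increase is larger for pairs of district-level leaders.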

Y3_ergm_5 <- as.formula(Y3_leader_network ~ 
                   nodefactor('district_site') + # Difference in connections conditional on site
                   mutual + 
                   edges) # edges term

fit7 <- ergm(Y3_ergm_5)

## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0913.
## Convergence test p-value: 0.0947. Not converged with 99% confidence; increasing sample size.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0366.
## Convergence test p-value: 0.1916. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0321.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

summary(fit7)

## Call:
## ergm(formula = Y3_ergm_5)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                            Estimate Std. Error MCMC % z value Pr(>|z|)    
## nodefactor.district_site.1  0.53065    0.07747      0   6.849   <1e-04 ***
## mutual                      1.82546    0.19398      0   9.410   <1e-04 ***
## edges                      -2.40932    0.11205      0 -21.501   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2504  on 1806  degrees of freedom
##  Residual Deviance: 1648  on 1803  degrees of freedom
##  
## AIC: 1654  BIC: 1670  (Smaller is better. MC Std. Err. = 0.5585)

Y3_ergm_6 <- as.formula(Y3_leader_network ~ 
                    nodeofactor('district_site') + # Difference in outgoing connections conditional at the school/district level. 
                    nodeifactor('district_site') + # Difference in incoming connections conditional at the school/district level. 
                    mutual + 
                    edges)

fit7b <- ergm(Y3_ergm_6)

## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2094.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0202.
## Convergence test p-value: 0.0253. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0611.
## Convergence test p-value: 0.3803. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0135.
## Convergence test p-value: 0.0879. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0216.
## Convergence test p-value: 0.0202. Not converged with 99% confidence; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0149.
## Convergence test p-value: 0.0005. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

summary(fit7b)

## Call:
## ergm(formula = Y3_ergm_6)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                             Estimate Std. Error MCMC % z value Pr(>|z|)    
## nodeofactor.district_site.1  1.11098    0.12417      0   8.947   <1e-04 ***
## nodeifactor.district_site.1 -0.06348    0.14303      0  -0.444    0.657    
## mutual                       2.00245    0.19569      0  10.233   <1e-04 ***
## edges                       -2.48778    0.10781      0 -23.075   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2504  on 1806  degrees of freedom
##  Residual Deviance: 1618  on 1802  degrees of freedom
##  
## AIC: 1626  BIC: 1648  (Smaller is better. MC Std. Err. = 0.8357)

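#the results suggest that district_site mainly affects outgoing collaboration: district-level leaders send more ties (nodeofactor = 1.11, p < .0001), while the effect on incoming ties is not significant (nodeifactor = -0.06, p = 0.657).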
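The sender and receiver effects in fit7b may be easier to compare as odds ratios. The following sketch copies the two estimates and standard errors from the summary(fit7b) output above and computes approximate 95% Wald intervals. This is a rough reading aid under a normal approximation, not part of the original analysis, and the object names are hypothetical.

```r
# Estimates and standard errors copied from summary(fit7b)
est <- c(nodeofactor = 1.11098, nodeifactor = -0.06348)
se  <- c(nodeofactor = 0.12417, nodeifactor = 0.14303)

# Critical value for an approximate 95% Wald interval
z <- qnorm(0.975)

# Exponentiate the log-odds estimates and interval bounds to get odds ratios
or_table <- data.frame(odds_ratio = exp(est),
                       lower      = exp(est - z * se),
                       upper      = exp(est + z * se),
                       row.names  = names(est))
round(or_table, 2)
```

The interval for the sender effect lies entirely above 1, while the interval for the receiver effect spans 1, which is consistent with the significance tests in the summary.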

gof.model17 <- btergm::gof(fit7b, nsim = 50)

## 
## Starting GOF assessment on a single computing core....
## 
## No 'target' network(s) provided. Using networks on the left-hand side of the model formula as observed networks.
## 
## Simulating 50 networks from the following formula:
##  Y3_leader_network ~ nodeofactor("district_site") + nodeifactor("district_site") + mutual + edges 
## 
## One network from which simulations are drawn was provided.
## 
## Processing statistic: Dyad-wise shared partners
## Processing statistic: Edge-wise shared partners
## Processing statistic: Degree
## Processing statistic: Indegree
## Processing statistic: Geodesic distances
## Processing statistic: Tie prediction
## Processing statistic: Modularity (walktrap)

gof.model17$Degree$stats #goodness of fit

##    obs sim: mean median min max     Pr(>z)
## 0    0      0.00    0.0   0   0 1.00000000
## 1    0      0.00    0.0   0   0 1.00000000
## 2    1      0.02    0.0   0   1 0.49978021
## 3    0      0.06    0.0   0   1 0.96704361
## 4    1      0.24    0.0   0   1 0.60073634
## 5    1      0.78    1.0   0   3 0.87958627
## 6    5      1.66    1.5   0   6 0.02145106
## 7    3      2.62    2.5   0   6 0.79357514
## 8    5      3.24    3.0   0   7 0.22553164
## 9    4      3.36    3.0   0   8 0.65942370
## 10   2      4.02    4.0   0   8 0.16422916
## 11   5      3.78    4.0   0   7 0.40085149
## 12   3      3.76    4.0   0   9 0.60073634
## 13   0      3.66    3.5   0   8 0.01172534
## 14   2      2.64    3.0   0   7 0.65942370
## 15   1      2.94    3.0   0   6 0.18158226
## 16   1      2.78    3.0   0   7 0.22030337
## 17   1      2.22    2.0   0   7 0.40085149
## 18   1      1.76    2.0   0   4 0.60073634
## 19   1      1.30    1.0   0   4 0.83633606
## 20   0      1.00    1.0   0   4 0.49107013
## 21   1      0.52    0.0   0   2 0.74099822
## 22   0      0.32    0.0   0   2 0.82559509
## 23   0      0.12    0.0   0   2 0.93414341
## 24   0      0.16    0.0   0   1 0.91226881
## 25   1      0.04    0.0   0   1 0.50857162
## 26   0      0.00    0.0   0   0 1.00000000
## 27   0      0.00    0.0   0   0 1.00000000
## 28   0      0.00    0.0   0   0 1.00000000
## 29   1      0.00    0.0   0   0 0.49107013
## 30   1      0.00    0.0   0   0 0.49107013
## 31   0      0.00    0.0   0   0 1.00000000
## 32   0      0.00    0.0   0   0 1.00000000
## 33   1      0.00    0.0   0   0 0.49107013
## 34   0      0.00    0.0   0   0 1.00000000

Data visualization. The first graph below provides an overview of the Year 3 collaboration network. The second graph shows the total degree (in-degree plus out-degree) of each leader, with nodes labeled by level (0 = school, 1 = district); leaders at both levels include high-degree nodes. However, as the final graph shows, leaders at the district level have darker (higher) out-degree values, which means that they were more likely to initiate collaborations.

plot(Y3_leader_network, 
     vertex.col = "tomato", 
     vertex.cex = 1) #overall graph

set.seed(100)
Y3_leader_measures %>%
  activate(nodes) %>% mutate (degree = in_degree + out_degree) %>% 
  ggraph(layout = "graphopt") + 
  geom_edge_link(width = 1, colour = "lightgray") +
  geom_node_point(aes(colour = degree)) +
  geom_node_text(aes(label = as.factor(district_site)), repel = TRUE)+
  scale_color_gradient(low = "yellow", high = "red") #all degree

set.seed(100)
Y3_leader_measures %>%
  activate(nodes) %>%
  ggraph(layout = "graphopt") + 
  geom_edge_link(width = 1, colour = "lightgray") +
  geom_node_point(aes(colour = out_degree)) +
  geom_node_text(aes(label = as.factor(district_site)), repel = TRUE)+
  scale_color_gradient(low = "yellow", high = "red") #outdegree

Conclusion. In summary, the Year 3 school leaders are well connected through collaboration, with a strong component containing 38 of the 43 members. As the tie prediction models show, leaders at both the school level and the district level have a significant tendency to form mutual collaborations, although leaders at the district level have a higher propensity to reciprocate. Furthermore, leaders at the district level are more likely to send collaboration ties (shown by the positive sender effect in the model and by the darker out-degree values in the final graph), which means that they were more likely to initiate collaborations.
References.

Daly, Alan J, and Kara S Finnigan. 2011. “The Ebb and Flow of Social Network Ties Between District Leaders Under High-Stakes Accountability.” American Educational Research Journal 48 (1): 39–79.

ECI 589 Social Network Analysis and Education

Hui Yang

April 1, 2022