1. Prepare

For the Final Project, I am revisiting the work of Alan Daly and colleagues that is described in Chapter 9: Network Data and Statistical Models from Social Network Analysis and Education (Carolan 2014). In this case study, I expand on our previous inquiry into how school leaders select peers for collaboration or confidential exchanges.

More specifically, this case study will cover the following topics pertaining to each data-intensive workflow process:

  1. Prepare: Prior to analysis, we’ll revisit where our data is coming from and share the research question(s) guiding this analysis.

  2. Wrangle: In Section 2, we will transform our data so that we have distinct datasets based on gender, and replicate the research process we completed in the Unit 4 Case Study using our new dataset for a comparative analysis.

  3. Analyze: We wrap up our analysis in Section 3 by fitting the exponential random graph models (ERGMs) used in Chapter 9 of Carolan (2014) to check how this model compares to the results of our previous ERGM.

  4. Communicate: Finally, we’ll select, polish, and narrate a key finding from our analysis that helps to answer our research question(s).

1a. Purpose

In Unit 4, we examined how strongly gender and other individual attributes predict confidential exchanges between school leaders. I continued this inquiry in my independent analysis to isolate male-to-male interactions. However, I have yet to compare and contrast these results with those for female school leaders. Specifically, this analysis will try to answer the following research question:

Are female school leaders more likely to confide in colleagues of their own gender than their male counterparts are?

1b. Data Source

This analysis utilizes the School Leaders data mentioned above. In Chapter 9, Carolan (2014) describes this data by saying:

“These ideas and tools will be demonstrated using a portion of the School Leaders data set, particularly observations in years 1 and 3 of the “collaboration” and “confidential help” networks. Collaboration is a directed valued network measured on five-point scale ranging from 0 to 4, with higher values indicating more frequent collaborations (1–2 times/week). Similarly, the confidential help network is a directed, valued network measured on the same scale. Complete network data are available on all 43 school leaders drawn from two large school districts. In addition to these relational data, the survey instrument in year 1 captured attribute data, including composite scores from a set of items related to efficacy (19 items) and trust (8 items), as well as information about leaders’ gender and whether they worked at the district or school level.”

1c. Target Audience

While this project stems from my own curiosity, it could provide valuable insight into how disclosure differs among same-sex colleagues. I envision this analysis benefiting those within the network themselves (or those in comparable environments) as well as those in higher leadership positions. Although this analysis takes place in the context of education, it could serve as a model for analyzing confidential exchanges between same-sex colleagues in other industries as well.

1d. Load Libraries

To begin, load the following packages that we will be using for our analysis of network influence:

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
## ✔ tibble  3.1.6     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readxl)
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:stats':
## 
##     filter
library(ggraph)
library(skimr)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(statnet)
## Loading required package: tergm
## Loading required package: ergm
## Loading required package: network
## 
## 'network' 1.17.1 (2021-06-12), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 
## Attaching package: 'network'
## The following objects are masked from 'package:igraph':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
##     get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
##     is.directed, list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute
## 
## 'ergm' 4.1.2 (2021-07-26), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.
## Loading required package: networkDynamic
## 
## 'networkDynamic' 0.11.1 (2022-04-04), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Registered S3 method overwritten by 'tergm':
##   method                   from
##   simulate_formula.network ergm
## 
## 'tergm' 4.0.2 (2021-07-28), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 
## Attaching package: 'tergm'
## The following object is masked from 'package:ergm':
## 
##     snctrl
## Loading required package: ergm.count
## 
## 'ergm.count' 4.0.2 (2021-06-18), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Loading required package: sna
## Loading required package: statnet.common
## 
## Attaching package: 'statnet.common'
## The following object is masked from 'package:ergm':
## 
##     snctrl
## The following objects are masked from 'package:base':
## 
##     attr, order
## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## 
## Attaching package: 'sna'
## The following objects are masked from 'package:igraph':
## 
##     betweenness, bonpow, closeness, components, degree, dyad.census,
##     evcent, hierarchy, is.connected, neighborhood, triad.census
## Loading required package: tsna
## 
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information

2. Wrangle

For our data wrangling, we’ll focus on working with network data stored as an adjacency matrix. Our primary goals for this section are as follows:

  1. Data preprocessing. In this section, we will replicate the import steps from the Unit 4 Case Study.

  2. Data transformations. Next, we’ll recode our node attributes to flag female school leaders, move from a valued matrix to a binary matrix, and then convert the matrix to an edge list and store both our edges and node attributes as a network graph object in preparation for analysis.

  3. New variables or features. Finally, we will calculate degree measures for each node and summarize them.

2a. Data Preprocessing

We already have access to the School Leaders Data Chapter 9_d and School Leaders Data Chapter 9_e from Unit 4.

Import Data

To get started, we’ll use the read_excel() function to import our data. For coherence and cohesion, we will adopt the same naming conventions used in Unit 4 and will make note of any instances when this does not occur (e.g., new variables and data transformations that did not occur in the Unit 4 Case Study):

leader_matrix <- read_excel("data/School Leaders Data Chapter 9_d.xlsx", 
                           col_names = FALSE)
## New names:
## • `` -> `...1`
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`
## • `` -> `...37`
## • `` -> `...38`
## • `` -> `...39`
## • `` -> `...40`
## • `` -> `...41`
## • `` -> `...42`
## • `` -> `...43`
leader_matrix
## # A tibble: 43 × 43
##     ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12 ...13
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     0     0     0     0     0     0     0     0     0     0     0     0
##  2     0     0     0     0     0     0     0     0     0     0     0     0     0
##  3     3     1     0     1     1     1     1     2     1     1     1     1     1
##  4     1     0     0     0     0     0     0     0     0     0     0     0     0
##  5     0     0     0     0     0     0     1     0     3     0     0     0     0
##  6     0     0     0     0     0     0     0     0     0     0     0     0     0
##  7     2     1     1     1     1     1     4     1     1     3     1     1     1
##  8     1     1     4     1     1     1     1     0     2     3     1     1     1
##  9     1     0     0     0     4     0     0     1     0     0     0     0     0
## 10     1     1     3     1     1     1     4     4     1     0     3     1     1
## # … with 33 more rows, and 30 more variables: ...14 <dbl>, ...15 <dbl>,
## #   ...16 <dbl>, ...17 <dbl>, ...18 <dbl>, ...19 <dbl>, ...20 <dbl>,
## #   ...21 <dbl>, ...22 <dbl>, ...23 <dbl>, ...24 <dbl>, ...25 <dbl>,
## #   ...26 <dbl>, ...27 <dbl>, ...28 <dbl>, ...29 <dbl>, ...30 <dbl>,
## #   ...31 <dbl>, ...32 <dbl>, ...33 <dbl>, ...34 <dbl>, ...35 <dbl>,
## #   ...36 <dbl>, ...37 <dbl>, ...38 <dbl>, ...39 <dbl>, ...40 <dbl>,
## #   ...41 <dbl>, ...42 <dbl>, ...43 <dbl>
leader_nodes <- read_excel("data/School Leaders Data Chapter 9_e.xlsx", 
                           col_types = c("text", "numeric", "numeric", "numeric", "numeric")) |>
  clean_names()

leader_nodes
## # A tibble: 43 × 5
##    id    efficacy trust district_site  male
##    <chr>    <dbl> <dbl>         <dbl> <dbl>
##  1 1         6.06  4                1     0
##  2 2         6.56  5.63             1     0
##  3 3         7.39  4.63             1     0
##  4 4         4.89  4                1     0
##  5 5         6.06  5.75             0     1
##  6 6         7.39  4.38             0     0
##  7 7         5.56  3.63             0     1
##  8 8         7.5   5.63             1     1
##  9 9         7.67  5.25             0     0
## 10 10        6.64  4.78             0     0
## # … with 33 more rows

2b. Data Transformations

*This step was not included in the Unit 4 Case Study.

Using the mutate(), case_when(), and select() functions, we’re creating a female indicator column and dropping the male column:

# Recode the male indicator into a female indicator (1 = female, 0 = male),
# then keep the female column in place of the male column
female_leader_nodes <- leader_nodes |>
  mutate(female = case_when(male >= 1 ~ 0, TRUE ~ 1)) |>
  select(id, efficacy, trust, district_site, female)

female_leader_nodes
## # A tibble: 43 × 5
##    id    efficacy trust district_site female
##    <chr>    <dbl> <dbl>         <dbl>  <dbl>
##  1 1         6.06  4                1      1
##  2 2         6.56  5.63             1      1
##  3 3         7.39  4.63             1      1
##  4 4         4.89  4                1      1
##  5 5         6.06  5.75             0      0
##  6 6         7.39  4.38             0      1
##  7 7         5.56  3.63             0      0
##  8 8         7.5   5.63             1      0
##  9 9         7.67  5.25             0      1
## 10 10        6.64  4.78             0      1
## # … with 33 more rows
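
As a quick sanity check on this kind of recode, here is a minimal sketch using a made-up indicator vector (not the School Leaders data). It assumes the indicator only takes the values 0 and 1; the vector names are hypothetical.

```r
# Hypothetical 0/1 indicator; these values are illustrative only
male <- c(0, 0, 1, 0, 1)

# Flip the indicator: 1 becomes 0 and 0 becomes 1
female <- 1 - male

# Every case should fall in exactly one of the two categories
stopifnot(all(male + female == 1))

# Cross-tabulate the two indicators; no case should be coded 1 on both
table(male = male, female = female)
```

The same cross-tabulation could be run on the real node attributes before the male column is dropped.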

Dichotomize Matrix

In this next step, we will convert a valued matrix into a binary matrix:

leader_matrix <- leader_matrix |>
  as.matrix()

class(leader_matrix)
## [1] "matrix" "array"

To dichotomize our matrix, the following code will “assign” 0’s to all values in our matrix that are less than or equal to 2, and 1’s to all values that are greater than or equal to 3:

leader_matrix[leader_matrix <= 2] <- 0

leader_matrix[leader_matrix >= 3] <- 1
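
Note that the order of the two assignments above matters: if the 1’s were assigned first, the second step would reset them to 0. As a hedged alternative, the sketch below does the same dichotomization in a single comparison on a small made-up matrix (the values and object names are illustrative, not the School Leaders data).

```r
# Hypothetical 3 x 3 valued matrix on the 0-4 scale (illustrative values)
valued <- matrix(c(0, 3, 1,
                   4, 0, 2,
                   2, 3, 0),
                 nrow = 3, byrow = TRUE)

# The comparison is TRUE where a tie is rated 3 or higher;
# multiplying by 1 turns TRUE/FALSE into 1/0
binary <- (valued >= 3) * 1

binary
```

Because the cutoff appears only once, changing the threshold later means editing a single number.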

Add Row and Column Names

Before we can convert to an edgelist, we will also need to add the names of our nodes to the columns and rows of our matrix:

rownames(leader_matrix) <- female_leader_nodes$id

colnames(leader_matrix) <- female_leader_nodes$id

leader_matrix
##    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## 1  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 2  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 3  1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
## 4  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1
## 5  0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 6  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 7  0 0 0 0 0 0 1 0 0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
## 8  0 0 1 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  1
## 9  0 0 0 0 1 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 10 0 0 1 0 0 0 1 1 0  0  1  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  0  0
## 11 0 0 0 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0
## 12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 13 0 0 0 0 0 0 0 1 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 14 0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 15 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
## 16 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
## 17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  0
## 18 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 19 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 20 0 0 0 0 0 0 1 0 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 21 1 0 0 0 0 0 0 0 0  0  1  1  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0
## 22 0 0 0 0 0 0 0 1 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1
## 23 0 0 0 0 1 0 0 0 0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 24 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
## 25 0 0 1 0 0 0 0 1 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 26 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
## 27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
## 28 1 1 1 1 0 1 0 1 1  1  0  0  1  0  0  0  0  1  0  0  1  1  0  0  1  0  1  1
## 29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 32 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
## 33 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
## 34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
## 35 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
## 36 1 0 1 1 0 0 0 1 0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  0  1  1  1
## 37 0 0 1 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 38 1 0 1 0 0 0 0 1 0  0  1  0  0  0  0  0  0  1  0  0  0  1  0  0  0  1  1  1
## 39 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 40 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 41 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 42 0 0 0 0 0 0 0 0 1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  0  1  0
## 43 0 0 0 0 1 0 0 1 0  0  0  0  0  0  1  0  0  1  0  0  1  1  1  0  0  0  0  0
##    29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
## 1   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 2   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 3   0  0  0  0  0  0  0  1  0  1  0  0  0  0  0
## 4   0  0  0  0  0  1  0  1  0  0  0  0  0  0  0
## 5   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 6   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 7   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 8   0  0  0  0  0  0  0  0  1  1  0  0  0  0  0
## 9   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 10  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
## 11  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
## 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 13  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0
## 14  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
## 15  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1
## 16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 17  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 18  1  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 19  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
## 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 21  0  0  0  0  1  0  1  0  1  1  0  0  0  0  1
## 22  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 23  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 24  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
## 25  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 26  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
## 27  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0
## 28  1  0  0  1  0  1  0  1  1  1  0  0  0  0  0
## 29  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 31  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 32  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 33  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 34  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 36  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 37  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
## 38  0  0  0  0  1  1  0  1  1  0  1  0  0  0  1
## 39  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 40  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 41  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
## 42  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
## 43  0  0  0  1  1  1  0  0  1  0  0  0  0  1  0

Create Network with Attributes

Get Edges

First, we need to convert our matrix to an igraph network object:

adjacency_matrix <- graph.adjacency(leader_matrix,
                                    diag = FALSE)

class(adjacency_matrix)
## [1] "igraph"
adjacency_matrix
## IGRAPH 354ac3d DN-- 43 143 -- 
## + attr: name (v/c)
## + edges from 354ac3d (vertex names):
##  [1] 3 ->1  3 ->27 3 ->28 3 ->36 3 ->38 4 ->16 4 ->28 4 ->34 4 ->36 5 ->9 
## [11] 7 ->10 7 ->20 8 ->3  8 ->10 8 ->25 8 ->27 8 ->28 8 ->37 8 ->38 9 ->5 
## [21] 9 ->14 10->3  10->7  10->8  10->11 10->18 10->20 10->38 11->10 11->21
## [31] 11->23 11->35 13->8  13->35 13->42 14->37 15->21 15->39 15->42 15->43
## [41] 16->22 17->26 18->29 18->34 19->31 20->7  20->10 21->1  21->11 21->12
## [51] 21->15 21->33 21->35 21->37 21->38 21->43 22->8  22->16 22->28 23->5 
## [61] 23->11 24->27 24->42 25->3  25->8  26->17 26->41 27->24 27->32 27->42
## [71] 28->1  28->2  28->3  28->4  28->6  28->8  28->9  28->10 28->13 28->18
## + ... omitted several edges

Now we can use the get.data.frame() function to convert our igraph object to a standard edge-list:

leader_edges <- get.data.frame(adjacency_matrix) |>
  mutate(from = as.character(from)) |>
  mutate(to = as.character(to))


leader_edges
##     from to
## 1      3  1
## 2      3 27
## 3      3 28
## 4      3 36
## 5      3 38
## 6      4 16
## 7      4 28
## 8      4 34
## 9      4 36
## 10     5  9
## 11     7 10
## 12     7 20
## 13     8  3
## 14     8 10
## 15     8 25
## 16     8 27
## 17     8 28
## 18     8 37
## 19     8 38
## 20     9  5
## 21     9 14
## 22    10  3
## 23    10  7
## 24    10  8
## 25    10 11
## 26    10 18
## 27    10 20
## 28    10 38
## 29    11 10
## 30    11 21
## 31    11 23
## 32    11 35
## 33    13  8
## 34    13 35
## 35    13 42
## 36    14 37
## 37    15 21
## 38    15 39
## 39    15 42
## 40    15 43
## 41    16 22
## 42    17 26
## 43    18 29
## 44    18 34
## 45    19 31
## 46    20  7
## 47    20 10
## 48    21  1
## 49    21 11
## 50    21 12
## 51    21 15
## 52    21 33
## 53    21 35
## 54    21 37
## 55    21 38
## 56    21 43
## 57    22  8
## 58    22 16
## 59    22 28
## 60    23  5
## 61    23 11
## 62    24 27
## 63    24 42
## 64    25  3
## 65    25  8
## 66    26 17
## 67    26 41
## 68    27 24
## 69    27 32
## 70    27 42
## 71    28  1
## 72    28  2
## 73    28  3
## 74    28  4
## 75    28  6
## 76    28  8
## 77    28  9
## 78    28 10
## 79    28 13
## 80    28 18
## 81    28 21
## 82    28 22
## 83    28 25
## 84    28 27
## 85    28 29
## 86    28 32
## 87    28 34
## 88    28 36
## 89    28 37
## 90    28 38
## 91    31 19
## 92    32 24
## 93    33 21
## 94    34 18
## 95    34 22
## 96    34 28
## 97    34 29
## 98    35 21
## 99    36  1
## 100   36  3
## 101   36  4
## 102   36  8
## 103   36 16
## 104   36 18
## 105   36 26
## 106   36 27
## 107   36 28
## 108   36 34
## 109   37  3
## 110   37 14
## 111   37 38
## 112   38  1
## 113   38  3
## 114   38  8
## 115   38 11
## 116   38 18
## 117   38 22
## 118   38 26
## 119   38 27
## 120   38 28
## 121   38 33
## 122   38 34
## 123   38 36
## 124   38 37
## 125   38 39
## 126   38 43
## 127   42  9
## 128   42 15
## 129   42 24
## 130   42 27
## 131   42 43
## 132   43  5
## 133   43  8
## 134   43 15
## 135   43 18
## 136   43 21
## 137   43 22
## 138   43 23
## 139   43 32
## 140   43 33
## 141   43 34
## 142   43 37
## 143   43 42
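
To make the matrix-to-edge-list step more concrete, here is a minimal base R sketch on a made-up three-node matrix. It is not the igraph method used above, only an illustration of the same idea; the node names and values are hypothetical.

```r
# Hypothetical binary adjacency matrix (rows send ties, columns receive them)
adj <- matrix(c(0, 1, 0,
                1, 0, 0,
                0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Drop self-ties, mirroring diag = FALSE
diag(adj) <- 0

# Locate the cells equal to 1 and look up their row and column names
idx <- which(adj == 1, arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]],
                    stringsAsFactors = FALSE)

# Sort by sender so the order matches a row-by-row reading of the matrix
edges <- edges[order(edges$from, edges$to), ]
edges
```

Each row of the result is one directed tie, which is the same shape as the from/to edge-list printed above.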

Now, we will convert our data frames into a network graph object:

leader_graph <- tbl_graph(edges = leader_edges,
                          nodes = female_leader_nodes,
                          directed = TRUE)


leader_graph
## # A tbl_graph: 43 nodes and 143 edges
## #
## # A directed simple graph with 4 components
## #
## # Node Data: 43 × 5 (active)
##   id    efficacy trust district_site female
##   <chr>    <dbl> <dbl>         <dbl>  <dbl>
## 1 1         6.06  4                1      1
## 2 2         6.56  5.63             1      1
## 3 3         7.39  4.63             1      1
## 4 4         4.89  4                1      1
## 5 5         6.06  5.75             0      0
## 6 6         7.39  4.38             0      1
## # … with 37 more rows
## #
## # Edge Data: 143 × 2
##    from    to
##   <int> <int>
## 1     3     1
## 2     3    27
## 3     3    28
## # … with 140 more rows

2c. New Variables or Features

Calculate Node Degree

Using the mutate() function, we’ll calculate in-degree, out-degree, and total degree for each node:

leader_measures <- leader_graph |>
  activate(nodes) |>
  mutate(in_degree = centrality_degree(mode = "in")) |>
  mutate(out_degree = centrality_degree(mode = "out")) |>
  mutate(degree = centrality_degree(mode = "total"))

leader_measures
## # A tbl_graph: 43 nodes and 143 edges
## #
## # A directed simple graph with 4 components
## #
## # Node Data: 43 × 8 (active)
##   id    efficacy trust district_site female in_degree out_degree degree
##   <chr>    <dbl> <dbl>         <dbl>  <dbl>     <dbl>      <dbl>  <dbl>
## 1 1         6.06  4                1      1         5          0      5
## 2 2         6.56  5.63             1      1         1          0      1
## 3 3         7.39  4.63             1      1         7          5     12
## 4 4         4.89  4                1      1         2          4      6
## 5 5         6.06  5.75             0      0         3          1      4
## 6 6         7.39  4.38             0      1         1          0      1
## # … with 37 more rows
## #
## # Edge Data: 143 × 2
##    from    to
##   <int> <int>
## 1     3     1
## 2     3    27
## 3     3    28
## # … with 140 more rows
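
For intuition about what these three measures count, the following sketch computes them directly from a small made-up adjacency matrix in base R. It assumes rows are senders and columns are receivers; the node names and values are hypothetical.

```r
# Hypothetical binary adjacency matrix (rows send ties, columns receive them)
adj <- matrix(c(0, 1, 0,
                1, 0, 0,
                0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Out-degree: number of ties each node sends (row totals)
out_degree <- rowSums(adj)

# In-degree: number of ties each node receives (column totals)
in_degree <- colSums(adj)

# Total degree: ties sent plus ties received
degree <- in_degree + out_degree

data.frame(id = rownames(adj), in_degree, out_degree, degree)
```

This matches the pattern in the output above, where each node’s degree equals its in_degree plus its out_degree (for example, 7 + 5 = 12 for node 3).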

Summarize Node Measures

node_measures <- leader_measures |> 
  activate(nodes) |>
  as_tibble()

node_measures
## # A tibble: 43 × 8
##    id    efficacy trust district_site female in_degree out_degree degree
##    <chr>    <dbl> <dbl>         <dbl>  <dbl>     <dbl>      <dbl>  <dbl>
##  1 1         6.06  4                1      1         5          0      5
##  2 2         6.56  5.63             1      1         1          0      1
##  3 3         7.39  4.63             1      1         7          5     12
##  4 4         4.89  4                1      1         2          4      6
##  5 5         6.06  5.75             0      0         3          1      4
##  6 6         7.39  4.38             0      1         1          0      1
##  7 7         5.56  3.63             0      0         2          2      4
##  8 8         7.5   5.63             1      0         8          7     15
##  9 9         7.67  5.25             0      1         3          2      5
## 10 10        6.64  4.78             0      1         5          7     12
## # … with 33 more rows
summary(node_measures)
##       id               efficacy         trust       district_site   
##  Length:43          Min.   :4.610   Min.   :3.630   Min.   :0.0000  
##  Class :character   1st Qu.:5.670   1st Qu.:4.130   1st Qu.:0.0000  
##  Mode  :character   Median :6.780   Median :4.780   Median :0.0000  
##                     Mean   :6.649   Mean   :4.783   Mean   :0.4186  
##                     3rd Qu.:7.470   3rd Qu.:5.440   3rd Qu.:1.0000  
##                     Max.   :8.500   Max.   :5.880   Max.   :1.0000  
##      female         in_degree       out_degree         degree      
##  Min.   :0.0000   Min.   :0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.: 1.000   1st Qu.: 3.000  
##  Median :1.0000   Median :3.000   Median : 2.000   Median : 4.000  
##  Mean   :0.5581   Mean   :3.326   Mean   : 3.326   Mean   : 6.651  
##  3rd Qu.:1.0000   3rd Qu.:5.000   3rd Qu.: 4.000   3rd Qu.: 9.500  
##  Max.   :1.0000   Max.   :8.000   Max.   :20.000   Max.   :27.000
skim(node_measures)
Data summary

Name                     node_measures
Number of rows           43
Number of columns        8
Column type frequency:
  character              1
  numeric                7
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
id                     0              1    1    2      0        43           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd    p0   p25   p50   p75   p100  hist
efficacy               0              1  6.65  1.10  4.61  5.67  6.78  7.47   8.50  ▅▅▃▇▃
trust                  0              1  4.78  0.71  3.63  4.13  4.78  5.44   5.88  ▆▆▅▆▇
district_site          0              1  0.42  0.50  0.00  0.00  0.00  1.00   1.00  ▇▁▁▁▆
female                 0              1  0.56  0.50  0.00  0.00  1.00  1.00   1.00  ▆▁▁▁▇
in_degree              0              1  3.33  2.12  0.00  2.00  3.00  5.00   8.00  ▅▇▂▅▂
out_degree             0              1  3.33  4.26  0.00  1.00  2.00  4.00  20.00  ▇▁▁▁▁
degree                 0              1  6.65  5.80  0.00  3.00  4.00  9.50  27.00  ▇▃▂▁▁

School/District-Level Stats

We can now calculate some basic summary stats:

node_measures |>
  group_by(district_site) |>
  summarise(n = n(),
            mean = mean(in_degree), 
            sd = sd(in_degree)
            )
## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  2.36  1.58
## 2             1    18  4.67  2.09
node_measures |>
  group_by(district_site) |>
  summarize(n = n(), mean = mean(out_degree), sd = sd(out_degree))
## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  2.44  3.01
## 2             1    18  4.56  5.41
node_measures |>
  group_by(district_site) |>
  summarise(n = n(),
            mean = mean(degree), 
            sd = sd(degree)
            )
## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  4.8   4.34
## 2             1    18  9.22  6.67
node_measures |>
  group_by(district_site) |>
  summarize(n = n(), mean = mean(trust), sd = sd(trust))
## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  4.90 0.719
## 2             1    18  4.62 0.688
node_measures |>
  group_by(district_site) |>
  summarize(n = n(), mean = mean(efficacy), sd = sd(efficacy))
## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  7.02 0.953
## 2             1    18  6.13 1.12

Now that we’ve completed our wrangling steps (and some exploration thrown in there as well), we’ll move on to our analysis!


3. Analyze

For our analysis, we’ll be incorporating the following:

  1. Clear descriptions of the exploratory techniques and/or models (e.g., level of analysis, community detection, ERGMs, etc.) we applied to analyze our data.

  2. Data visualizations that are attractive and easy to interpret, and include text or narrative for aiding their interpretation by others.

  3. Interpretations/meanings of specific metrics (e.g., counts, means, centrality measures, etc.) that our analysis generated.

3a. Techniques

This analysis uses exponential random graph models (ERGMs). As discussed previously, ERGMs “provide a means to compare whether a network’s observed structural properties occur more frequently than you could expect from chance alone.”

Goodness-of-fit (GOF) and MCMC diagnostics are also used as checks on the models produced.

3b. Visualizations

Loading Network Data

leader_network <- as.network(leader_edges,
                             vertices = female_leader_nodes)

leader_network
##  Network attributes:
##   vertices = 43 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 143 
##     missing edges= 0 
##     non-missing edges= 143 
## 
##  Vertex attribute names: 
##     district_site efficacy female trust vertex.names 
## 
## No edge attributes
class(leader_network)
## [1] "network"

Network Structure Parameters

summary(leader_network ~ edges + mutual)
##  edges mutual 
##    143     36

Our summary tells us that this network consists of 143 edges and 36 reciprocated dyads.
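
To show what these two statistics count, here is a minimal base R sketch on a made-up three-node matrix (hypothetical names and values, not the School Leaders data). It also computes density, the share of possible directed ties that are present. For the observed network, 143 edges out of 43 × 42 = 1,806 possible directed ties gives a density of roughly 0.079.

```r
# Hypothetical binary adjacency matrix (rows send ties, columns receive them)
adj <- matrix(c(0, 1, 0,
                1, 0, 0,
                0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

n <- nrow(adj)

# edges: total number of directed ties
n_edges <- sum(adj)

# mutual: pairs where i -> j and j -> i both exist; each pair appears
# twice in the element-wise product, so divide by 2
n_mutual <- sum(adj * t(adj)) / 2

# density: observed ties over the n * (n - 1) possible directed ties
density <- n_edges / (n * (n - 1))

c(edges = n_edges, mutual = n_mutual, density = density)
```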

set.seed(589)

ergm_mod_1 <- ergm(leader_network ~ edges + mutual)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2915.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0136.
## Convergence test p-value: 0.1862. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0049.
## Convergence test p-value: 0.0968. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0079.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_mod_1)
## Call:
## ergm(formula = leader_network ~ edges + mutual)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##        Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges   -3.1101     0.1283      0  -24.23   <1e-04 ***
## mutual   3.1246     0.3003      0   10.40   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2503.6  on 1806  degrees of freedom
##  Residual Deviance:  891.8  on 1804  degrees of freedom
##  
## AIC: 895.8  BIC: 906.8  (Smaller is better. MC Std. Err. = 0.7447)

In this first model, the negative estimate for the edges parameter implies a relatively low baseline probability of a confidential exchange. However, the large positive mutual estimate indicates a strong tendency for confidential-exchange ties to be reciprocated.

*This is consistent with the Unit 4 Case Study, as is to be expected.
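
To make the log-odds from this first model easier to read, they can be converted to probabilities with the inverse logit. This is a rough sketch that uses the rounded estimates from summary(ergm_mod_1); it assumes the usual conditional reading for an edges + mutual model, in which the mutual coefficient is added to the edges coefficient when the reverse tie already exists.

```r
# Estimates copied from summary(ergm_mod_1)
coef_edges  <- -3.1101
coef_mutual <-  3.1246

# Conditional probability of a tie i -> j when j -> i is absent
p_unreciprocated <- plogis(coef_edges)

# Conditional probability of a tie i -> j when j -> i already exists
p_reciprocating <- plogis(coef_edges + coef_mutual)

round(c(p_unreciprocated = p_unreciprocated,
        p_reciprocating = p_reciprocating), 3)
```

Under this reading, an unreciprocated tie is rare, while a tie that would reciprocate an existing one is close to a coin flip.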

summary(leader_network ~ edges + 
          mutual +
          transitive +
          gwesp(0.25, fixed=T))
##            edges           mutual       transitive gwesp.fixed.0.25 
##         143.0000          36.0000         202.0000         116.2971
ergm_mod_2 <-ergm(leader_network ~ edges + 
                    mutual +
                    gwesp(0.25, fixed=T))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 3.2174.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 0.3242.
## The log-likelihood improved by 3.0555.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 0.7781.
## The log-likelihood improved by 4.1470.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## Optimizing with step length 0.3944.
## The log-likelihood improved by 4.2100.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 0.8221.
## The log-likelihood improved by 3.2676.
## Estimating equations are not within tolerance region.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2039.
## Estimating equations are not within tolerance region.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0153.
## Convergence test p-value: 0.3954. Not converged with 99% confidence; increasing sample size.
## Iteration 8 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1348.
## Estimating equations are not within tolerance region.
## Iteration 9 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2347.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 10 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1104.
## Estimating equations are not within tolerance region.
## Iteration 11 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0612.
## Convergence test p-value: 0.0332. Not converged with 99% confidence; increasing sample size.
## Iteration 12 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0068.
## Convergence test p-value: 0.0489. Not converged with 99% confidence; increasing sample size.
## Iteration 13 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0022.
## Convergence test p-value: 0.0011. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_mod_2)
## Call:
## ergm(formula = leader_network ~ edges + mutual + gwesp(0.25, 
##     fixed = T))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                  Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges             -3.9745     0.1522      0 -26.117   <1e-04 ***
## mutual             2.2989     0.3171      0   7.250   <1e-04 ***
## gwesp.fixed.0.25   1.0713     0.1305      0   8.207   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2503.6  on 1806  degrees of freedom
##  Residual Deviance:  805.7  on 1803  degrees of freedom
##  
## AIC: 811.7  BIC: 828.2  (Smaller is better. MC Std. Err. = 0.487)

In model 2, reciprocity remains the strongest positive effect (mutual = 2.30), with the friend-of-a-friend (transitivity) effect captured by the GWESP term ranking second (1.07). Both are statistically significant.

*This is consistent with the Unit 4 Case Study.
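
Exponentiating the coefficients gives odds ratios, which some readers find easier to compare than log-odds. The sketch below uses the rounded estimates from summary(ergm_mod_2); with the fitted object in hand, the same idea would be exp(coef(ergm_mod_2)).

```r
# Estimates copied from summary(ergm_mod_2)
estimates <- c(edges = -3.9745, mutual = 2.2989, gwesp.fixed.0.25 = 1.0713)

# Odds ratio for a one-unit change in each network statistic
odds_ratios <- exp(estimates)

round(odds_ratios, 3)
```

Values above 1 indicate configurations that make a tie more likely, and values below 1 indicate the opposite.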

Actor Attribute Parameters

*This step uses “female” in place of “male” from the original Case Study.

We’re completing the step below to test whether those who are female or have higher efficacy scores are more likely to either send or receive a confidential exchange:

ergm_3 <- ergm(leader_network ~ edges +
                 mutual +
                 gwesp(0.25, fixed=T) +
                 nodefactor('female') +
                 nodecov('efficacy')
               )
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.9234.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.5308.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1665.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.3306.
## Estimating equations are not within tolerance region.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.3752.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1653.
## Estimating equations are not within tolerance region.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.5993.
## Estimating equations are not within tolerance region.
## Iteration 8 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.2377.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 9 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2780.
## Estimating equations are not within tolerance region.
## Iteration 10 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0781.
## Convergence test p-value: 0.3526. Not converged with 99% confidence; increasing sample size.
## Iteration 11 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0135.
## Convergence test p-value: 0.0505. Not converged with 99% confidence; increasing sample size.
## Iteration 12 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0158.
## Convergence test p-value: 0.0296. Not converged with 99% confidence; increasing sample size.
## Iteration 13 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0213.
## Convergence test p-value: 0.0033. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_3)
## Call:
## ergm(formula = leader_network ~ edges + mutual + gwesp(0.25, 
##     fixed = T) + nodefactor("female") + nodecov("efficacy"))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -4.16010    0.41421      0 -10.043   <1e-04 ***
## mutual               2.28685    0.31436      0   7.275   <1e-04 ***
## gwesp.fixed.0.25     1.05870    0.12629      0   8.383   <1e-04 ***
## nodefactor.female.1 -0.10837    0.06238      0  -1.737   0.0823 .  
## nodecov.efficacy     0.02332    0.02886      0   0.808   0.4190    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2503.6  on 1806  degrees of freedom
##  Residual Deviance:  802.9  on 1801  degrees of freedom
##  
## AIC: 812.9  BIC: 840.4  (Smaller is better. MC Std. Err. = 0.609)

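Reciprocity and transitivity are, again, the primary drivers of network formation. Neither the female term (p = 0.0823) nor efficacy (p = 0.4190) reaches significance at the 0.05 level.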

*I will compare the estimates for female and the results for male in the Communicate section below (see Key Findings and Insights).
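
For readers who want to see where the z values and p-values in the summary come from, the sketch below recomputes them for the two attribute terms as Wald statistics (estimate divided by standard error, with a two-sided normal p-value). The numbers are copied from summary(ergm_3), so small rounding differences are to be expected.

```r
# Estimates and standard errors copied from summary(ergm_3)
est <- c(female = -0.10837, efficacy = 0.02332)
se  <- c(female =  0.06238, efficacy = 0.02886)

# Wald z statistic and two-sided p-value
z <- est / se
p <- 2 * pnorm(-abs(z))

round(rbind(z = z, p = p), 4)
```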

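We’ll repeat the same process, now swapping efficacy for district site and using nodematch('female') to test whether ties are more likely between leaders of the same gender: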

ergm_4 <- ergm(leader_network ~ edges +
                 mutual +
                 gwesp(0.25, fixed=T) +
                 nodematch('female') +
                 nodecov('district_site')
               )
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 0.7979.
## The log-likelihood improved by 2.8785.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.4235.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.7436.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.4924.
## Estimating equations are not within tolerance region.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2838.
## Estimating equations are not within tolerance region.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1601.
## Estimating equations are not within tolerance region.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.5384.
## Estimating equations are not within tolerance region.
## Iteration 8 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2468.
## Estimating equations are not within tolerance region.
## Iteration 9 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0515.
## Convergence test p-value: 0.9238. Not converged with 99% confidence; increasing sample size.
## Iteration 10 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0936.
## Convergence test p-value: 0.8751. Not converged with 99% confidence; increasing sample size.
## Iteration 11 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1402.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 12 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0839.
## Convergence test p-value: 0.2220. Not converged with 99% confidence; increasing sample size.
## Iteration 13 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0456.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_4)
## Call:
## ergm(formula = leader_network ~ edges + mutual + gwesp(0.25, 
##     fixed = T) + nodematch("female") + nodecov("district_site"))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##                       Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges                 -4.04848    0.17609      0 -22.991   <1e-04 ***
## mutual                 2.27964    0.30357      0   7.509   <1e-04 ***
## gwesp.fixed.0.25       1.03144    0.13404      0   7.695   <1e-04 ***
## nodematch.female      -0.14199    0.16571      0  -0.857   0.3915    
## nodecov.district_site  0.19151    0.06838      0   2.801   0.0051 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2503.6  on 1806  degrees of freedom
##  Residual Deviance:  796.8  on 1801  degrees of freedom
##  
## AIC: 806.8  BIC: 834.3  (Smaller is better. MC Std. Err. = 0.3511)

In our final ERGM, district site is a statistically significant predictor of confidential exchange (0.19151, p = 0.0051), whereas matching on gender is not (-0.14199, p = 0.3915).
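
Since each summary reports AIC and BIC, a simple side-by-side comparison can help decide which model to carry forward. The sketch below copies the values from the four summaries above into a small data frame; the object name fit is only illustrative.

```r
# AIC and BIC copied from the four model summaries above
fit <- data.frame(
  model = c("ergm_mod_1", "ergm_mod_2", "ergm_3", "ergm_4"),
  AIC   = c(895.8, 811.7, 812.9, 806.8),
  BIC   = c(906.8, 828.2, 840.4, 834.3)
)

# Smaller is better for both criteria
fit$model[which.min(fit$AIC)]
fit$model[which.min(fit$BIC)]
```

The two criteria disagree here: AIC favors ergm_4, while BIC, which penalizes additional parameters more heavily, favors ergm_mod_2.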

Check Model Fit

Next, we’ll test the goodness-of-fit using the gof() function:

ergm_3_gof <- gof(ergm_3)

plot(ergm_3_gof)

Check MCMC Diagnostics

One final check on our model is to run the mcmc.diagnostics() function:

mcmc.diagnostics(ergm_3)
## Sample statistics summary:
## 
## Iterations = 2854912:11546624
## Thinning interval = 4096 
## Number of chains = 1 
## Sample size per chain = 2123 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                        Mean     SD Naive SE Time-series SE
## edges                 7.831  47.97   1.0412          3.505
## mutual                2.558  15.77   0.3423          1.161
## gwesp.fixed.0.25      9.145  59.94   1.3008          4.352
## nodefactor.female.1   5.438  45.64   0.9905          2.970
## nodecov.efficacy    105.103 648.31  14.0704         47.241
## 
## 2. Quantiles for each variable:
## 
##                         2.5%     25%     50%    75%   97.5%
## edges                 -82.95  -27.00  10.000  43.00   98.95
## mutual                -27.00   -9.00   3.000  14.00   33.00
## gwesp.fixed.0.25      -98.62  -35.98   9.834  52.75  127.74
## nodefactor.female.1   -75.00  -28.50   5.000  37.00   98.95
## nodecov.efficacy    -1123.20 -360.88 130.920 582.16 1325.04
## 
## 
## Are sample statistics significantly different from observed?
##                 edges     mutual gwesp.fixed.0.25 nodefactor.female.1
## diff.      7.83089967 2.55817240       9.14492431          5.43758832
## test stat. 2.23443646 2.20265856       2.10130569          1.83081968
## P-val.     0.02545437 0.02761882       0.03561414          0.06712746
##            nodecov.efficacy Overall (Chi^2)
## diff.          105.10328780              NA
## test stat.       2.22483551     24.56835563
## P-val.           0.02609228      0.00020291
## 
## Sample statistics cross-correlations:
##                         edges    mutual gwesp.fixed.0.25 nodefactor.female.1
## edges               1.0000000 0.9751049        0.9892255           0.9338095
## mutual              0.9751049 1.0000000        0.9779126           0.9006552
## gwesp.fixed.0.25    0.9892255 0.9779126        1.0000000           0.9150100
## nodefactor.female.1 0.9338095 0.9006552        0.9150100           1.0000000
## nodecov.efficacy    0.9985599 0.9737674        0.9882132           0.9332912
##                     nodecov.efficacy
## edges                      0.9985599
## mutual                     0.9737674
## gwesp.fixed.0.25           0.9882132
## nodefactor.female.1        0.9332912
## nodecov.efficacy           1.0000000
## 
## Sample statistics auto-correlation:
## Chain 1 
##               edges    mutual gwesp.fixed.0.25 nodefactor.female.1
## Lag 0     1.0000000 1.0000000        1.0000000           1.0000000
## Lag 4096  0.8094984 0.8197609        0.8101566           0.7332629
## Lag 8192  0.6854380 0.6933812        0.6836885           0.5925429
## Lag 12288 0.5834218 0.5856105        0.5789001           0.4892685
## Lag 16384 0.4986569 0.4955591        0.4888562           0.4028109
## Lag 20480 0.4136814 0.4160483        0.4104782           0.3156256
##           nodecov.efficacy
## Lag 0            1.0000000
## Lag 4096         0.8096674
## Lag 8192         0.6846488
## Lag 12288        0.5821656
## Lag 16384        0.4963599
## Lag 20480        0.4114625
## 
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1 
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
##               edges              mutual    gwesp.fixed.0.25 nodefactor.female.1 
##             -0.2920             -0.2913             -0.2387             -0.7241 
##    nodecov.efficacy 
##             -0.3024 
## 
## Individual P-values (lower = worse):
##               edges              mutual    gwesp.fixed.0.25 nodefactor.female.1 
##           0.7702726           0.7708598           0.8113478           0.4689906 
##    nodecov.efficacy 
##           0.7623666 
## Joint P-value (lower = worse):  0.5526401 .

## 
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).

*I won’t be going in-depth about the MCMC diagnostic results except to say that they are slightly less balanced for the female sample than for the male sample in terms of “trace” but are similar in terms of “density” in that they both favor the left. Simply put, there is room for improvement, but the models do not fail to converge.
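
One way to put a number on the autocorrelation in these diagnostics is an approximate effective sample size. The sketch below assumes the usual relationship in which the time-series SE equals the SD divided by the square root of the effective sample size, and it uses the values printed in the summary table above. It is a back-of-the-envelope check, not a replacement for the diagnostics themselves.

```r
# SD and time-series SE copied from the mcmc.diagnostics() table above
sd_stat <- c(edges = 47.97, mutual = 15.77, gwesp = 59.94,
             female = 45.64, efficacy = 648.31)
ts_se   <- c(edges = 3.505, mutual = 1.161, gwesp = 4.352,
             female = 2.970, efficacy = 47.241)
n_draws <- 2123

# If ts_se = sd / sqrt(ESS), then ESS = (sd / ts_se)^2
ess <- (sd_stat / ts_se)^2

round(rbind(ess = ess, share_of_draws = ess / n_draws), 2)
```

Each statistic comes out at roughly a tenth of the 2,123 draws, which is consistent with the high lag-4096 autocorrelations reported above.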

4. Communicate

In the following section, we will cover the following:

  1. Key findings and insights in response to the research question.

  2. Potential actions that I and/or my target audience could take to better understand and improve the context that this data product is intended to inform.

  3. Limitations and any ethical/legal issues taken into consideration, including acknowledgment of any reference materials used.

As a reminder, this analysis sought to address the following research question:

Are female school leaders more likely to confide in colleagues of their own gender than their male counterparts?

Using the Unit 4 Case Study for comparison, these are the results that this analysis has generated.

4a. Key Findings and Insights

In this section, I will be using the results from the Unit 4 Case Study ERGMs to compare and contrast the findings from this analysis.

ERGM 3

From the Unit 4 Case Study, there was a slight but not statistically significant tendency towards male leaders being more likely to send or receive ties (0.10622). In comparison, the estimate for female school leaders is slightly negative (-0.10837) and likewise not significant at the 0.05 level (p = 0.0823). The efficacy estimate (0.02332) is small, not statistically significant (p = 0.4190), and nearly identical to the efficacy estimate from the Unit 4 Case Study (0.02230).

ERGM 4

The results from the Unit 4 Case Study and this analysis are very similar, with district site having more of an effect than gender. The model estimates from Unit 4 were male (-0.15294) and district site (0.19149), while this analysis estimated female (-0.14199) and district site (0.19151).

Overall, reciprocity and transitivity were the most statistically significant effects. While the Unit 4 Case Study and this analysis generate similar estimates, there is a slightly stronger tendency towards male leaders being more likely to send or receive ties. To answer the research question guiding this analysis, female school leaders appear slightly less likely to confide in colleagues of the same gender than their male counterparts. However, the difference between the two is small, and the gender estimates in this analysis are not statistically significant at the 0.05 level.

4b. Next Steps

One way I could anticipate this analysis being used is to extend the scope of the results so that these predictive measures can be applied in various educational settings. What I hoped to determine was how significant gender was in predicting confidential exchanges. In this context, we found that other attributes are more significant overall, with a slight lean towards greater disclosure for male-to-male colleagues. We tested this against attributes such as efficacy scores and district site. In the future, it would be valuable to include additional attributes that were not considered, such as degree or relational measures other than reciprocity and transitivity. In a similar vein to the Unit 5 Case Study, these predictive attributes could be used to improve employee retention and satisfaction, especially in educational settings where individuals often have lower pay and fewer resources available to them. (I think it would also be very interesting to compare these results with other industries!)

4c. Limitations

Consistent with the findings of the Unit 4 Case Study, reciprocity and transitivity are the two effects with the most significant impact on confidential exchanges between school leaders. There are no ethical/legal issues that I foresee with this analysis as long as anonymity is maintained and the information and concepts explored are not used to exploit individuals.

References

Carolan, Brian. 2014. “Social Network Analysis and Education: Theory, Methods & Applications.” https://doi.org/10.4135/9781452270104.