For the Final Project, I am revisiting the work of Alan Daly and colleagues that is described in Chapter 9: Network Data and Statistical Models from Social Network Analysis and Education (Carolan 2014). In this case study, I expand on our previous inquiry into how school leaders select peers for collaboration or confidential exchanges.
More specifically, this case study will cover the following topics pertaining to each data-intensive workflow process:
Prepare: Prior to analysis, we’ll revisit where our data is coming from and share the research question(s) guiding this analysis.
Wrangle: In Section 2, we will transform our data to create distinct datasets based on gender, replicating the research process from the Unit 4 Case Study with our new dataset for a comparative analysis.
Analyze: We wrap up our analysis in Section 4 by incorporating the exponential random graph models (ERGMs) used in Chapter 9 of Carolan (2014) and checking how these models compare to the results of our previous ERGM.
Communicate: Finally, we’ll select, polish, and narrate a key finding from our analysis that helps to answer our research question(s).
In Unit 4, we examined the level of influence of gender and other individual attributes in predicting confidential exchanges between school leaders. I continued this inquiry in my independent analysis by isolating male-to-male interactions. However, I have yet to compare and contrast these results with those for female school leaders. Specifically, this analysis will try to answer the following research question:
Are female school leaders more likely to confide in colleagues of their own gender than their male counterparts?
This analysis utilizes the School Leaders data mentioned above. In Chapter 9, Carolan (2014) describes this data by saying:
“These ideas and tools will be demonstrated using a portion of the School Leaders data set, particularly observations in years 1 and 3 of the “collaboration” and “confidential help” networks. Collaboration is a directed valued network measured on a five-point scale ranging from 0 to 4, with higher values indicating more frequent collaborations (1–2 times/week). Similarly, the confidential help network is a directed, valued network measured on the same scale. Complete network data are available on all 43 school leaders drawn from two large school districts. In addition to these relational data, the survey instrument in year 1 captured attribute data, including composite scores from a set of items related to efficacy (19 items) and trust (8 items), as well as information about leaders’ gender and whether they worked at the district or school level.”
While this project is something I am generally curious about, it could also provide valuable insight into differences in disclosure between same-sex colleagues. With that being said, I envision this analysis benefiting those within the network themselves (or those in comparable environments) as well as those in higher leadership positions. While this analysis takes place in the context of education, it could serve as a model for other industries as well, predicting and analyzing the effect of confidential exchanges between same-sex colleagues in the workplace.
To begin, load the following packages that we will be using for our analysis of network influence:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.6 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(readxl)
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
##
## compose, simplify
## The following object is masked from 'package:tidyr':
##
## crossing
## The following object is masked from 'package:tibble':
##
## as_data_frame
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
library(tidygraph)
##
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
##
## groups
## The following object is masked from 'package:stats':
##
## filter
library(ggraph)
library(skimr)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(statnet)
## Loading required package: tergm
## Loading required package: ergm
## Loading required package: network
##
## 'network' 1.17.1 (2021-06-12), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
##
## Attaching package: 'network'
## The following objects are masked from 'package:igraph':
##
## %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
## get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
## is.directed, list.edge.attributes, list.vertex.attributes,
## set.edge.attribute, set.vertex.attribute
##
## 'ergm' 4.1.2 (2021-07-26), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.
## Loading required package: networkDynamic
##
## 'networkDynamic' 0.11.1 (2022-04-04), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Registered S3 method overwritten by 'tergm':
## method from
## simulate_formula.network ergm
##
## 'tergm' 4.0.2 (2021-07-28), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
##
## Attaching package: 'tergm'
## The following object is masked from 'package:ergm':
##
## snctrl
## Loading required package: ergm.count
##
## 'ergm.count' 4.0.2 (2021-06-18), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Loading required package: sna
## Loading required package: statnet.common
##
## Attaching package: 'statnet.common'
## The following object is masked from 'package:ergm':
##
## snctrl
## The following objects are masked from 'package:base':
##
## attr, order
## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
## For citation information, type citation("sna").
## Type help(package="sna") to get started.
##
## Attaching package: 'sna'
## The following objects are masked from 'package:igraph':
##
## betweenness, bonpow, closeness, components, degree, dyad.census,
## evcent, hierarchy, is.connected, neighborhood, triad.census
## Loading required package: tsna
##
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information
For our data wrangling, we’ll focus on working with network data stored as an adjacency matrix. Our primary goals for this section are as follows:
Data preprocessing. In this section, we will replicate the steps from the Unit 4 Case Study.
New variables or features. Once preprocessing is completed, we will follow the steps to move from a valued matrix to a binary matrix.
Data transformations. Finally, we’ll add a female indicator to our node data, convert our matrix to an edge-list, and store both our edges and node attributes as a network igraph object in preparation for analysis.
We already have access to the School Leaders Data Chapter 9_d and School Leaders Data Chapter 9_e from Unit 4.
To get started, we’ll use the read_excel() function to import our data. For coherence and cohesion, we will adopt the same naming conventions used in Unit 4 and will make note of any instances when this does not occur (e.g., new variables and data transformations that did not occur in the Unit 4 Case Study):
leader_matrix <- read_excel("data/School Leaders Data Chapter 9_d.xlsx",
col_names = FALSE)
## New names:
## • `` -> `...1`
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`
## • `` -> `...37`
## • `` -> `...38`
## • `` -> `...39`
## • `` -> `...40`
## • `` -> `...41`
## • `` -> `...42`
## • `` -> `...43`
leader_matrix
## # A tibble: 43 × 43
## ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 3 1 0 1 1 1 1 2 1 1 1 1 1
## 4 1 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 1 0 3 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0
## 7 2 1 1 1 1 1 4 1 1 3 1 1 1
## 8 1 1 4 1 1 1 1 0 2 3 1 1 1
## 9 1 0 0 0 4 0 0 1 0 0 0 0 0
## 10 1 1 3 1 1 1 4 4 1 0 3 1 1
## # … with 33 more rows, and 30 more variables: ...14 <dbl>, ...15 <dbl>,
## # ...16 <dbl>, ...17 <dbl>, ...18 <dbl>, ...19 <dbl>, ...20 <dbl>,
## # ...21 <dbl>, ...22 <dbl>, ...23 <dbl>, ...24 <dbl>, ...25 <dbl>,
## # ...26 <dbl>, ...27 <dbl>, ...28 <dbl>, ...29 <dbl>, ...30 <dbl>,
## # ...31 <dbl>, ...32 <dbl>, ...33 <dbl>, ...34 <dbl>, ...35 <dbl>,
## # ...36 <dbl>, ...37 <dbl>, ...38 <dbl>, ...39 <dbl>, ...40 <dbl>,
## # ...41 <dbl>, ...42 <dbl>, ...43 <dbl>
leader_nodes <- read_excel("data/School Leaders Data Chapter 9_e.xlsx",
col_types = c("text", "numeric", "numeric", "numeric", "numeric")) |>
clean_names()
leader_nodes
## # A tibble: 43 × 5
## id efficacy trust district_site male
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.06 4 1 0
## 2 2 6.56 5.63 1 0
## 3 3 7.39 4.63 1 0
## 4 4 4.89 4 1 0
## 5 5 6.06 5.75 0 1
## 6 6 7.39 4.38 0 0
## 7 7 5.56 3.63 0 1
## 8 8 7.5 5.63 1 1
## 9 9 7.67 5.25 0 0
## 10 10 6.64 4.78 0 0
## # … with 33 more rows
*This step was not included in the Unit 4 Case Study.
Using the mutate(), case_when(), and select() functions, we create a female column and remove the male column:
female_leader_nodes <- leader_nodes |>
mutate(female = case_when(male >= 1 ~ '0', TRUE ~ '1')) |>
mutate(female = as.double(female)) |>
select(id, efficacy, trust, district_site, female)
female_leader_nodes
## # A tibble: 43 × 5
## id efficacy trust district_site female
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.06 4 1 1
## 2 2 6.56 5.63 1 1
## 3 3 7.39 4.63 1 1
## 4 4 4.89 4 1 1
## 5 5 6.06 5.75 0 0
## 6 6 7.39 4.38 0 1
## 7 7 5.56 3.63 0 0
## 8 8 7.5 5.63 1 0
## 9 9 7.67 5.25 0 1
## 10 10 6.64 4.78 0 1
## # … with 33 more rows
In the next steps, we will convert our valued matrix into a binary matrix. First, we coerce our data frame to a matrix object:
leader_matrix <- leader_matrix |>
as.matrix()
class(leader_matrix)
## [1] "matrix" "array"
To dichotomize our matrix, the following code will assign 0 to all values in our matrix that are less than or equal to 2, and 1 to all values that are greater than or equal to 3:
leader_matrix[leader_matrix <= 2] <- 0
leader_matrix[leader_matrix >= 3] <- 1
Before we can convert to an edgelist, we will also need to add the names of our nodes to the columns and rows of our matrix:
rownames(leader_matrix) <- female_leader_nodes$id
colnames(leader_matrix) <- female_leader_nodes$id
leader_matrix
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
## 5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## 8 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1
## 9 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 10 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
## 11 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
## 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 14 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
## 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 21 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
## 22 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1
## 23 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## 25 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## 28 1 1 1 1 0 1 0 1 1 1 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1 1
## 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1
## 35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
## 36 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 1
## 37 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 38 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 1
## 39 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 42 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0
## 43 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 1 1 0 0 0 0 0
## 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0
## 4 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
## 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## 11 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 13 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0
## 14 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
## 15 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1
## 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 18 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## 19 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 21 0 0 0 0 1 0 1 0 1 1 0 0 0 0 1
## 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 24 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 26 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## 27 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
## 28 1 0 0 1 0 1 0 1 1 1 0 0 0 0 0
## 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 34 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 36 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
## 37 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## 38 0 0 0 0 1 1 0 1 1 0 1 0 0 0 1
## 39 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 41 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
## 42 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
## 43 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0
First, we need to convert our matrix to an igraph network object:
adjacency_matrix <- graph.adjacency(leader_matrix,
diag = FALSE)
class(adjacency_matrix)
## [1] "igraph"
adjacency_matrix
## IGRAPH 354ac3d DN-- 43 143 --
## + attr: name (v/c)
## + edges from 354ac3d (vertex names):
## [1] 3 ->1 3 ->27 3 ->28 3 ->36 3 ->38 4 ->16 4 ->28 4 ->34 4 ->36 5 ->9
## [11] 7 ->10 7 ->20 8 ->3 8 ->10 8 ->25 8 ->27 8 ->28 8 ->37 8 ->38 9 ->5
## [21] 9 ->14 10->3 10->7 10->8 10->11 10->18 10->20 10->38 11->10 11->21
## [31] 11->23 11->35 13->8 13->35 13->42 14->37 15->21 15->39 15->42 15->43
## [41] 16->22 17->26 18->29 18->34 19->31 20->7 20->10 21->1 21->11 21->12
## [51] 21->15 21->33 21->35 21->37 21->38 21->43 22->8 22->16 22->28 23->5
## [61] 23->11 24->27 24->42 25->3 25->8 26->17 26->41 27->24 27->32 27->42
## [71] 28->1 28->2 28->3 28->4 28->6 28->8 28->9 28->10 28->13 28->18
## + ... omitted several edges
Now we can use the get.data.frame() function to convert our igraph object to a standard edge-list:
leader_edges <- get.data.frame(adjacency_matrix) |>
mutate(from = as.character(from)) |>
mutate(to = as.character(to))
leader_edges
## from to
## 1 3 1
## 2 3 27
## 3 3 28
## 4 3 36
## 5 3 38
## 6 4 16
## 7 4 28
## 8 4 34
## 9 4 36
## 10 5 9
## 11 7 10
## 12 7 20
## 13 8 3
## 14 8 10
## 15 8 25
## 16 8 27
## 17 8 28
## 18 8 37
## 19 8 38
## 20 9 5
## 21 9 14
## 22 10 3
## 23 10 7
## 24 10 8
## 25 10 11
## 26 10 18
## 27 10 20
## 28 10 38
## 29 11 10
## 30 11 21
## 31 11 23
## 32 11 35
## 33 13 8
## 34 13 35
## 35 13 42
## 36 14 37
## 37 15 21
## 38 15 39
## 39 15 42
## 40 15 43
## 41 16 22
## 42 17 26
## 43 18 29
## 44 18 34
## 45 19 31
## 46 20 7
## 47 20 10
## 48 21 1
## 49 21 11
## 50 21 12
## 51 21 15
## 52 21 33
## 53 21 35
## 54 21 37
## 55 21 38
## 56 21 43
## 57 22 8
## 58 22 16
## 59 22 28
## 60 23 5
## 61 23 11
## 62 24 27
## 63 24 42
## 64 25 3
## 65 25 8
## 66 26 17
## 67 26 41
## 68 27 24
## 69 27 32
## 70 27 42
## 71 28 1
## 72 28 2
## 73 28 3
## 74 28 4
## 75 28 6
## 76 28 8
## 77 28 9
## 78 28 10
## 79 28 13
## 80 28 18
## 81 28 21
## 82 28 22
## 83 28 25
## 84 28 27
## 85 28 29
## 86 28 32
## 87 28 34
## 88 28 36
## 89 28 37
## 90 28 38
## 91 31 19
## 92 32 24
## 93 33 21
## 94 34 18
## 95 34 22
## 96 34 28
## 97 34 29
## 98 35 21
## 99 36 1
## 100 36 3
## 101 36 4
## 102 36 8
## 103 36 16
## 104 36 18
## 105 36 26
## 106 36 27
## 107 36 28
## 108 36 34
## 109 37 3
## 110 37 14
## 111 37 38
## 112 38 1
## 113 38 3
## 114 38 8
## 115 38 11
## 116 38 18
## 117 38 22
## 118 38 26
## 119 38 27
## 120 38 28
## 121 38 33
## 122 38 34
## 123 38 36
## 124 38 37
## 125 38 39
## 126 38 43
## 127 42 9
## 128 42 15
## 129 42 24
## 130 42 27
## 131 42 43
## 132 43 5
## 133 43 8
## 134 43 15
## 135 43 18
## 136 43 21
## 137 43 22
## 138 43 23
## 139 43 32
## 140 43 33
## 141 43 34
## 142 43 37
## 143 43 42
Now, we will convert our data frames into a network graph object:
leader_graph <- tbl_graph(edges = leader_edges,
nodes = female_leader_nodes,
directed = TRUE)
leader_graph
## # A tbl_graph: 43 nodes and 143 edges
## #
## # A directed simple graph with 4 components
## #
## # Node Data: 43 × 5 (active)
## id efficacy trust district_site female
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.06 4 1 1
## 2 2 6.56 5.63 1 1
## 3 3 7.39 4.63 1 1
## 4 4 4.89 4 1 1
## 5 5 6.06 5.75 0 0
## 6 6 7.39 4.38 0 1
## # … with 37 more rows
## #
## # Edge Data: 143 × 2
## from to
## <int> <int>
## 1 3 1
## 2 3 27
## 3 3 28
## # … with 140 more rows
Using the mutate() function with tidygraph’s centrality_degree(), we’ll calculate in-degree, out-degree, and total degree:
leader_measures <- leader_graph |>
activate(nodes) |>
mutate(in_degree = centrality_degree(mode = "in")) |>
mutate(out_degree = centrality_degree(mode = "out")) |>
mutate(degree = centrality_degree(mode = "total"))
leader_measures
## # A tbl_graph: 43 nodes and 143 edges
## #
## # A directed simple graph with 4 components
## #
## # Node Data: 43 × 8 (active)
## id efficacy trust district_site female in_degree out_degree degree
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.06 4 1 1 5 0 5
## 2 2 6.56 5.63 1 1 1 0 1
## 3 3 7.39 4.63 1 1 7 5 12
## 4 4 4.89 4 1 1 2 4 6
## 5 5 6.06 5.75 0 0 3 1 4
## 6 6 7.39 4.38 0 1 1 0 1
## # … with 37 more rows
## #
## # Edge Data: 143 × 2
## from to
## <int> <int>
## 1 3 1
## 2 3 27
## 3 3 28
## # … with 140 more rows
node_measures <- leader_measures |>
activate(nodes) |>
as_tibble()
node_measures
## # A tibble: 43 × 8
## id efficacy trust district_site female in_degree out_degree degree
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 6.06 4 1 1 5 0 5
## 2 2 6.56 5.63 1 1 1 0 1
## 3 3 7.39 4.63 1 1 7 5 12
## 4 4 4.89 4 1 1 2 4 6
## 5 5 6.06 5.75 0 0 3 1 4
## 6 6 7.39 4.38 0 1 1 0 1
## 7 7 5.56 3.63 0 0 2 2 4
## 8 8 7.5 5.63 1 0 8 7 15
## 9 9 7.67 5.25 0 1 3 2 5
## 10 10 6.64 4.78 0 1 5 7 12
## # … with 33 more rows
summary(node_measures)
## id efficacy trust district_site
## Length:43 Min. :4.610 Min. :3.630 Min. :0.0000
## Class :character 1st Qu.:5.670 1st Qu.:4.130 1st Qu.:0.0000
## Mode :character Median :6.780 Median :4.780 Median :0.0000
## Mean :6.649 Mean :4.783 Mean :0.4186
## 3rd Qu.:7.470 3rd Qu.:5.440 3rd Qu.:1.0000
## Max. :8.500 Max. :5.880 Max. :1.0000
## female in_degree out_degree degree
## Min. :0.0000 Min. :0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.: 1.000 1st Qu.: 3.000
## Median :1.0000 Median :3.000 Median : 2.000 Median : 4.000
## Mean :0.5581 Mean :3.326 Mean : 3.326 Mean : 6.651
## 3rd Qu.:1.0000 3rd Qu.:5.000 3rd Qu.: 4.000 3rd Qu.: 9.500
## Max. :1.0000 Max. :8.000 Max. :20.000 Max. :27.000
skim(node_measures)
| Name | node_measures |
| Number of rows | 43 |
| Number of columns | 8 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 7 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 1 | 2 | 0 | 43 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| efficacy | 0 | 1 | 6.65 | 1.10 | 4.61 | 5.67 | 6.78 | 7.47 | 8.50 | ▅▅▃▇▃ |
| trust | 0 | 1 | 4.78 | 0.71 | 3.63 | 4.13 | 4.78 | 5.44 | 5.88 | ▆▆▅▆▇ |
| district_site | 0 | 1 | 0.42 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
| female | 0 | 1 | 0.56 | 0.50 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▆▁▁▁▇ |
| in_degree | 0 | 1 | 3.33 | 2.12 | 0.00 | 2.00 | 3.00 | 5.00 | 8.00 | ▅▇▂▅▂ |
| out_degree | 0 | 1 | 3.33 | 4.26 | 0.00 | 1.00 | 2.00 | 4.00 | 20.00 | ▇▁▁▁▁ |
| degree | 0 | 1 | 6.65 | 5.80 | 0.00 | 3.00 | 4.00 | 9.50 | 27.00 | ▇▃▂▁▁ |
We can now calculate some basic summary stats:
node_measures |>
group_by(district_site) |>
summarise(n = n(),
mean = mean(in_degree),
sd = sd(in_degree)
)
## # A tibble: 2 × 4
## district_site n mean sd
## <dbl> <int> <dbl> <dbl>
## 1 0 25 2.36 1.58
## 2 1 18 4.67 2.09
node_measures |>
group_by(district_site) |>
summarize(n = n(), mean = mean(out_degree), sd = sd(out_degree))
## # A tibble: 2 × 4
## district_site n mean sd
## <dbl> <int> <dbl> <dbl>
## 1 0 25 2.44 3.01
## 2 1 18 4.56 5.41
node_measures |>
group_by(district_site) |>
summarise(n = n(),
mean = mean(degree),
sd = sd(degree)
)
## # A tibble: 2 × 4
## district_site n mean sd
## <dbl> <int> <dbl> <dbl>
## 1 0 25 4.8 4.34
## 2 1 18 9.22 6.67
node_measures |>
group_by(district_site) |>
summarize(n = n(), mean = mean(trust), sd = sd(trust))
## # A tibble: 2 × 4
## district_site n mean sd
## <dbl> <int> <dbl> <dbl>
## 1 0 25 4.90 0.719
## 2 1 18 4.62 0.688
node_measures |>
group_by(district_site) |>
summarize(n = n(), mean = mean(efficacy), sd = sd(efficacy))
## # A tibble: 2 × 4
## district_site n mean sd
## <dbl> <int> <dbl> <dbl>
## 1 0 25 7.02 0.953
## 2 1 18 6.13 1.12
Now that we’ve completed our wrangling steps (and some exploration thrown in there as well), we’ll move on to our analysis!
For our analysis, we’ll be incorporating the following:
Clear descriptions of the exploratory techniques and/or models (e.g., level of analysis, community detection, ERGMs, etc.) you applied to analyze your data.
Data visualizations that are attractive and easy to interpret, and that include text or narrative to aid their interpretation by others.
Interpretations/meanings of specific metrics (e.g., counts, means, centrality measures, etc.) that your analysis generated.
This analysis uses exponential random graph models (ERGMs). As discussed previously, ERGMs “provide a means to compare whether a network’s observed structural properties occur more frequently than you could expect from chance alone”.
Goodness-of-fit (GOF) and MCMC diagnostics are also used as checks on the models produced.
leader_network <- as.network(leader_edges,
vertices = female_leader_nodes)
leader_network
## Network attributes:
## vertices = 43
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 143
## missing edges= 0
## non-missing edges= 143
##
## Vertex attribute names:
## district_site efficacy female trust vertex.names
##
## No edge attributes
class(leader_network)
## [1] "network"
summary(leader_network ~ edges + mutual)
## edges mutual
## 143 36
Our summary tells us that this network comprises 143 edges and 36 reciprocated (mutual) dyads.
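Although this step was not part of the Unit 4 Case Study, we could also sanity-check those counts with a couple of descriptive functions from the sna package (a quick sketch; gden() returns the network density and grecip() the reciprocity of the observed network):
gden(leader_network, mode = "digraph")        # density: observed ties out of the 43 * 42 possible directed ties
grecip(leader_network, measure = "edgewise")  # proportion of edges that are reciprocated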
set.seed(589)
ergm_mod_1 <-ergm(leader_network ~ edges + mutual)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2915.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0136.
## Convergence test p-value: 0.1862. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0049.
## Convergence test p-value: 0.0968. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0079.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_mod_1)
## Call:
## ergm(formula = leader_network ~ edges + mutual)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.1101 0.1283 0 -24.23 <1e-04 ***
## mutual 3.1246 0.3003 0 10.40 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 2503.6 on 1806 degrees of freedom
## Residual Deviance: 891.8 on 1804 degrees of freedom
##
## AIC: 895.8 BIC: 906.8 (Smaller is better. MC Std. Err. = 0.7447)
In this first model, the negative estimate for the edges parameter indicates a relatively low baseline probability of a confidential exchange, while the large positive estimate for mutual indicates a strong tendency for confidential-exchange ties to be reciprocated.
*This is consistent with the Unit 4 Case Study, as to be expected.
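To make these log-odds estimates more concrete, we can convert them to probabilities with plogis() (a rough illustration using the coefficients reported above):
plogis(-3.1101)           # baseline probability of a confidential-exchange tie
plogis(-3.1101 + 3.1246)  # probability when the tie would reciprocate an existing tie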
summary(leader_network ~ edges +
mutual +
transitive +
gwesp(0.25, fixed=T))
## edges mutual transitive gwesp.fixed.0.25
## 143.0000 36.0000 202.0000 116.2971
ergm_mod_2 <-ergm(leader_network ~ edges +
mutual +
gwesp(0.25, fixed=T))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 3.2174.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 0.3242.
## The log-likelihood improved by 3.0555.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 0.7781.
## The log-likelihood improved by 4.1470.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## Optimizing with step length 0.3944.
## The log-likelihood improved by 4.2100.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 0.8221.
## The log-likelihood improved by 3.2676.
## Estimating equations are not within tolerance region.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2039.
## Estimating equations are not within tolerance region.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0153.
## Convergence test p-value: 0.3954. Not converged with 99% confidence; increasing sample size.
## Iteration 8 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1348.
## Estimating equations are not within tolerance region.
## Iteration 9 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2347.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 10 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1104.
## Estimating equations are not within tolerance region.
## Iteration 11 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0612.
## Convergence test p-value: 0.0332. Not converged with 99% confidence; increasing sample size.
## Iteration 12 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0068.
## Convergence test p-value: 0.0489. Not converged with 99% confidence; increasing sample size.
## Iteration 13 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0022.
## Convergence test p-value: 0.0011. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_mod_2)
## Call:
## ergm(formula = leader_network ~ edges + mutual + gwesp(0.25,
## fixed = T))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.9745 0.1522 0 -26.117 <1e-04 ***
## mutual 2.2989 0.3171 0 7.250 <1e-04 ***
## gwesp.fixed.0.25 1.0713 0.1305 0 8.207 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 2503.6 on 1806 degrees of freedom
## Residual Deviance: 805.7 on 1803 degrees of freedom
##
## AIC: 811.7 BIC: 828.2 (Smaller is better. MC Std. Err. = 0.487)
In model 2, reciprocity (mutual) remains the strongest predictor of tie formation, with the friend-of-a-friend effect (gwesp) ranking second.
*This is consistent with the Unit 4 Case Study.
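As with model 1, the coefficients can be translated into more interpretable quantities; exponentiating an estimate gives the multiplicative change in the odds of a tie (a quick sketch based on the estimates above):
exp(2.2989)  # reciprocating an existing tie multiplies the odds of a tie by roughly 10
exp(1.0713)  # each additional unit of the gwesp (shared-partner) statistic multiplies the odds by roughly 2.9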
*This step uses “female” in place of “male” from the original Case Study.
We’re completing the step below to test whether those who are female or have higher efficacy scores are more likely to either send or receive a confidential exchange:
ergm_3 <- ergm(leader_network ~ edges +
mutual +
gwesp(0.25, fixed=T) +
nodefactor('female') +
nodecov('efficacy')
)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.9234.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.5308.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1665.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.3306.
## Estimating equations are not within tolerance region.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.3752.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1653.
## Estimating equations are not within tolerance region.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.5993.
## Estimating equations are not within tolerance region.
## Iteration 8 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.2377.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 9 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2780.
## Estimating equations are not within tolerance region.
## Iteration 10 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0781.
## Convergence test p-value: 0.3526. Not converged with 99% confidence; increasing sample size.
## Iteration 11 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0135.
## Convergence test p-value: 0.0505. Not converged with 99% confidence; increasing sample size.
## Iteration 12 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0158.
## Convergence test p-value: 0.0296. Not converged with 99% confidence; increasing sample size.
## Iteration 13 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0213.
## Convergence test p-value: 0.0033. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_3)
## Call:
## ergm(formula = leader_network ~ edges + mutual + gwesp(0.25,
## fixed = T) + nodefactor("female") + nodecov("efficacy"))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -4.16010 0.41421 0 -10.043 <1e-04 ***
## mutual 2.28685 0.31436 0 7.275 <1e-04 ***
## gwesp.fixed.0.25 1.05870 0.12629 0 8.383 <1e-04 ***
## nodefactor.female.1 -0.10837 0.06238 0 -1.737 0.0823 .
## nodecov.efficacy 0.02332 0.02886 0 0.808 0.4190
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 2503.6 on 1806 degrees of freedom
## Residual Deviance: 802.9 on 1801 degrees of freedom
##
## AIC: 812.9 BIC: 840.4 (Smaller is better. MC Std. Err. = 0.609)
Reciprocity and transitivity are, again, the primary drivers of network formation.
*I will compare the estimates for female and the results for male in the Communicate section below (see Key Findings and Insights).
We’ll repeat the same process, this time swapping nodematch('female') in for nodefactor('female') and district site in for efficacy:
ergm_4 <- ergm(leader_network ~ edges +
mutual +
gwesp(0.25, fixed=T) +
nodematch('female') +
nodecov('district_site')
)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 0.7979.
## The log-likelihood improved by 2.8785.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.4235.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.7436.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.4924.
## Estimating equations are not within tolerance region.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2838.
## Estimating equations are not within tolerance region.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1601.
## Estimating equations are not within tolerance region.
## Iteration 7 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.5384.
## Estimating equations are not within tolerance region.
## Iteration 8 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2468.
## Estimating equations are not within tolerance region.
## Iteration 9 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0515.
## Convergence test p-value: 0.9238. Not converged with 99% confidence; increasing sample size.
## Iteration 10 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0936.
## Convergence test p-value: 0.8751. Not converged with 99% confidence; increasing sample size.
## Iteration 11 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1402.
## Estimating equations are not within tolerance region.
## Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
## Iteration 12 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0839.
## Convergence test p-value: 0.2220. Not converged with 99% confidence; increasing sample size.
## Iteration 13 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0456.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_4)
## Call:
## ergm(formula = leader_network ~ edges + mutual + gwesp(0.25,
## fixed = T) + nodematch("female") + nodecov("district_site"))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -4.04848 0.17609 0 -22.991 <1e-04 ***
## mutual 2.27964 0.30357 0 7.509 <1e-04 ***
## gwesp.fixed.0.25 1.03144 0.13404 0 7.695 <1e-04 ***
## nodematch.female -0.14199 0.16571 0 -0.857 0.3915
## nodecov.district_site 0.19151 0.06838 0 2.801 0.0051 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 2503.6 on 1806 degrees of freedom
## Residual Deviance: 796.8 on 1801 degrees of freedom
##
## AIC: 806.8 BIC: 834.3 (Smaller is better. MC Std. Err. = 0.3511)
In our final ERGM, district site is a more significant predictor of confidential exchange than gender homophily (nodematch).
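Before moving to the goodness-of-fit and MCMC checks, it can also be useful to compare the four models’ information criteria side by side (a quick sketch; AIC() and BIC() work directly on fitted ergm objects and should match the values reported in the summaries above):
AIC(ergm_mod_1, ergm_mod_2, ergm_3, ergm_4)  # smaller values indicate better fit
BIC(ergm_mod_1, ergm_mod_2, ergm_3, ergm_4)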
Next, we’ll test the goodness-of-fit using the gof()
function:
ergm_3_gof <- gof(ergm_3)
plot(ergm_3_gof)
One final check on our model is to run the
mcmc.diagnostics() function:
mcmc.diagnostics(ergm_3)
## Sample statistics summary:
##
## Iterations = 2854912:11546624
## Thinning interval = 4096
## Number of chains = 1
## Sample size per chain = 2123
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## edges 7.831 47.97 1.0412 3.505
## mutual 2.558 15.77 0.3423 1.161
## gwesp.fixed.0.25 9.145 59.94 1.3008 4.352
## nodefactor.female.1 5.438 45.64 0.9905 2.970
## nodecov.efficacy 105.103 648.31 14.0704 47.241
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## edges -82.95 -27.00 10.000 43.00 98.95
## mutual -27.00 -9.00 3.000 14.00 33.00
## gwesp.fixed.0.25 -98.62 -35.98 9.834 52.75 127.74
## nodefactor.female.1 -75.00 -28.50 5.000 37.00 98.95
## nodecov.efficacy -1123.20 -360.88 130.920 582.16 1325.04
##
##
## Are sample statistics significantly different from observed?
## edges mutual gwesp.fixed.0.25 nodefactor.female.1
## diff. 7.83089967 2.55817240 9.14492431 5.43758832
## test stat. 2.23443646 2.20265856 2.10130569 1.83081968
## P-val. 0.02545437 0.02761882 0.03561414 0.06712746
## nodecov.efficacy Overall (Chi^2)
## diff. 105.10328780 NA
## test stat. 2.22483551 24.56835563
## P-val. 0.02609228 0.00020291
##
## Sample statistics cross-correlations:
## edges mutual gwesp.fixed.0.25 nodefactor.female.1
## edges 1.0000000 0.9751049 0.9892255 0.9338095
## mutual 0.9751049 1.0000000 0.9779126 0.9006552
## gwesp.fixed.0.25 0.9892255 0.9779126 1.0000000 0.9150100
## nodefactor.female.1 0.9338095 0.9006552 0.9150100 1.0000000
## nodecov.efficacy 0.9985599 0.9737674 0.9882132 0.9332912
## nodecov.efficacy
## edges 0.9985599
## mutual 0.9737674
## gwesp.fixed.0.25 0.9882132
## nodefactor.female.1 0.9332912
## nodecov.efficacy 1.0000000
##
## Sample statistics auto-correlation:
## Chain 1
## edges mutual gwesp.fixed.0.25 nodefactor.female.1
## Lag 0 1.0000000 1.0000000 1.0000000 1.0000000
## Lag 4096 0.8094984 0.8197609 0.8101566 0.7332629
## Lag 8192 0.6854380 0.6933812 0.6836885 0.5925429
## Lag 12288 0.5834218 0.5856105 0.5789001 0.4892685
## Lag 16384 0.4986569 0.4955591 0.4888562 0.4028109
## Lag 20480 0.4136814 0.4160483 0.4104782 0.3156256
## nodecov.efficacy
## Lag 0 1.0000000
## Lag 4096 0.8096674
## Lag 8192 0.6846488
## Lag 12288 0.5821656
## Lag 16384 0.4963599
## Lag 20480 0.4114625
##
## Sample statistics burn-in diagnostic (Geweke):
## Chain 1
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## edges mutual gwesp.fixed.0.25 nodefactor.female.1
## -0.2920 -0.2913 -0.2387 -0.7241
## nodecov.efficacy
## -0.3024
##
## Individual P-values (lower = worse):
## edges mutual gwesp.fixed.0.25 nodefactor.female.1
## 0.7702726 0.7708598 0.8113478 0.4689906
## nodecov.efficacy
## 0.7623666
## Joint P-value (lower = worse): 0.5526401 .
##
## MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
*I won’t go in-depth on the MCMC diagnostic results, except to say that the trace plots are slightly less balanced for the female model than for the male model from the Unit 4 Case Study, while the density plots are similar in that both lean left. Simply put, there is room for improvement, but the models do not fail to converge.
In this final section, we will cover the following:
Key findings and insights in response to our research question(s).
Potential actions that we and/or our target audience could take to better understand and improve the context that this data product is intended to inform.
Limitations and any ethical/legal issues taken into consideration, including acknowledgment of any reference materials used.
As a reminder, this analysis sought to address the following research question:
Are female school leaders more likely to confide in colleagues of their own gender than their male counterparts?
In this section, I compare and contrast the findings from this analysis with the results generated by the Unit 4 Case Study ERGMs.
From the Unit 4 Case Study, there was a slight but not statistically significant tendency for male leaders to be more likely to send or receive ties (0.10622). In comparison, the corresponding estimate for female school leaders is negative (-0.10830); efficacy is a slightly better predictor (0.02291), which is nearly identical to the efficacy estimate from the Unit 4 Case Study (0.2230).
The results from the Unit 4 Case Study and this analysis are again very similar, with district site having more of an effect than gender homophily. The model estimates from Unit 4 were male (-0.15294) and district site (0.19149), while this analysis estimated female (-0.1491) and district site (0.2013).
Overall, reciprocity and transitivity proved the most statistically significant. While the Unit 4 Case Study and this analysis generate similar estimates, there is a slightly stronger tendency for male leaders to send or receive ties. To answer the research question guiding this analysis: female school leaders appear to be less likely to confide in colleagues of the same gender than their male counterparts, although the distinction between the two is not drastic.
One way I could anticipate this analysis being used is to extend the scope of the results so that these predictive measures can be applied in other educational settings. What I hoped to determine was how significant gender is in predicting confidential exchanges. In this context, we found that other attributes are more significant overall, with a slight lean towards greater disclosure between male colleagues. We tested this against attributes such as efficacy scores and district site. In the future, it would be great to include additional attributes that were not considered, such as degree or other relational measures beyond reciprocity and transitivity. In a similar vein to the Unit 5 Case Study, these predictive attributes could be used to improve employee retention and satisfaction, especially in educational settings where individuals often have lower pay and fewer resources available to them. (It would also be very interesting to compare other industries as well.)
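As a purely hypothetical sketch of that follow-up idea (not run here), in-degree could be attached to the network object and included as a nodal covariate, keeping in mind that degree is itself a property of the network and such terms require careful interpretation:
# Hypothetical follow-up model (not run): attach in-degree as a vertex attribute,
# then include it as a nodal covariate alongside the gender term
leader_network %v% "in_degree" <- node_measures$in_degree
ergm_5 <- ergm(leader_network ~ edges + mutual + gwesp(0.25, fixed = TRUE) +
                 nodefactor("female") + nodecov("in_degree"))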
Consistent with the findings of the Unit 4 Case Study, reciprocity and transitivity are the two attributes with the most significant impact on confidential exchanges between school leaders. I foresee no ethical/legal issues with this analysis as long as anonymity is maintained and the information and concepts explored are not used to exploit individuals.
Carolan, Brian. 2014. Social Network Analysis and Education: Theory, Methods & Applications. https://doi.org/10.4135/9781452270104.