Unit 4 Analysis: Tie Prediction Based on District-Site and Gender

Identify a data source. For this assignment, I have selected the School Leaders Data from Chapter 9.

Formulate a question. For this Independent Analysis, I wanted to expand on my visualization from the Unit 4 Case Study. The research question that guided this analysis was:

Are school leaders more likely to confide in colleagues of their own gender, or is district site a better indicator?

Analyze the data.

First, loading libraries.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readxl)
library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union

## The following objects are masked from 'package:purrr':
## 
##     compose, simplify

## The following object is masked from 'package:tidyr':
## 
##     crossing

## The following object is masked from 'package:tibble':
## 
##     as_data_frame

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

library(tidygraph)

## 
## Attaching package: 'tidygraph'

## The following object is masked from 'package:igraph':
## 
##     groups

## The following object is masked from 'package:stats':
## 
##     filter

library(ggraph)
library(skimr)
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(statnet)

## Loading required package: tergm

## Loading required package: ergm

## Loading required package: network

## 
## 'network' 1.17.1 (2021-06-12), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 
## Attaching package: 'network'

## The following objects are masked from 'package:igraph':
## 
##     %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
##     get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
##     is.directed, list.edge.attributes, list.vertex.attributes,
##     set.edge.attribute, set.vertex.attribute

## 
## 'ergm' 4.1.2 (2021-07-26), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.

## Loading required package: networkDynamic

## 
## 'networkDynamic' 0.11.0 (2021-06-12), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information

## Registered S3 method overwritten by 'tergm':
##   method                   from
##   simulate_formula.network ergm

## 
## 'tergm' 4.0.2 (2021-07-28), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 
## Attaching package: 'tergm'

## The following object is masked from 'package:ergm':
## 
##     snctrl

## Loading required package: ergm.count

## 
## 'ergm.count' 4.0.2 (2021-06-18), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information

## Loading required package: sna

## Loading required package: statnet.common

## 
## Attaching package: 'statnet.common'

## The following object is masked from 'package:ergm':
## 
##     snctrl

## The following objects are masked from 'package:base':
## 
##     attr, order

## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.

## 
## Attaching package: 'sna'

## The following objects are masked from 'package:igraph':
## 
##     betweenness, bonpow, closeness, components, degree, dyad.census,
##     evcent, hierarchy, is.connected, neighborhood, triad.census

## Loading required package: tsna

## 
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information

## 
## There are updates for the following statnet packages on CRAN:

##                Installed ReposVer Built  
## networkDynamic "0.11.0"  "0.11.1" "4.1.0"

## Restart R and use "statnet::update_statnet()" to get the updates.

As in the Case Study, this analysis uses the School Leaders Data from Chapter 9. Loading file E (node attributes):

leader_nodes <- read_excel("data/School Leaders Data Chapter 9_e.xlsx", 
                           col_types = c("text", "numeric", "numeric", "numeric", "numeric")) |>
  clean_names()

leader_nodes

## # A tibble: 43 × 5
##    id    efficacy trust district_site  male
##    <chr>    <dbl> <dbl>         <dbl> <dbl>
##  1 1         6.06  4                1     0
##  2 2         6.56  5.63             1     0
##  3 3         7.39  4.63             1     0
##  4 4         4.89  4                1     0
##  5 5         6.06  5.75             0     1
##  6 6         7.39  4.38             0     0
##  7 7         5.56  3.63             0     1
##  8 8         7.5   5.63             1     1
##  9 9         7.67  5.25             0     0
## 10 10        6.64  4.78             0     0
## # … with 33 more rows

Loading file D (the 43 × 43 adjacency matrix):

leader_matrix <- read_excel("data/School Leaders Data Chapter 9_d.xlsx", 
                           col_names = FALSE)

## New names:
## • `` -> `...1`
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`
## • `` -> `...37`
## • `` -> `...38`
## • `` -> `...39`
## • `` -> `...40`
## • `` -> `...41`
## • `` -> `...42`
## • `` -> `...43`

leader_matrix

## # A tibble: 43 × 43
##     ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12 ...13
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     0     0     0     0     0     0     0     0     0     0     0     0
##  2     0     0     0     0     0     0     0     0     0     0     0     0     0
##  3     3     1     0     1     1     1     1     2     1     1     1     1     1
##  4     1     0     0     0     0     0     0     0     0     0     0     0     0
##  5     0     0     0     0     0     0     1     0     3     0     0     0     0
##  6     0     0     0     0     0     0     0     0     0     0     0     0     0
##  7     2     1     1     1     1     1     4     1     1     3     1     1     1
##  8     1     1     4     1     1     1     1     0     2     3     1     1     1
##  9     1     0     0     0     4     0     0     1     0     0     0     0     0
## 10     1     1     3     1     1     1     4     4     1     0     3     1     1
## # … with 33 more rows, and 30 more variables: ...14 <dbl>, ...15 <dbl>,
## #   ...16 <dbl>, ...17 <dbl>, ...18 <dbl>, ...19 <dbl>, ...20 <dbl>,
## #   ...21 <dbl>, ...22 <dbl>, ...23 <dbl>, ...24 <dbl>, ...25 <dbl>,
## #   ...26 <dbl>, ...27 <dbl>, ...28 <dbl>, ...29 <dbl>, ...30 <dbl>,
## #   ...31 <dbl>, ...32 <dbl>, ...33 <dbl>, ...34 <dbl>, ...35 <dbl>,
## #   ...36 <dbl>, ...37 <dbl>, ...38 <dbl>, ...39 <dbl>, ...40 <dbl>,
## #   ...41 <dbl>, ...42 <dbl>, ...43 <dbl>

Convert to matrix and verify class:

leader_matrix <- leader_matrix |>
  as.matrix()

class(leader_matrix)

## [1] "matrix" "array"

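# Dichotomize the ratings: values of 2 or lower become 0 (no tie),
# values of 3 or higher become 1 (tie)
leader_matrix[leader_matrix <= 2] <- 0

leader_matrix[leader_matrix >= 3] <- 1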
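To make the recoding step concrete, here is a minimal sketch on a made-up 3 × 3 matrix. The values are illustrative and are not taken from the School Leaders data; the only assumption carried over from the code above is that a rating of 3 or higher counts as a tie.

```r
# Toy ratings matrix (hypothetical values on the same 0-4 scale)
ratings <- matrix(c(0, 1, 4,
                    3, 0, 2,
                    2, 4, 0),
                  nrow = 3, byrow = TRUE)

# One comparison gives the same result as the two assignment steps:
# the logical TRUE/FALSE matrix becomes 1/0 when multiplied by 1
ties <- (ratings >= 3) * 1

ties
```

Only the three cells rated 3 or 4 survive as ties; every other cell becomes 0.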

Adding row and column names:

rownames(leader_matrix) <- leader_nodes$id

colnames(leader_matrix) <- leader_nodes$id

leader_matrix

##    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## 1  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 2  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 3  1 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
## 4  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1
## 5  0 0 0 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 6  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 7  0 0 0 0 0 0 1 0 0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
## 8  0 0 1 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  1
## 9  0 0 0 0 1 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 10 0 0 1 0 0 0 1 1 0  0  1  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  0  0
## 11 0 0 0 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0
## 12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 13 0 0 0 0 0 0 0 1 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 14 0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 15 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
## 16 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
## 17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  0
## 18 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 19 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 20 0 0 0 0 0 0 1 0 0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 21 1 0 0 0 0 0 0 0 0  0  1  1  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0
## 22 0 0 0 0 0 0 0 1 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1
## 23 0 0 0 0 1 0 0 0 0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 24 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
## 25 0 0 1 0 0 0 0 1 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 26 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
## 27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
## 28 1 1 1 1 0 1 0 1 1  1  0  0  1  0  0  0  0  1  0  0  1  1  0  0  1  0  1  1
## 29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 32 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
## 33 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
## 34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
## 35 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
## 36 1 0 1 1 0 0 0 1 0  0  0  0  0  0  0  1  0  1  0  0  0  0  0  0  0  1  1  1
## 37 0 0 1 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 38 1 0 1 0 0 0 0 1 0  0  1  0  0  0  0  0  0  1  0  0  0  1  0  0  0  1  1  1
## 39 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 40 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 41 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 42 0 0 0 0 0 0 0 0 1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  1  0  0  1  0
## 43 0 0 0 0 1 0 0 1 0  0  0  0  0  0  1  0  0  1  0  0  1  1  1  0  0  0  0  0
##    29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
## 1   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 2   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 3   0  0  0  0  0  0  0  1  0  1  0  0  0  0  0
## 4   0  0  0  0  0  1  0  1  0  0  0  0  0  0  0
## 5   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 6   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 7   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 8   0  0  0  0  0  0  0  0  1  1  0  0  0  0  0
## 9   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 10  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
## 11  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0
## 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 13  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0
## 14  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0
## 15  0  0  0  0  0  0  0  0  0  0  1  0  0  1  1
## 16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 17  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 18  1  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 19  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
## 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 21  0  0  0  0  1  0  1  0  1  1  0  0  0  0  1
## 22  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 23  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 24  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
## 25  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 26  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
## 27  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0
## 28  1  0  0  1  0  1  0  1  1  1  0  0  0  0  0
## 29  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 30  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 31  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 32  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 33  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 34  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 35  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 36  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## 37  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
## 38  0  0  0  0  1  1  0  1  1  0  1  0  0  0  1
## 39  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 40  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
## 41  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
## 42  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
## 43  0  0  0  1  1  1  0  0  1  0  0  0  0  1  0

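# Build a directed igraph object; diag = FALSE drops self-ties on the diagonal.
# graph_from_adjacency_matrix() is the current name for graph.adjacency().
adjacency_matrix <- graph_from_adjacency_matrix(leader_matrix,
                                                diag = FALSE)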

class(adjacency_matrix)

## [1] "igraph"

adjacency_matrix

## IGRAPH 17fb366 DN-- 43 143 -- 
## + attr: name (v/c)
## + edges from 17fb366 (vertex names):
##  [1] 3 ->1  3 ->27 3 ->28 3 ->36 3 ->38 4 ->16 4 ->28 4 ->34 4 ->36 5 ->9 
## [11] 7 ->10 7 ->20 8 ->3  8 ->10 8 ->25 8 ->27 8 ->28 8 ->37 8 ->38 9 ->5 
## [21] 9 ->14 10->3  10->7  10->8  10->11 10->18 10->20 10->38 11->10 11->21
## [31] 11->23 11->35 13->8  13->35 13->42 14->37 15->21 15->39 15->42 15->43
## [41] 16->22 17->26 18->29 18->34 19->31 20->7  20->10 21->1  21->11 21->12
## [51] 21->15 21->33 21->35 21->37 21->38 21->43 22->8  22->16 22->28 23->5 
## [61] 23->11 24->27 24->42 25->3  25->8  26->17 26->41 27->24 27->32 27->42
## [71] 28->1  28->2  28->3  28->4  28->6  28->8  28->9  28->10 28->13 28->18
## + ... omitted several edges
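The edge list printed above reads each 1 in row i, column j of the matrix as a directed tie from i to j, and ignores the diagonal. A base-R sketch of that reading on a hypothetical three-node matrix (the names a, b, c are made up for illustration):

```r
# Toy binary adjacency matrix with named rows and columns (hypothetical data)
m <- matrix(c(1, 1, 0,
              0, 0, 1,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Drop self-ties, mirroring diag = FALSE
diag(m) <- 0

# Each remaining 1 in row i, column j is a directed tie i -> j
idx <- which(m == 1, arr.ind = TRUE)
edges <- data.frame(from = rownames(m)[idx[, "row"]],
                    to   = colnames(m)[idx[, "col"]])
edges <- edges[order(edges$from, edges$to), ]

edges
```

The self-tie a -> a disappears, leaving a -> b, b -> c, and c -> a.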

Extracting the edge list and storing the IDs as character:

leader_edges <- get.data.frame(adjacency_matrix) |>
  mutate(from = as.character(from),
         to = as.character(to))


leader_edges

##     from to
## 1      3  1
## 2      3 27
## 3      3 28
## 4      3 36
## 5      3 38
## 6      4 16
## 7      4 28
## 8      4 34
## 9      4 36
## 10     5  9
## 11     7 10
## 12     7 20
## 13     8  3
## 14     8 10
## 15     8 25
## 16     8 27
## 17     8 28
## 18     8 37
## 19     8 38
## 20     9  5
## 21     9 14
## 22    10  3
## 23    10  7
## 24    10  8
## 25    10 11
## 26    10 18
## 27    10 20
## 28    10 38
## 29    11 10
## 30    11 21
## 31    11 23
## 32    11 35
## 33    13  8
## 34    13 35
## 35    13 42
## 36    14 37
## 37    15 21
## 38    15 39
## 39    15 42
## 40    15 43
## 41    16 22
## 42    17 26
## 43    18 29
## 44    18 34
## 45    19 31
## 46    20  7
## 47    20 10
## 48    21  1
## 49    21 11
## 50    21 12
## 51    21 15
## 52    21 33
## 53    21 35
## 54    21 37
## 55    21 38
## 56    21 43
## 57    22  8
## 58    22 16
## 59    22 28
## 60    23  5
## 61    23 11
## 62    24 27
## 63    24 42
## 64    25  3
## 65    25  8
## 66    26 17
## 67    26 41
## 68    27 24
## 69    27 32
## 70    27 42
## 71    28  1
## 72    28  2
## 73    28  3
## 74    28  4
## 75    28  6
## 76    28  8
## 77    28  9
## 78    28 10
## 79    28 13
## 80    28 18
## 81    28 21
## 82    28 22
## 83    28 25
## 84    28 27
## 85    28 29
## 86    28 32
## 87    28 34
## 88    28 36
## 89    28 37
## 90    28 38
## 91    31 19
## 92    32 24
## 93    33 21
## 94    34 18
## 95    34 22
## 96    34 28
## 97    34 29
## 98    35 21
## 99    36  1
## 100   36  3
## 101   36  4
## 102   36  8
## 103   36 16
## 104   36 18
## 105   36 26
## 106   36 27
## 107   36 28
## 108   36 34
## 109   37  3
## 110   37 14
## 111   37 38
## 112   38  1
## 113   38  3
## 114   38  8
## 115   38 11
## 116   38 18
## 117   38 22
## 118   38 26
## 119   38 27
## 120   38 28
## 121   38 33
## 122   38 34
## 123   38 36
## 124   38 37
## 125   38 39
## 126   38 43
## 127   42  9
## 128   42 15
## 129   42 24
## 130   42 27
## 131   42 43
## 132   43  5
## 133   43  8
## 134   43 15
## 135   43 18
## 136   43 21
## 137   43 22
## 138   43 23
## 139   43 32
## 140   43 33
## 141   43 34
## 142   43 37
## 143   43 42

Creating a tbl_graph object:

leader_graph <- tbl_graph(edges = leader_edges,
                          nodes = leader_nodes,
                          directed = TRUE)


leader_graph

## # A tbl_graph: 43 nodes and 143 edges
## #
## # A directed simple graph with 4 components
## #
## # Node Data: 43 × 5 (active)
##   id    efficacy trust district_site  male
##   <chr>    <dbl> <dbl>         <dbl> <dbl>
## 1 1         6.06  4                1     0
## 2 2         6.56  5.63             1     0
## 3 3         7.39  4.63             1     0
## 4 4         4.89  4                1     0
## 5 5         6.06  5.75             0     1
## 6 6         7.39  4.38             0     0
## # … with 37 more rows
## #
## # Edge Data: 143 × 2
##    from    to
##   <int> <int>
## 1     3     1
## 2     3    27
## 3     3    28
## # … with 140 more rows
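Note that the edge data now show integer from and to columns even though leader_edges stored them as character. The sketch below illustrates the general idea with base R's match(): each ID is translated into the row position of the matching node. This is an illustration of the concept, not tidygraph's actual internals, and the node and edge tables are hypothetical.

```r
# Hypothetical node table and character edge list
nodes <- data.frame(id = c("10", "20", "30"))
edges <- data.frame(from = c("30", "10"), to = c("10", "20"))

# Translate each ID into the row position of that node
edges_idx <- data.frame(from = match(edges$from, nodes$id),
                        to   = match(edges$to, nodes$id))

edges_idx
```

In the School Leaders data the IDs run from 1 to 43 in row order, so the row positions happen to look identical to the original IDs.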

Adding in-degree and out-degree:

leader_measures <- leader_graph |>
  activate(nodes) |>
  mutate(in_degree = centrality_degree(mode = "in")) |>
  mutate(out_degree = centrality_degree(mode = "out"))

leader_measures

## # A tbl_graph: 43 nodes and 143 edges
## #
## # A directed simple graph with 4 components
## #
## # Node Data: 43 × 7 (active)
##   id    efficacy trust district_site  male in_degree out_degree
##   <chr>    <dbl> <dbl>         <dbl> <dbl>     <dbl>      <dbl>
## 1 1         6.06  4                1     0         5          0
## 2 2         6.56  5.63             1     0         1          0
## 3 3         7.39  4.63             1     0         7          5
## 4 4         4.89  4                1     0         2          4
## 5 5         6.06  5.75             0     1         3          1
## 6 6         7.39  4.38             0     0         1          0
## # … with 37 more rows
## #
## # Edge Data: 143 × 2
##    from    to
##   <int> <int>
## 1     3     1
## 2     3    27
## 3     3    28
## # … with 140 more rows
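For a binary adjacency matrix with senders in rows and receivers in columns, out-degree and in-degree are simply row and column sums. A minimal base-R sketch on a hypothetical three-node matrix:

```r
# Toy directed network: a -> b, a -> c, b -> c, c -> a (hypothetical data)
m <- matrix(c(0, 1, 1,
              0, 0, 1,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

out_degree <- rowSums(m)  # ties sent by each node
in_degree  <- colSums(m)  # ties received by each node

rbind(out_degree, in_degree)
```

Every tie is sent by one node and received by another, so the two vectors always have the same total.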

node_measures <- leader_measures |> 
  activate(nodes) |>
  as_tibble()

node_measures

## # A tibble: 43 × 7
##    id    efficacy trust district_site  male in_degree out_degree
##    <chr>    <dbl> <dbl>         <dbl> <dbl>     <dbl>      <dbl>
##  1 1         6.06  4                1     0         5          0
##  2 2         6.56  5.63             1     0         1          0
##  3 3         7.39  4.63             1     0         7          5
##  4 4         4.89  4                1     0         2          4
##  5 5         6.06  5.75             0     1         3          1
##  6 6         7.39  4.38             0     0         1          0
##  7 7         5.56  3.63             0     1         2          2
##  8 8         7.5   5.63             1     1         8          7
##  9 9         7.67  5.25             0     0         3          2
## 10 10        6.64  4.78             0     0         5          7
## # … with 33 more rows

summary(node_measures)

##       id               efficacy         trust       district_site   
##  Length:43          Min.   :4.610   Min.   :3.630   Min.   :0.0000  
##  Class :character   1st Qu.:5.670   1st Qu.:4.130   1st Qu.:0.0000  
##  Mode  :character   Median :6.780   Median :4.780   Median :0.0000  
##                     Mean   :6.649   Mean   :4.783   Mean   :0.4186  
##                     3rd Qu.:7.470   3rd Qu.:5.440   3rd Qu.:1.0000  
##                     Max.   :8.500   Max.   :5.880   Max.   :1.0000  
##       male          in_degree       out_degree    
##  Min.   :0.0000   Min.   :0.000   Min.   : 0.000  
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.: 1.000  
##  Median :0.0000   Median :3.000   Median : 2.000  
##  Mean   :0.4419   Mean   :3.326   Mean   : 3.326  
##  3rd Qu.:1.0000   3rd Qu.:5.000   3rd Qu.: 4.000  
##  Max.   :1.0000   Max.   :8.000   Max.   :20.000
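The identical means for in_degree and out_degree are not a coincidence: each of the 143 ties contributes one to both totals, so both means equal the number of edges divided by the number of nodes. The same two counts also give the network density. A short check using only the figures reported above:

```r
n_nodes <- 43
n_edges <- 143

# Mean in-degree and mean out-degree are both edges / nodes
mean_degree <- n_edges / n_nodes

# Density of a directed network without self-ties:
# observed ties over the n * (n - 1) possible ordered pairs
density <- n_edges / (n_nodes * (n_nodes - 1))

round(mean_degree, 3)
round(density, 3)
```

The mean reproduces the 3.326 in the summary, and the density shows that roughly 8% of the possible ties are present.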

skim(node_measures)

Data summary
Name	node_measures
Number of rows	43
Number of columns	7
_______________________
Column type frequency:
character	1
numeric	6
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
id	0	1	1	2	0	43	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
efficacy	1	6.65	1.10	4.61	5.67	6.78	7.47	8.50	▅▅▃▇▃
trust	1	4.78	0.71	3.63	4.13	4.78	5.44	5.88	▆▆▅▆▇
district_site	1	0.42	0.50	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▆
male	1	0.44	0.50	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▆
in_degree	1	3.33	2.12	0.00	2.00	3.00	5.00	8.00	▅▇▂▅▂
out_degree	1	3.33	4.26	0.00	1.00	2.00	4.00	20.00	▇▁▁▁▁

node_measures |>
  group_by(district_site) |>
  summarise(n = n(),
            mean = mean(in_degree), 
            sd = sd(in_degree)
            )

## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  2.36  1.58
## 2             1    18  4.67  2.09

node_measures |>
  group_by(district_site) |>
  summarize(n = n(), mean = mean(out_degree), sd = sd(out_degree))

## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  2.44  3.01
## 2             1    18  4.56  5.41

node_measures |>
  group_by(district_site) |>
  summarize(n = n(), mean = mean(trust), sd = sd(trust))

## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  4.90 0.719
## 2             1    18  4.62 0.688

node_measures |>
  group_by(district_site) |>
  summarize(n = n(), mean = mean(efficacy), sd = sd(efficacy))

## # A tibble: 2 × 4
##   district_site     n  mean    sd
##           <dbl> <int> <dbl> <dbl>
## 1             0    25  7.02 0.953
## 2             1    18  6.13 1.12
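The four grouped summaries above repeat the same pattern, so they could also be produced in one call. The sketch below uses base R's aggregate() so that it runs without extra packages, and it uses only the ten rows of leader_nodes printed earlier, so its means differ from the full 43-row results.

```r
# First ten rows of the node data, as printed above (not the full data set)
nodes10 <- data.frame(
  district_site = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 0),
  efficacy = c(6.06, 6.56, 7.39, 4.89, 6.06, 7.39, 5.56, 7.5, 7.67, 6.64),
  trust    = c(4, 5.63, 4.63, 4, 5.75, 4.38, 3.63, 5.63, 5.25, 4.78)
)

# Mean of several variables by group in a single call
group_means <- aggregate(cbind(efficacy, trust) ~ district_site,
                         data = nodes10, FUN = mean)

group_means
```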

leader_measures |>
ggraph(layout = "kk") + 
  # size is a constant, so it is set outside aes() to avoid a spurious legend
  geom_node_point(size = 3, colour = "blue") +
  geom_edge_link() + 
  theme_graph()

leader_network <- as.network(leader_edges,
                             vertices = leader_nodes)

leader_network

##  Network attributes:
##   vertices = 43 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 143 
##     missing edges= 0 
##     non-missing edges= 143 
## 
##  Vertex attribute names: 
##     district_site efficacy male trust vertex.names 
## 
## No edge attributes

class(leader_network)

## [1] "network"

summary(leader_network ~ edges + mutual)

##  edges mutual 
##    143     36
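The mutual statistic counts pairs of leaders who name each other. A minimal sketch of how such a count can be obtained from a binary adjacency matrix with a zero diagonal, using a hypothetical three-node matrix, followed by the share of reciprocated ties implied by the counts above:

```r
# Toy directed network: a -> b, a -> c, b -> a (hypothetical data)
m <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 0, 0),
            nrow = 3, byrow = TRUE)

# m * t(m) is 1 only where both i -> j and j -> i exist;
# each mutual pair appears twice, so divide by 2
mutual_dyads <- sum(m * t(m)) / 2

# In the leader network, 36 mutual pairs account for 72 of the 143 ties
reciprocated_share <- 2 * 36 / 143

mutual_dyads
reciprocated_share
```

About half of all ties in the leader network are reciprocated.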

set.seed(589)

ergm_mod_1 <-ergm(leader_network ~ edges + mutual)

## Starting maximum pseudolikelihood estimation (MPLE):

## Evaluating the predictor and response matrix.

## Maximizing the pseudolikelihood.

## Finished MPLE.

## Starting Monte Carlo maximum likelihood estimation (MCMLE):

## Iteration 1 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.2915.

## Estimating equations are not within tolerance region.

## Iteration 2 of at most 60:

## Optimizing with step length 1.0000.

## The log-likelihood improved by 0.0136.

## Convergence test p-value: 0.1862. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0049.
## Convergence test p-value: 0.0968. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0079.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC.  To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.

summary(ergm_mod_1)

## Call:
## ergm(formula = leader_network ~ edges + mutual)
## 
## Monte Carlo Maximum Likelihood Results:
## 
##        Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges   -3.1101     0.1283      0  -24.23   <1e-04 ***
## mutual   3.1246     0.3003      0   10.40   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 2503.6  on 1806  degrees of freedom
##  Residual Deviance:  891.8  on 1804  degrees of freedom
##  
## AIC: 895.8  BIC: 906.8  (Smaller is better. MC Std. Err. = 0.7447)
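The coefficients are on the log-odds scale, so they are easier to read after converting to probabilities. The sketch below assumes the standard conditional interpretation of an edges + mutual model: the edges term alone applies to a tie that would not be reciprocated, and edges plus mutual applies to a tie that would complete a mutual pair. It also checks where the 1806 degrees of freedom and the null deviance come from.

```r
# Estimates copied from the summary above
edges_coef  <- -3.1101
mutual_coef <-  3.1246

# Conditional probability of a tie that would not be reciprocated
p_unreciprocated <- plogis(edges_coef)

# Conditional probability of a tie that reciprocates an existing tie
p_reciprocating <- plogis(edges_coef + mutual_coef)

# Number of ordered pairs among 43 leaders (no self-ties)
n_dyads <- 43 * 42

# Null model: every possible tie has probability 0.5
null_deviance <- 2 * n_dyads * log(2)

round(c(p_unreciprocated, p_reciprocating), 3)
round(null_deviance, 1)
```

A tie is unlikely on its own (about 4%) but roughly a coin flip (about 50%) when it would return a tie that already exists, which is why the mutual term is so large.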

Create a data product.

leader_measures |>
ggraph(layout = "kk") + 
  geom_node_point(aes(size=trust, alpha=district_site), colour = "blue") +
  geom_edge_link() + 
  geom_node_text(aes(label = male), repel=TRUE) +
  theme_graph() +
  facet_wrap(~district_site)

Key Findings

To revisit, the research question guiding this analysis was: Are school leaders more likely to confide in colleagues of their own gender, or is district site a better indicator?

In this visualization, I separated the network by district site, labeled each actor by gender (male = 1), and sized the nodes by trust score. Gender shows no consistent pattern in the plot: some men have larger nodes than women, and the reverse is also true. District site shows a clearer difference. In the summaries above, leaders working at the district level (district_site = 1) average more ties than those who do not (mean in-degree 4.67 versus 2.36; mean out-degree 4.56 versus 2.44), although their mean trust score is slightly lower (4.62 versus 4.90). Based on the visualization and these summaries, district site appears to be a better predictor of tie formation than gender. The ERGM above includes only the edges and mutual terms, so it does not test this comparison directly. The plot also suggests that men who work at the district level are more likely than men who do not to form ties with other district-level men. Isolating the two groups to compare and contrast them could therefore be valuable. One area for further research is an in-depth analysis that separates leaders working at the district level from those who are not. This analysis looked specifically at men; it would also be interesting to see how women at each level compare.
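One way to put the gender and district-site comparison on a statistical footing is to model every ordered pair of leaders and ask whether sharing a gender or sharing a site raises the odds of a tie. The sketch below does this with a plain logistic regression on simulated data. Everything in it is hypothetical: the node attributes are random, the ties are simulated with a built-in same-site effect, and reciprocity is ignored, so it illustrates the approach rather than reporting a result for the School Leaders data.

```r
set.seed(589)

# Hypothetical node attributes for 30 leaders
n <- 30
nodes <- data.frame(id = seq_len(n),
                    male = rbinom(n, 1, 0.5),
                    district_site = rbinom(n, 1, 0.5))

# All ordered pairs (sender, receiver), excluding self-pairs
dyads <- expand.grid(from = nodes$id, to = nodes$id)
dyads <- dyads[dyads$from != dyads$to, ]

# Homophily indicators: 1 if the pair shares the attribute
dyads$same_gender <- as.integer(nodes$male[dyads$from] == nodes$male[dyads$to])
dyads$same_site <- as.integer(nodes$district_site[dyads$from] ==
                                nodes$district_site[dyads$to])

# Simulated ties with a same-site effect and no gender effect (assumption)
dyads$tie <- rbinom(nrow(dyads), 1, plogis(-2.5 + 1.0 * dyads$same_site))

# Logistic regression of tie presence on the two homophily indicators
fit <- glm(tie ~ same_gender + same_site, data = dyads, family = binomial)
coefs <- coef(summary(fit))

coefs
```

With the real data, the corresponding ergm terms would be nodematch("male") and nodematch("district_site"), which could be added to the existing edges + mutual formula so that reciprocity is accounted for at the same time.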


ECI 589 Social Network Analysis and Education

Sam Brooks

April 11, 2022