1. Data source description

This case study analyzes the Social Network Analysis and Education Year 1 Collaboration data set. The data describe 43 school leaders within one school district. The study examines the centrality and reciprocity of the school leader collaboration network in Year 1.

1a. The dataset needed for this case study is:

year_1_collaboration.xlsx

1b. Load Libraries

There are five libraries needed in this case study: tidygraph 📦 igraph 📦 ggraph 📦 readxl 📦 tidyverse 📦

library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:stats':
## 
##     filter
library(igraph)
## 
## Attaching package: 'igraph'
## The following object is masked from 'package:tidygraph':
## 
##     groups
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(ggraph)
## Loading required package: ggplot2
library(readxl)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ✓ purrr   0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
## x purrr::compose()       masks igraph::compose()
## x tidyr::crossing()      masks igraph::crossing()
## x dplyr::filter()        masks tidygraph::filter(), stats::filter()
## x dplyr::groups()        masks igraph::groups(), tidygraph::groups()
## x dplyr::lag()           masks stats::lag()
## x purrr::simplify()      masks igraph::simplify()

1c. Import Data

Year 1 Data

year_1_collaboration <- read_excel("data/year_1_collaboration.xlsx", 
                            col_names = FALSE)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...

2. Research Questions

Building on the Unit 2 practice, this case study is guided by two questions: (1) How do the centrality measures for the Year 1 directed network reveal the collaboration pattern formed at the beginning of the reform? (2) What are the reciprocated ties in the Year 1 directed network? In particular, this case study looks into the concepts of centrality and reciprocity in more depth.

3. Data Analysis

3a. Wrangle

#Add row and column names for Year 1
rownames(year_1_collaboration) <- 1:43
## Warning: Setting row names on a tibble is deprecated.
colnames(year_1_collaboration) <- 1:43

#View the Year 1 dataset with added names
year_1_collaboration
## # A tibble: 43 × 43
##      `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`  `12`  `13`
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     0     3     0     0     0     0     0     0     0     0     0     0
##  2     0     0     0     0     0     0     0     0     0     0     0     0     0
##  3     4     0     0     0     0     0     0     0     0     0     0     0     0
##  4     0     0     0     0     0     0     0     0     0     0     0     0     0
##  5     0     0     0     0     0     0     0     0     3     0     0     0     0
##  6     3     0     0     0     0     0     0     0     0     0     0     0     0
##  7     0     0     0     0     0     0     0     0     0     0     0     0     0
##  8     0     0     0     0     0     0     0     0     0     4     0     0     0
##  9     0     0     0     0     4     0     0     0     0     0     0     0     0
## 10     0     0     0     0     0     0     0     4     0     0     0     0     0
## # … with 33 more rows, and 30 more variables: `14` <dbl>, `15` <dbl>,
## #   `16` <dbl>, `17` <dbl>, `18` <dbl>, `19` <dbl>, `20` <dbl>, `21` <dbl>,
## #   `22` <dbl>, `23` <dbl>, `24` <dbl>, `25` <dbl>, `26` <dbl>, `27` <dbl>,
## #   `28` <dbl>, `29` <dbl>, `30` <dbl>, `31` <dbl>, `32` <dbl>, `33` <dbl>,
## #   `34` <dbl>, `35` <dbl>, `36` <dbl>, `37` <dbl>, `38` <dbl>, `39` <dbl>,
## #   `40` <dbl>, `41` <dbl>, `42` <dbl>, `43` <dbl>
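
The deprecation warning above comes from setting row names on a tibble. One possible alternative, sketched here under the assumption that the data are converted to a plain matrix first, is to name rows and columns together with dimnames(). The 3 x 3 data frame is a hypothetical stand-in for the 43 x 43 data, not the real file.

```r
# Hypothetical 3 x 3 stand-in for the 43 x 43 collaboration data
toy <- data.frame(a = c(0, 0, 4), b = c(0, 0, 0), c = c(3, 0, 0))

# Convert first, then name rows and columns in one step.
# A matrix supports row names, so no tibble warning is involved.
toy_matrix <- as.matrix(toy)
n <- nrow(toy_matrix)
dimnames(toy_matrix) <- list(as.character(1:n), as.character(1:n))

toy_matrix
```

The resulting matrix can be indexed by ID, for example toy_matrix["1", "3"] for the tie sent by ID 1 to ID 3.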

Convert to Matrix and Graph Objects

#Convert to matrix object for Year 1
year_1_matrix <- as.matrix(year_1_collaboration)

#Convert to graph object for Year 1 (directed)
year_1_network_D <- as_tbl_graph(year_1_matrix, directed = TRUE)
#View the Year 1 directed dataset after the conversion
year_1_network_D
## # A tbl_graph: 43 nodes and 82 edges
## #
## # A directed simple graph with 3 components
## #
## # Node Data: 43 × 1 (active)
##   name 
##   <chr>
## 1 1    
## 2 2    
## 3 3    
## 4 4    
## 5 5    
## 6 6    
## # … with 37 more rows
## #
## # Edge Data: 82 × 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     3      3
## 2     3     1      4
## 3     3    24      3
## # … with 79 more rows
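
To make the conversion concrete: each nonzero cell in row i, column j of the matrix becomes a directed edge from i to j, with the cell value as the weight. The base R sketch below reproduces that reading on a hypothetical 3-node matrix that borrows the 1 to 3 (weight 3) and 3 to 1 (weight 4) ties from the output above. It illustrates the idea only and is not how as_tbl_graph() is implemented.

```r
# Hypothetical 3-node adjacency matrix (rows = sender, columns = receiver)
adj <- matrix(0, nrow = 3, ncol = 3)
adj[1, 3] <- 3   # leader 1 reports collaborating with leader 3
adj[3, 1] <- 4   # leader 3 reports collaborating with leader 1

# Locate the nonzero cells and read them as from/to/weight rows
cells <- which(adj != 0, arr.ind = TRUE)
edges <- data.frame(from   = cells[, "row"],
                    to     = cells[, "col"],
                    weight = adj[cells])

# Sort by sender, then receiver, to mirror the edge table above
edges <- edges[order(edges$from, edges$to), ]
rownames(edges) <- NULL
edges
```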

3b & 3c. Explore and Analyze

Centrality

In a directed network, degree centrality splits into in-degree (ties received) and out-degree (ties sent). The in-degree centralization for the Year 1 network is about 0.10:

# in-degree 
InD <- centr_degree(year_1_network_D, mode = "in")
InD
## $res
##  [1] 4 0 3 1 2 0 2 6 3 4 1 0 0 1 2 1 1 3 2 1 2 2 2 3 2 3 1 0 6 2 2 1 1 3 2 0 4 0
## [39] 4 1 1 2 1
## 
## $centralization
## [1] 0.09745293
## 
## $theoretical_max
## [1] 1806
# in-degree 
hist(degree(year_1_network_D, mode = "in"), col="light blue", 
     main = "In-degree distribution in Year 1 Network",
     xlab = "Degree", ylab = "Frequency",
     ylim = c(0,20),
     xlim = c(0,10)
     )
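
The $centralization value reported by centr_degree() can be reproduced by hand, which helps show what the number means. As a sketch, assuming a Freeman-style definition (the sum of each node's gap from the maximum degree, divided by the theoretical maximum printed above), the in-degree vector from the output gives the same value:

```r
# In-degree of each of the 43 school leaders, copied from InD$res above
in_deg <- c(4, 0, 3, 1, 2, 0, 2, 6, 3, 4, 1, 0, 0, 1, 2, 1, 1, 3, 2, 1,
            2, 2, 2, 3, 2, 3, 1, 0, 6, 2, 2, 1, 1, 3, 2, 0, 4, 0,
            4, 1, 1, 2, 1)

n <- length(in_deg)                          # 43 nodes
gap_from_max <- sum(max(in_deg) - in_deg)    # 43 * 6 - 82 = 176
theoretical_max <- n * (n - 1)               # 1806, matching the output above

in_centralization <- gap_from_max / theoretical_max
round(in_centralization, 4)                  # 0.0975
```

A value near 0 means in-degree is spread fairly evenly across leaders; a value near 1 would mean a single leader receives almost all ties.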

The out-degree centralization for Year 1 is about 0.03:

# out-degree 
OuD <- centr_degree(year_1_network_D, mode = "out")
OuD
## $res
##  [1] 1 0 2 2 1 3 2 2 1 3 3 1 0 3 2 2 1 2 2 3 3 2 1 3 2 1 3 3 2 2 2 2 2 3 1 3 0 3
## [39] 2 1 2 2 1
## 
## $centralization
## [1] 0.02602436
## 
## $theoretical_max
## [1] 1806
# out-degree 
hist(degree(year_1_network_D, mode = "out"), col="blue", 
     main = "Out-degree distribution in Year 1 Network",
     xlab = "Degree", ylab = "Frequency",
     ylim = c(0,20),
     xlim = c(0,10),breaks =4)
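
Two quick checks follow from the printed degree vectors. The out-degree centralization can be recomputed by hand (assuming a Freeman-style definition: summed gaps from the maximum degree over the theoretical maximum of 1806), and both vectors should sum to the 82 edges. The Spearman correlation at the end is my own addition, offered as a rough way to ask whether leaders who are named often also name many others.

```r
# Degrees copied from InD$res and OuD$res above
in_deg  <- c(4, 0, 3, 1, 2, 0, 2, 6, 3, 4, 1, 0, 0, 1, 2, 1, 1, 3, 2, 1,
             2, 2, 2, 3, 2, 3, 1, 0, 6, 2, 2, 1, 1, 3, 2, 0, 4, 0,
             4, 1, 1, 2, 1)
out_deg <- c(1, 0, 2, 2, 1, 3, 2, 2, 1, 3, 3, 1, 0, 3, 2, 2, 1, 2, 2, 3,
             3, 2, 1, 3, 2, 1, 3, 3, 2, 2, 2, 2, 2, 3, 1, 3, 0, 3,
             2, 1, 2, 2, 1)

# Out-degree centralization: (43 * 3 - 82) / 1806
n <- length(out_deg)
out_centralization <- sum(max(out_deg) - out_deg) / (n * (n - 1))
round(out_centralization, 4)    # 0.026

# Every edge has one sender and one receiver, so both totals equal 82
c(in_total = sum(in_deg), out_total = sum(out_deg))

# Rank correlation between being named and naming others
cor(in_deg, out_deg, method = "spearman")
```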

Because the in- and out-degree centralization values differ considerably, I also looked at the hub scores of the vertices.

# Hub Score
hub.score(year_1_network_D)$vector
##            1            2            3            4            5            6 
## 8.545589e-04 0.000000e+00 1.972920e-02 4.238596e-01 8.957799e-03 1.360977e-02 
##            7            8            9           10           11           12 
## 1.196560e-01 8.120156e-04 1.675361e-03 1.854966e-03 1.795702e-03 5.027704e-05 
##           13           14           15           16           17           18 
## 0.000000e+00 1.063527e-01 3.085444e-04 5.818655e-01 1.491743e-05 6.261540e-01 
##           19           20           21           22           23           24 
## 1.234457e-01 9.490801e-04 1.413754e-02 7.957694e-01 4.910621e-04 6.350592e-03 
##           25           26           27           28           29           30 
## 4.205193e-03 8.545589e-04 9.902890e-03 1.000000e+00 7.841599e-01 1.576128e-03 
##           31           32           33           34           35           36 
## 4.364355e-03 7.685158e-03 5.756383e-17 9.557115e-01 0.000000e+00 6.202752e-01 
##           37           38           39           40           41           42 
## 0.000000e+00 6.896191e-01 2.398443e-02 1.377725e-01 1.727447e-05 1.540214e-03 
##           43 
## 3.048151e-04
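
For readers unfamiliar with hub scores: in Kleinberg's HITS formulation, a node is a strong hub when it points to nodes that other strong hubs also point to. The scores are the principal eigenvector of A %*% t(A), where A is the adjacency matrix, rescaled so the largest value is 1 (which is why ID 28 scores exactly 1 above). The base R sketch below shows this on a hypothetical 3-node unweighted graph. It is not the igraph implementation, and my understanding is that igraph uses edge weights when a weight attribute is present, as it is here.

```r
# Hypothetical 3-node directed graph: 1 -> 2, 1 -> 3, 2 -> 3
A <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)

# Hub scores: principal eigenvector of A %*% t(A)
eig <- eigen(A %*% t(A), symmetric = TRUE)
hub <- abs(eig$vectors[, 1])   # abs() removes the arbitrary sign
hub <- hub / max(hub)          # rescale so the top hub scores 1

round(hub, 3)                  # 1.000 0.618 0.000
```

Node 3 sends no ties, so its hub score is 0, just as IDs with no outgoing ties score 0 in the output above.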

Reciprocity

#Add a new variable to examine whether the tie is mutual or not
year_1_network_D <- year_1_network_D %>% 
  activate(edges) %>% 
  mutate(reciprocated = edge_is_mutual())

year_1_network_D
## # A tbl_graph: 43 nodes and 82 edges
## #
## # A directed simple graph with 3 components
## #
## # Edge Data: 82 × 4 (active)
##    from    to weight reciprocated
##   <int> <int>  <dbl> <lgl>       
## 1     1     3      3 TRUE        
## 2     3     1      4 TRUE        
## 3     3    24      3 FALSE       
## 4     4    29      3 FALSE       
## 5     4    41      4 FALSE       
## 6     5     9      3 TRUE        
## # … with 76 more rows
## #
## # Node Data: 43 × 1
##   name 
##   <chr>
## 1 1    
## 2 2    
## 3 3    
## # … with 40 more rows
reciprocity(year_1_network_D)
## [1] 0.1707317
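
The 0.17 figure can be unpacked with simple arithmetic. Assuming igraph's default definition (the share of directed edges whose reverse edge also exists), 0.1707317 × 82 edges = 14 reciprocated edges, i.e. 7 mutual pairs. The base R sketch below applies the same definition to a hypothetical 4-node matrix:

```r
# Hypothetical 4-node network: 1 <-> 3 is mutual; 3 -> 4 and 2 -> 4 are one-way
adj <- matrix(0, nrow = 4, ncol = 4)
adj[1, 3] <- 3
adj[3, 1] <- 4
adj[3, 4] <- 3
adj[2, 4] <- 4

has_tie   <- adj != 0               # ignore weights, keep only presence
is_mutual <- has_tie & t(has_tie)   # TRUE where the reverse tie also exists

sum(is_mutual) / sum(has_tie)       # 2 of 4 edges are reciprocated: 0.5

# Same arithmetic for the Year 1 network: 14 of 82 edges
14 / 82
```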

4. Data Visualization

Centrality

set.seed(555)
in_graph <- plot.igraph(year_1_network_D, 
            vertex.size=degree(year_1_network_D, mode="in"), 
            main="In-degree")

in_graph
## NULL
set.seed(555)
out_graph <- plot.igraph(year_1_network_D, 
            vertex.size=degree(year_1_network_D, mode="out"), 
            main="Out-degree")

out_graph
## NULL
set.seed(555)
hub_graph <- plot.igraph(year_1_network_D, 
            vertex.size=hub.score(year_1_network_D)$vector*30,   # Re-scaled 
            main="Hubs")

hub_graph
## NULL

Reciprocity

# Set a uniform node color (edge colors mark reciprocated ties)
V(year_1_network_D)$color <- "white"

# Graph layout
layout <- layout.fruchterman.reingold(year_1_network_D) 

# igraph plot 
plot(year_1_network_D, layout = layout)

ggraph(year_1_network_D, layout="stress") + 
  geom_edge_link(aes(color = reciprocated), alpha = 0.5, 
                  start_cap = circle(2, 'mm'), end_cap = circle(2, 'mm')) +
  scale_edge_width(range = c(0.5, 2.5)) + 
  geom_node_point(color = V(year_1_network_D)$color, size = 5, alpha = 0.5) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void() + 
  theme(legend.position = "none") 

ggraph(year_1_network_D, layout = "linear") + 
  geom_edge_arc(aes(width = weight), alpha = 0.8) + 
  scale_edge_width(range = c(0.2, 2)) +
  geom_node_text(aes(label = name)) +
  labs(edge_width = "Weight") +
  theme_graph()

R_graph <- ggraph(year_1_network_D, layout = "linear") + 
  geom_edge_arc(aes(colour = factor(reciprocated),width = weight), alpha = 0.8) + 
  scale_edge_width(range = c(0.2, 2)) +
  geom_node_text(aes(label = name)) +
  labs(edge_width = "Weight") +
  theme_graph()

# Print the stored plot so it is displayed
R_graph

5. Communications

In this case study, I used the Year 1 school leader collaboration dataset to understand the network’s collaboration pattern through degree centrality and reciprocity.

As shown in Section 3, the network’s in-degree centralization is about 0.10. The in-degree output in Section 3 and the in-degree graph in Section 4 show that IDs 8 and 29 received the most collaboration requests (six each). The network’s out-degree centralization is only about 0.03, and its graph shows that the school leaders with higher in-degrees did not always have higher out-degrees. Only a few IDs (e.g., IDs 34, 28, 29) appear clustered together in the hub score graph in Section 4. These findings indicate that the collaboration network is not very strong: one cluster of school leaders collaborated with each other, while others may not have had similar or stronger connections, which could affect how resources and information are shared among the school leaders.

Reciprocity refers to the mutuality of the ties within a network. For Year 1, reciprocity for the entire network is about 0.17, which indicates that 17% of the ties are reciprocated. This is not surprising given the context of the data set: Year 1 marks the beginning of the reform, and collaboration opportunities are likely to increase as school leaders gain more time and experience building connections. The first three graphs in the Reciprocity part of Section 4 show the reciprocated ties. There are only 7 reciprocated pairs (14 of the 82 directed ties), which indicates that two-way information flow did not happen frequently in this network. The last graph in Section 4 shows not only the reciprocated ties but also the strength (weight) of each tie. In this graph, reciprocated ties seem to have higher weights than non-reciprocated ties. However, there are also some weaker connections among the reciprocated ties, such as the relationship between IDs 1 and 3. Given the context of the data, we can learn that the collaboration relationship among school leaders is not strong, with low degree centralization and very few reciprocated ties.