Unit 2 Independent Analysis: Administrator Centrality

1. Prepare

1a. Review the Research

This analysis explores a network of 43 education administrators described in Chapter 7 of our course text, Social network analysis and education: Theory, methods & applications. The District School Leader network data set was generated through the research of Dr. Alan Daly and centers on the impact that No Child Left Behind reform efforts had on school and district leadership networks. In this analysis, I’ll use a simple non-directed, unweighted adjacency matrix to explore network relationships. The specific research question is whether degree centrality can be used to identify key actors or influencers in the network.

1b. Load Libraries

library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(readxl)
library(graphlayouts)

2. Wrangle

2a. Import Data

I’ll begin by using the read_excel() function to import the ch7_data.xlsx file, setting the col_names argument to FALSE since our file is a simple matrix with no header row, and assign the result to a variable named ch7_network:

ch7_network <- read_excel("data/ch7_data.xlsx", 
                            col_names = FALSE)

## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...

Let’s quickly inspect the R object we just imported to see what we’ll be working with:

ch7_network

## # A tibble: 43 × 43
##     ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12 ...13
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     0     1     1     0     0     0     0     0     0     0     0     0
##  2     0     0     0     0     0     0     0     0     0     0     0     0     0
##  3     1     0     0     0     0     0     0     1     0     1     0     0     1
##  4     1     0     0     0     0     0     0     0     0     0     0     0     0
##  5     0     0     0     0     0     0     0     0     1     0     0     0     0
##  6     0     0     0     0     0     0     0     0     0     0     0     0     0
##  7     0     0     0     0     0     0     0     0     0     1     0     0     0
##  8     0     0     1     0     0     0     0     0     1     1     0     0     1
##  9     0     0     0     0     1     0     0     1     0     0     0     0     0
## 10     0     0     1     0     0     0     1     1     0     0     1     0     0
## # … with 33 more rows, and 30 more variables: ...14 <dbl>, ...15 <dbl>,
## #   ...16 <dbl>, ...17 <dbl>, ...18 <dbl>, ...19 <dbl>, ...20 <dbl>,
## #   ...21 <dbl>, ...22 <dbl>, ...23 <dbl>, ...24 <dbl>, ...25 <dbl>,
## #   ...26 <dbl>, ...27 <dbl>, ...28 <dbl>, ...29 <dbl>, ...30 <dbl>,
## #   ...31 <dbl>, ...32 <dbl>, ...33 <dbl>, ...34 <dbl>, ...35 <dbl>,
## #   ...36 <dbl>, ...37 <dbl>, ...38 <dbl>, ...39 <dbl>, ...40 <dbl>,
## #   ...41 <dbl>, ...42 <dbl>, ...43 <dbl>

We have a 43 x 43 “tibble” or data table representing our collaboration ties. This network data has been transformed so that it’s nondirected and binary; a tie is either present or absent and is nonvalued.

To generate the above relational matrix, each administrator was asked, “How often do you turn to each Administrative Team member for information on work-related topics?” Their answers were measured on a 4-point scale, with a high score of 4 indicating that ego engaged in that relation 1 to 2 times a week. The data have been dichotomized so that only those ties that were originally coded a 3 or 4 are now 1s, indicating that there is a relation between ego and alter. Any tie that was originally absent or coded 1 or 2 is now a 0, indicating that a tie does not exist. Furthermore, the relations were symmetrized using a weak criterion: If one actor reported a tie (1), then the relation is considered present.
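
The dichotomizing and symmetrizing steps described above can be sketched in base R. The 4 x 4 valued matrix below is hypothetical (it is not taken from the Chapter 7 data), and the object names valued, binary, and symmetric are illustrative only:

```r
# Hypothetical valued matrix: rows are egos, columns are alters,
# cells are frequency ratings (0 = no tie reported, 1-4 = rating)
valued <- matrix(c(0, 3, 1, 0,
                   2, 0, 4, 0,
                   0, 1, 0, 2,
                   0, 0, 3, 0),
                 nrow = 4, byrow = TRUE)

# Dichotomize: ratings of 3 or 4 become 1, everything else becomes 0
binary <- (valued >= 3) * 1

# Symmetrize with the weak criterion: a tie exists if either actor reported it
symmetric <- pmax(binary, t(binary))

symmetric
```

In this toy example, actor 1 rated actor 2 a 3 while actor 2 rated actor 1 only a 2, yet the weak criterion still records a tie between them in both directions.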

2b. Create a Tidy Graph

Before we can begin exploring our data through network visualization and analysis, we must first restructure our data into a formal matrix and then convert it to the network class of R object required by the igraph, tidygraph, and ggraph packages.

However, one thing you may have noticed from our inspection of the data above is that our data table is missing the names of the school leaders in our network, which are needed for determining dyads in our network. R has packages for creating random names to help anonymize data, but to keep things simple, we’ll just assign the numbers 1-43 as names for our rows and columns.

rownames(ch7_network) <- 1:43

## Warning: Setting row names on a tibble is deprecated.

colnames(ch7_network) <- 1:43

Inspecting our ch7_network data table shows that this worked:

ch7_network

## # A tibble: 43 × 43
##      `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`  `12`  `13`
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     0     1     1     0     0     0     0     0     0     0     0     0
##  2     0     0     0     0     0     0     0     0     0     0     0     0     0
##  3     1     0     0     0     0     0     0     1     0     1     0     0     1
##  4     1     0     0     0     0     0     0     0     0     0     0     0     0
##  5     0     0     0     0     0     0     0     0     1     0     0     0     0
##  6     0     0     0     0     0     0     0     0     0     0     0     0     0
##  7     0     0     0     0     0     0     0     0     0     1     0     0     0
##  8     0     0     1     0     0     0     0     0     1     1     0     0     1
##  9     0     0     0     0     1     0     0     1     0     0     0     0     0
## 10     0     0     1     0     0     0     1     1     0     0     1     0     0
## # … with 33 more rows, and 30 more variables: 14 <dbl>, 15 <dbl>, 16 <dbl>,
## #   17 <dbl>, 18 <dbl>, 19 <dbl>, 20 <dbl>, 21 <dbl>, 22 <dbl>, 23 <dbl>,
## #   24 <dbl>, 25 <dbl>, 26 <dbl>, 27 <dbl>, 28 <dbl>, 29 <dbl>, 30 <dbl>,
## #   31 <dbl>, 32 <dbl>, 33 <dbl>, 34 <dbl>, 35 <dbl>, 36 <dbl>, 37 <dbl>,
## #   38 <dbl>, 39 <dbl>, 40 <dbl>, 41 <dbl>, 42 <dbl>, 43 <dbl>

The rows and columns now have names corresponding to each school leader in the network.

Convert to Matrix Object

ch7_matrix <- as.matrix(ch7_network)
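
Before converting to a graph, it can be worth confirming that the matrix has the properties described earlier. The following is a minimal sketch on a hypothetical 3 x 3 stand-in (toy_matrix is an assumed name, not an object from this analysis); the same checks could be run on ch7_matrix. It also shows that names can be set on the matrix itself with dimnames(), which sidesteps the tibble row-name warning seen above:

```r
# Hypothetical stand-in for a small binary adjacency matrix
toy_matrix <- matrix(c(0, 1, 1,
                       1, 0, 0,
                       1, 0, 0),
                     nrow = 3, byrow = TRUE)

# Label rows and columns with actor IDs directly on the matrix
ids <- as.character(seq_len(nrow(toy_matrix)))
dimnames(toy_matrix) <- list(ids, ids)

# Checks: square, binary, symmetric, and no self-ties on the diagonal
is_square    <- nrow(toy_matrix) == ncol(toy_matrix)
is_binary    <- all(toy_matrix %in% c(0, 1))
is_symmetric <- isSymmetric(toy_matrix)
no_self_ties <- all(diag(toy_matrix) == 0)

c(is_square, is_binary, is_symmetric, no_self_ties)
```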

Convert to Graph Object

The final step before exploring the data is to convert the matrix to a network object recognized by the {igraph} and {tidygraph} packages. The as_tbl_graph() function can easily convert relational data from all common network data formats such as matrices, network, phylo, dendrogram, data.tree, graph, etc.

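The following code converts the matrix to a directed network graph and saves it as a new object called leader_network. Because the matrix is symmetric, setting directed = TRUE stores every tie twice, once in each direction: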

leader_network <- as_tbl_graph(ch7_matrix, directed = TRUE)

leader_network

## # A tbl_graph: 43 nodes and 272 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 43 × 1 (active)
##   name 
##   <chr>
## 1 1    
## 2 2    
## 3 3    
## 4 4    
## 5 5    
## 6 6    
## # … with 37 more rows
## #
## # Edge Data: 272 × 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     3      1
## 2     1     4      1
## 3     1    18      1
## # … with 269 more rows

Upon inspection, leader_network provides a range of information about network size, type, and number of components, along with a preview of the node and edge lists that it created. The node and edge lists are treated just like typical data frames and can now be used with other tidyverse packages and functions to create new actor-level network variables such as degree centrality.
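
Since the adjacency matrix is symmetric but was read in as directed, the edge count reflects every tie in both directions. A small base R sketch, using a hypothetical 4 x 4 matrix named adj (not part of the original data), illustrates the relationship between the two counts:

```r
# Hypothetical symmetric adjacency matrix with a zero diagonal
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE)

# Read as directed: every nonzero cell is one edge
directed_edges <- sum(adj)

# Read as undirected: count each pair once (upper triangle only)
undirected_ties <- sum(adj[upper.tri(adj)])

directed_edges   # 8
undirected_ties  # 4
```

By the same logic, the 272 edges reported above would correspond to 136 distinct ties, assuming the full matrix is symmetric with no self-ties.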

3. Explore

3a. Visually Describe the Network

To get a general sense of the relationships between district administrators, the leader_network object is supplied as the first argument to the plot() function:

plot(leader_network)

This result is neither aesthetically pleasing nor functional. There is one large network component and a few periphery nodes that are more loosely connected, but it’s difficult to determine how well the nodes are connected in the midst of the large mass.

Fortunately, the {ggraph} package includes multiple plotting parameters for graph layouts, edges, and nodes to improve the visual design and readability of network graphs.

Sociograms with ggraph

Like the basic plotting function above, ggraph has a similar autograph() function for automatically generating a simple sociogram to get a quick snapshot of the leader_network:

autograph(leader_network)

This produces a very basic plot that includes network nodes and edges with minimal styling.

Within this function, however, are three core layers (the layout, the edges, and the nodes) that can be modified through different arguments, or by adding layers of functions, to change the visual aesthetics of the leader_network sociogram and highlight key characteristics of the network, such as degree centrality or key actors.

The same basic plot above can also be generated by explicitly calling the ggraph, node, and edge functions:

ggraph(leader_network, layout = "fr") +
  geom_node_point(size = 3) +
  geom_edge_link()

That is perhaps slightly better and provides a starting point to highlight key findings specific to network measures that we’ll explore to answer the specific research question focused on centrality.

3b. Describe the Network Mathematically

Centralization

Degree centrality refers to the number of ties an actor sends (out-degree), receives (in-degree), or, in a non-directed network, simply has (“degree”): the number of actors to which one is connected. In a directed network, mode = "all" sums the ties sent and received. Centralization summarizes, at the graph level, how unevenly these degree scores are distributed across actors.

centr_degree(leader_network, mode = "all")

## $res
##  [1] 14  2 24 10  8  4  4 30 12 16 12  2 18  6 12 10  4 22  4  4 16 12  2 16 20
## [26] 12 26 42  6  2  2  8  4 18  4 20 22 54  2  6  8 16  8
## 
## $centralization
## [1] 0.5039683
## 
## $theoretical_max
## [1] 3528

The first element, $res, provides node-level degree scores. For example, the first school leader in our network has a degree of 14. Because the symmetric matrix was read in as a directed graph, mode = "all" counts each tie once as sent and once as received, so a degree of 14 corresponds to 7 distinct alters. The $centralization element provides the graph-level centralization score, about 0.50 on a scale from 0 to 1, which suggests a moderately rather than highly centralized network. The final element, $theoretical_max, provides the maximum theoretical graph-level centralization score for a graph with the same number of vertices.
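
To see how the three pieces of output fit together, the centralization score can be recomputed by hand. This is a minimal sketch assuming the Freeman-style formula (the summed gap between the highest degree and every other degree, divided by the theoretical maximum); the numbers are copied from the output above, and the object names are my own:

```r
# Node-level degree scores copied from the $res output above
res <- c(14, 2, 24, 10, 8, 4, 4, 30, 12, 16, 12, 2, 18, 6, 12, 10, 4, 22,
         4, 4, 16, 12, 2, 16, 20, 12, 26, 42, 6, 2, 2, 8, 4, 18, 4, 20,
         22, 54, 2, 6, 8, 16, 8)

# Theoretical maximum copied from the $theoretical_max output above
theoretical_max <- 3528

# Summed gap to the top score, scaled by the theoretical maximum
centralization <- sum(max(res) - res) / theoretical_max

round(centralization, 7)
```

The result, 0.5039683, matches the $centralization value reported by centr_degree().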

To address the primary objective of identifying key actors, node-level characteristics are required. The {tidygraph} package has a unique function called activate() that allows us to treat the nodes and edges in our network object as if they were standard data frames, to which we can then apply standard tidyverse functions like select(), filter(), and mutate().

The latter function, mutate(), can be used to create new variables for node-level measures of degree using the centrality_degree() function in the {tidygraph} package.

The following code adds each actor’s degree to the district administrator network and assigns the output to leader_network again so the results are saved:

leader_network <- leader_network |>
  activate(nodes) |>
  mutate(degree = centrality_degree(mode = "all"))

leader_network

## # A tbl_graph: 43 nodes and 272 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 43 × 2 (active)
##   name  degree
##   <chr>  <dbl>
## 1 1         14
## 2 2          2
## 3 3         24
## 4 4         10
## 5 5          8
## 6 6          4
## # … with 37 more rows
## #
## # Edge Data: 272 × 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     3      1
## 2     1     4      1
## 3     1    18      1
## # … with 269 more rows

We can now scale each node in the sociogram above by its degree so that the more central actors stand out against the full range of collaboration ties in our network:

ggraph(leader_network) +
  geom_node_point(aes(size = degree)) +
  geom_edge_link() +
  theme_graph()

## Using `stress` as default layout

Density

Since we know the number of ties, or edges, in the network, the overall density can be examined:

edge_density(leader_network)

## [1] 0.1506091

The administrator collaboration network has a low density (about 0.15), which suggests that collaboration in this network is quite limited and may impact the flow of information, resources, and innovations among school leaders.
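
The density value can be checked by hand as the number of observed edges divided by the number of possible edges. This sketch assumes the directed formula without self-ties, n * (n - 1), and uses the node and edge counts printed for leader_network; the object names are illustrative:

```r
# Counts taken from the leader_network summary above
n_nodes <- 43
n_edges <- 272

# Possible directed edges among n nodes, excluding self-ties
possible_edges <- n_nodes * (n_nodes - 1)

# Density: share of possible edges that are actually present
density <- n_edges / possible_edges

round(density, 7)
```

This gives 0.1506091, the same value returned by edge_density(). In other words, roughly 15% of all possible ties are present.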

4. Communicate

4a. Final Visualization

# compute degree as node size
V(leader_network)$size <- degree(leader_network)

ggraph(leader_network,layout = "stress") +
  geom_edge_link0(aes(),edge_colour = "grey66") +
  geom_node_point(aes(size = size),shape = 21) +
  geom_node_text(aes(filter = size >= 26, label = name),
                 family="serif") +
  scale_size(range = c(1,6)) +
  theme_graph() +
  theme(legend.position = "right") +
  labs(size = "# Connections") +
  labs(title = "District Administrator Network", subtitle = 
       "Connections & Key Actors")

4b. Narrative

For this case study, I was curious to determine the network’s centrality and whether key actors could be identified. My expectation was that a more decentralized network would have relatively fewer key influencers than a more centralized relationship structure. Based on the sociograms above, a couple of key findings are apparent:

There are numerous visible nodes with 3 or fewer connections, consistent with the low density score of 0.15 and a centralization score of about 0.5.
There are 4 key actors (38, 28, 27, 8) with a degree of at least 26; because each tie is counted in both directions, that corresponds to at least 13 distinct alters.
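
The key actors listed above can be recovered directly from the degree scores reported in section 3b. The sketch below applies the same threshold as the label filter in the final visualization (a degree of at least 26); the vector is copied from the earlier centr_degree() output, and the halving step assumes that every tie was counted once in each direction:

```r
# Node-level degree scores copied from the centr_degree() output
res <- c(14, 2, 24, 10, 8, 4, 4, 30, 12, 16, 12, 2, 18, 6, 12, 10, 4, 22,
         4, 4, 16, 12, 2, 16, 20, 12, 26, 42, 6, 2, 2, 8, 4, 18, 4, 20,
         22, 54, 2, 6, 8, 16, 8)

# Name each score with its actor ID (1-43)
names(res) <- seq_along(res)

# Keep actors at or above the threshold, ranked from highest degree
key_actors <- sort(res[res >= 26], decreasing = TRUE)
key_actors

# Assumption: each tie appears twice, so halve to estimate distinct alters
distinct_alters <- key_actors / 2
distinct_alters
```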

These insights could be used by district leadership to focus on best practices that encourage information sharing. They could look at specific nodes to see whether a tool (technology, staff meeting, forum, etc.) enabled more effective interaction. Additionally, there may be barriers to communication that prevent certain communities, districts, or schools from collaborating. For networks where gatekeepers (key actors) emerge, there may be opportunities to decentralize locally to ensure a flatter approach to knowledge sharing so that no single individual retains too much power. Looking ahead, this study could be improved by linking nodes to school or district locations to see how communities share information locally or between districts.

References

Carolan, B. V. (2013). Social network analysis and education: Theory, methods & applications. Sage Publications.