MOOC-Eds: Comparing Out-degree Communications by Role

1. PREPARE

1a. Data Sources

The data for this analysis was generated in the MOOC-Ed study we were introduced to during our case study last week:

1b. Identify a Question

This project focuses on research question 1 from the study:

“What are the patterns of peer interaction and the structure of peer networks that emerge over the course of a MOOC-Ed?”

Specifically, my goal is to examine engagement among participants in the Fall iteration (DLT 2) as an indicator of network structure, and then compare it with engagement among the Spring iteration (DLT 1) participants.

1c. Load Libraries

library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(skimr)
library(janitor)

2. WRANGLE

2a. Import Data

The Edge-List Format

The first file, dlt2-edges.csv, is an edge-list that contains information about each tie, or relation between two actors in a network. In this context, a “tie” is a reply by one participant in the discussion forum to the post of another participant, or in some cases to their own post. Ties from an actor to itself are called “self-loops,” and as we’ll see later, {tidygraph} has a special function to remove these self-loops from a sociogram, or network visualization.
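As a minimal sketch of what removing self-loops can look like, the toy graph below (made-up edges, not the DLT 2 data) contains one self-loop and drops it with edge_is_loop() inside filter(). I am assuming edge_is_loop() is the {tidygraph} helper the case study has in mind; the object names are hypothetical.

```r
library(tidygraph)

# Hypothetical toy edge list: node 2 replies to its own post once
toy_edges <- data.frame(from = c(1L, 2L, 2L, 3L),
                        to   = c(2L, 2L, 3L, 1L))

toy_graph <- tbl_graph(edges = toy_edges, directed = TRUE)

# Keep only the ties that connect two different actors
no_loops <- toy_graph |>
  activate(edges) |>
  filter(!edge_is_loop())

# Count ties before and after dropping the self-loop
n_ties_before <- igraph::gsize(toy_graph)
n_ties_after  <- igraph::gsize(no_loops)
```

The same filter() step could be placed in front of autograph() so that the sociogram is drawn without loops.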

The values in the first two columns of the edge-list format represent a dyad, or tie between two nodes in a network. In addition to Sender and Reciever dyad pairs, the dataset contains the following edge attributes:

Sender = Unique identifier of author of comment
Receiver = Unique identifier of identified recipient of comment
Timestamp = Time post or reply was posted
Parent = Primary category or topic of thread
Category = Subcategory or subtopic of thread
Thread_id = Unique identifier of a thread
Comment_id = Unique identifier of a comment

I’m using the read_csv() function from {readr} to read in the edge-list and clean_names() from {janitor} to clean up the variable names:

dlt2_ties <- read_csv("data/dlt2-edges.csv", 
                      col_types = cols(Sender = col_character(), 
                                       Receiver = col_character(), 
                                       `Category Text` = col_skip(), 
                                       `Comment ID` = col_character(), 
                                       `Discussion ID` = col_character())) |>
  clean_names()

## Warning: The following named parsers don't match the column names: Receiver,
## Category Text, Comment ID, Discussion ID

dlt2_ties

## # A tibble: 2,584 × 9
##    sender reciever timestamp title   category   parent description    comment_id
##    <chr>     <dbl> <chr>     <chr>   <chr>      <chr>  <chr>               <dbl>
##  1 37          461 9/27/13   Greeti… Introduct… Group… Please introd…          3
##  2 434         434 9/27/13   Sherry… Introduct… Group… Please introd…          4
##  3 302         434 9/27/13   Sherry… Introduct… Group… Please introd…          5
##  4 60          434 9/27/13   Sherry… Introduct… Group… Please introd…          7
##  5 128         434 9/29/13   Sherry… Introduct… Group… Please introd…         38
##  6 158         434 9/30/13   Sherry… Introduct… Group… Please introd…         75
##  7 442         434 10/1/13   Sherry… Introduct… Group… Please introd…        193
##  8 43          245 9/27/13   Planni… Changes i… Group… What would yo…          6
##  9 426         245 9/30/13   Planni… Changes i… Group… What would yo…        146
## 10 142         245 10/1/13   Planni… Changes i… Group… What would yo…        201
## # … with 2,574 more rows, and 1 more variable: discussion_id <chr>

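The col_types = argument is meant to read the identifier columns as character strings, since the numbers in those columns label actors (sender and receiver) and attributes (comment and discussion IDs) rather than quantities. As the warning above shows, however, the names Receiver, Category Text, Comment ID, and Discussion ID did not match any column in the file, so only sender was converted this way; reciever and comment_id were still parsed as numbers.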
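One way to avoid mismatched parser names is to check the header first and then reuse those exact names in cols(). The sketch below uses a tiny inline CSV rather than the real file; the header spellings Reciever and Comment_ID are assumptions based on the cleaned names in the output, not something I have confirmed against dlt2-edges.csv.

```r
library(readr)

# Hypothetical two-row stand-in for the edge list; the header spellings
# "Reciever" and "Comment_ID" are assumptions, not taken from the real file
toy_csv <- I("Sender,Reciever,Comment_ID\n37,461,3\n434,434,4\n")

# Step 1: look at the header names exactly as they appear in the file
header_names <- names(read_csv(toy_csv, show_col_types = FALSE))

# Step 2: use those exact names in cols() so every parser matches
toy_ties <- read_csv(toy_csv,
                     col_types = cols(Sender = col_character(),
                                      Reciever = col_character(),
                                      Comment_ID = col_character()))
```

With matching names, read_csv() raises no warning and all three identifier columns come back as character vectors.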

Node Attributes

The second file contains all the nodes or actors (i.e., participants who posted to the discussion forum) as well as some of their attributes such as gender and years of experience in education.

These attribute variables are typically included in a rectangular array, or dataframe, that mimics the actor-by-attribute format that is the dominant convention in social science, i.e., rows represent cases, columns represent variables, and cells consist of values on those variables.

dlt2_nodes <- read_csv("data/dlt2-nodes.csv", 
                      col_types = cols(UID = col_character(), 
                                       Facilitator = col_character(), 
                                       expert = col_character(), 
                                       connect = col_character())) |>
  clean_names()

## Warning: The following named parsers don't match the column names: UID,
## Facilitator

dlt2_nodes

## # A tibble: 492 × 13
##      uid facilitator role  experience2 experience grades location region country
##    <dbl>       <dbl> <chr>       <dbl>      <dbl> <chr>  <chr>    <chr>  <chr>  
##  1     1           0 curr…           2         17 gener… IN       Midwe… US     
##  2     2           0 other           1          3 prima… NC       South  US     
##  3     3           0 inst…           2         20 gener… US       South  US     
##  4     4           0 inst…           2         12 middle TX       South  US     
##  5     5           0 other           1          0 gener… CAN      Inter… CA     
##  6     6           0 inst…           1          7 gener… CAN      Inter… CA     
##  7     7           0 oper…           1          6 gener… RI       North… US     
##  8     8           0 curr…           2         12 middle ME       North… US     
##  9     9           0 other           2         11 gener… VA       South  US     
## 10    10           0 inst…           3         21 gener… WV       South  US     
## # … with 482 more rows, and 4 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>

Below are the node variables:

Facilitator = Identification of course facilitator (1 = instructor)
Connect = Indicator of whether participants listed networking and collaboration with others as one of their course goals on the registration form
Expert = Identifier of “expert panelists” invited to course to share experience through recorded Q&A
Role = Professional role (e.g., teacher, librarian, administrator)
Experience = Years of experience as an educator
Grades = Works with elementary, middle, and/or high school students
Group = Initial assignment of discussion group

2b. Create Network Object

Convert to Graph Object

dlt2_network <- tbl_graph(edges = dlt2_ties,
                          nodes = dlt2_nodes,
                          node_key = "uid",
                          directed = TRUE)

dlt2_network

## # A tbl_graph: 492 nodes and 2584 edges
## #
## # A directed multigraph with 69 components
## #
## # Node Data: 492 × 13 (active)
##     uid facilitator role  experience2 experience grades location region country
##   <dbl>       <dbl> <chr>       <dbl>      <dbl> <chr>  <chr>    <chr>  <chr>  
## 1     1           0 curr…           2         17 gener… IN       Midwe… US     
## 2     2           0 other           1          3 prima… NC       South  US     
## 3     3           0 inst…           2         20 gener… US       South  US     
## 4     4           0 inst…           2         12 middle TX       South  US     
## 5     5           0 other           1          0 gener… CAN      Inter… CA     
## 6     6           0 inst…           1          7 gener… CAN      Inter… CA     
## # … with 486 more rows, and 4 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>
## #
## # Edge Data: 2,584 × 9
##    from    to timestamp title category parent description comment_id
##   <int> <int> <chr>     <chr> <chr>    <chr>  <chr>            <dbl>
## 1    37   461 9/27/13   Gree… Introdu… Group… Please int…          3
## 2   434   434 9/27/13   Sher… Introdu… Group… Please int…          4
## 3   302   434 9/27/13   Sher… Introdu… Group… Please int…          5
## # … with 2,581 more rows, and 1 more variable: discussion_id <chr>

The DLT 2 network had slightly more participants than the Spring iteration, and as a result, the network has more nodes (492) and edges (2,584). It also has many more components (69) than did DLT 1.

3. EXPLORE

3a. Analyze Groups: Components

Visualize Components

Recall from our output above that our “multigraph” had 69 components. Plotting the network shows the breakdown of components:

autograph(dlt2_network)

As you can see, 68 of our components are isolates in our network. Directed graphs, such as the DLT 2 network, have two different kinds of components: weak and strong. A weak component, like ours above, ignores the direction of a tie; strong components do not. Rather,

Strong components consist of nodes that are connected to one another via both directions along the path that connects them.
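To make the weak/strong distinction concrete, here is a small sketch on a made-up four-node graph (not the DLT 2 data). Nodes 1, 2, and 3 form a directed cycle, and node 4 only receives a tie, so it can be reached but cannot reach anyone back.

```r
library(igraph)

# Toy directed graph: 1 -> 2 -> 3 -> 1 is a cycle, and 3 -> 4 is one-way
toy <- make_graph(c(1, 2,  2, 3,  3, 1,  3, 4), directed = TRUE)

# Ignoring direction, every node is connected: one weak component
n_weak <- components(toy, mode = "weak")$no

# Respecting direction, node 4 has no path back: two strong components
n_strong <- components(toy, mode = "strong")$no
strong_sizes <- sort(components(toy, mode = "strong")$csize)
```

Here n_weak is 1 and n_strong is 2, with the cycle forming a strong component of size 3 and node 4 sitting alone.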

The {igraph} package has a simple function for identifying the number of components in a network, the size of each component, and which actors belong to each. Let’s take a quick look at the “strong” components in our network:

components(dlt2_network, mode = c("strong"))

## $membership
##   [1] 245 244 154 241 153 154 154 216 152 154 154 230 242 154 151 150 154 154
##  [19] 154 154 149 154 154 154 148 154 147 154 154 154 146 154 154 154 243 145
##  [37] 143 154 142 154 154 195 154 154 141 154 140 154 239 169 154 154 154 167
##  [55] 224 139 138 137 154 154 166 154 154 136 135 134 133 154 154 154 219 154
##  [73] 132 154 128 127 154 126 125 208 124 154 154 154 123 122 121 154 120 154
##  [91] 119 194 154 154 154 172 213 154 118 154 227 211 193 117 116 154 154 115
## [109] 171 154 154 154 114 154 113 165 154 214 154 112 110 154 234 154 154 154
## [127] 154 154 154 154 164 154 154 154 154 109 154 108 107 106 105 154 238 104
## [145] 103 102 154 154 154 154 101 154 100  99 154 220 154 154  98 154 154 202
## [163]  97  96  95  93 154  92 154 154 154  91 190 154  90 154 154 154 188 154
## [181] 154  89 154 131 154 154 154 154 154 154  88 154  87  86 154 154 154 154
## [199] 163 154  85 225 154  84 162 187 154 154 199  83 161 154 154 154  82 154
## [217] 154 154  81 240 232  80 154 154  79 154 154  78 154  77 160  76 154  75
## [235] 154 154 159  74 154 186 154  94  73 154 154 154  72  71  70 184 154  69
## [253]  95  68 154 185 154  67  66 154  65  64 154 154 154 154 154 154 229 154
## [271] 154 154 154 154  63  62 154  61 204 154 154  60 154 154 154  59 221 154
## [289]  58  57 183 154 233  56  55 154  54 154  53 228 154 154 154 154  52 217
## [307] 129 201 154 154 158  51  50 182  49 154 154 154 130  48 154 154 181 215
## [325]  47 154  46 154 154  45 154 154 154 180 154 154 154 157  44 154 154 154
## [343] 231 154 154  43 154 154  42 203 154  41 156  40 237 236 154 154 191  39
## [361] 154 154  38  37  36 196 154  35 154 189 154 154 154 154 154 168  34 154
## [379] 154 154 154 154 154 154  33 222 154 154 209  32 192  31  30 154  29 154
## [397] 212 154 200  28 154  27 154  26 154  25 154  24 154 155 154 154 154 111
## [415]  23 198 154  22 154 154 154 154 154 170  21 154 179 154 154 154 154  20
## [433] 154 226 178 154 154  19  18 154 210  17  16  15 154  14 154 177 207  13
## [451] 218  12 154 176 154  11 154 154 154 154 144 197 154  10 223 154   9   8
## [469]   7 205 154   6   5 154 235   4 154 175 154 154 154   3 154 154 154 154
## [487]   2   1 206 174 154 173
## 
## $csize
##   [1]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [19]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [37]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [55]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [73]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [91]   1   1   1   1   2   1   1   1   1   1   1   1   1   1   1   1   1   1
## [109]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [127]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [145]   1   1   1   1   1   1   1   1   1 247   1   1   1   1   1   1   1   1
## [163]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [181]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [199]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [217]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [235]   1   1   1   1   1   1   1   1   1   1   1
## 
## $no
## [1] 245

There are 245 distinct strong components. To save this component description to the network:

dlt2_network <- dlt2_network |>
  activate(nodes) |>
  mutate(strong_component = group_components(type = "strong"))

Reviewing the nodes in the updated dlt2_network shows that this new variable has been added:

as_tibble(dlt2_network)

## # A tibble: 492 × 14
##      uid facilitator role  experience2 experience grades location region country
##    <dbl>       <dbl> <chr>       <dbl>      <dbl> <chr>  <chr>    <chr>  <chr>  
##  1     1           0 curr…           2         17 gener… IN       Midwe… US     
##  2     2           0 other           1          3 prima… NC       South  US     
##  3     3           0 inst…           2         20 gener… US       South  US     
##  4     4           0 inst…           2         12 middle TX       South  US     
##  5     5           0 other           1          0 gener… CAN      Inter… CA     
##  6     6           0 inst…           1          7 gener… CAN      Inter… CA     
##  7     7           0 oper…           1          6 gener… RI       North… US     
##  8     8           0 curr…           2         12 middle ME       North… US     
##  9     9           0 other           2         11 gener… VA       South  US     
## 10    10           0 inst…           3         21 gener… WV       South  US     
## # … with 482 more rows, and 5 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>, strong_component <int>

The code below creates a count of the number of nodes in each strong component:

dlt2_network |>
  as_tibble() |>
  group_by(strong_component) |>
  summarise(count = n()) |>
  arrange(desc(count))

## # A tibble: 245 × 2
##    strong_component count
##               <int> <int>
##  1                1   247
##  2                2     2
##  3                3     1
##  4                4     1
##  5                5     1
##  6                6     1
##  7                7     1
##  8                8     1
##  9                9     1
## 10               10     1
## # … with 235 more rows

Similar to our graph of DLT 1, we see this network has one large strong component (n = 247); the remaining components are a single pair of nodes and 243 single-node components.

To illustrate this with a sociogram, the code below creates a new edge variable using the same activate() and mutate() functions, then uses filter() on the edges so the graph only contains reciprocated ties:

dlt2_network |>
  activate(edges) |>
  mutate( reciprocated = edge_is_mutual()) |> 
  filter(reciprocated == TRUE) |>
  autograph()
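For a sense of what edge_is_mutual() flags, here is a minimal sketch on a made-up three-edge graph (hypothetical data): nodes 1 and 2 reply to each other, while node 2 replies to node 3 without getting a reply back.

```r
library(tidygraph)

# Hypothetical ties: 1 -> 2 and 2 -> 1 are reciprocated, 2 -> 3 is not
toy_edges <- data.frame(from = c(1L, 2L, 2L),
                        to   = c(2L, 1L, 3L))

flagged <- tbl_graph(edges = toy_edges, directed = TRUE) |>
  activate(edges) |>
  mutate(reciprocated = edge_is_mutual()) |>
  as_tibble()

# One logical value per edge, in the same order as toy_edges
flagged$reciprocated
```

Filtering on reciprocated == TRUE would keep the first two edges and drop the third.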

To keep only the nodes in the largest strong component and drop everything else, including isolates, use the same activate() and filter() functions:

dlt2_network |>
  activate(nodes) |>
  filter(strong_component == 1) |>
  autograph()

The group_edge_betweenness() function groups densely connected nodes together. Betweenness centrality is a measure we will look at more closely in the next section.

Because this function can only be used with undirected networks, we will need to pipe |> our dlt2_network through the following functions in sequence:

morph() with the to_undirected argument will temporarily change our directed network to an undirected network, or “symmetrize” our network as discussed in Carolan (2014);
activate() will select just our nodes list;
mutate() will create a new sub_group variable using the group_edge_betweenness() function;
unmorph() will change our undirected network back to a directed network.
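The sequence above can be sketched on a toy graph before running it on the full network. The edges below are made up: two directed triangles joined by a single bridging tie, which is the kind of structure I would expect edge betweenness to split into two groups.

```r
library(tidygraph)

# Hypothetical directed edges: triangle 1-2-3, triangle 4-5-6, bridge 3 -> 4
toy_edges <- data.frame(from = c(1L, 2L, 3L, 4L, 5L, 6L, 3L),
                        to   = c(2L, 3L, 1L, 5L, 6L, 4L, 4L))

toy_grouped <- tbl_graph(edges = toy_edges, directed = TRUE) |>
  morph(to_undirected) |>                           # temporarily symmetrize
  activate(nodes) |>
  mutate(sub_group = group_edge_betweenness()) |>   # cluster the undirected copy
  unmorph()                                         # back to the directed graph

# Group ID for each of the six nodes
sg <- as_tibble(toy_grouped)$sub_group
```

The graph that comes back is still directed; only the new sub_group column is carried over from the undirected copy.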

Run the following code to group our nodes based on their edge betweenness, then print the updated dlt2_network object to take a quick look. Because this is a bit computationally intensive, it may take a minute or so to run.

dlt2_network <- dlt2_network |>
  morph(to_undirected) |>
  activate(nodes) |>
  mutate(sub_group = group_edge_betweenness()) |>
  unmorph()

dlt2_network |>
  as_tibble()

## # A tibble: 492 × 15
##      uid facilitator role  experience2 experience grades location region country
##    <dbl>       <dbl> <chr>       <dbl>      <dbl> <chr>  <chr>    <chr>  <chr>  
##  1     1           0 curr…           2         17 gener… IN       Midwe… US     
##  2     2           0 other           1          3 prima… NC       South  US     
##  3     3           0 inst…           2         20 gener… US       South  US     
##  4     4           0 inst…           2         12 middle TX       South  US     
##  5     5           0 other           1          0 gener… CAN      Inter… CA     
##  6     6           0 inst…           1          7 gener… CAN      Inter… CA     
##  7     7           0 oper…           1          6 gener… RI       North… US     
##  8     8           0 curr…           2         12 middle ME       North… US     
##  9     9           0 other           2         11 gener… VA       South  US     
## 10    10           0 inst…           3         21 gener… WV       South  US     
## # … with 482 more rows, and 6 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>, strong_component <int>, sub_group <int>

As you scroll through the nodes tibble produced, you should now see at the far end a new sub_group variable that includes an ID number indicating to which densely connected cluster each node belongs.

To get a count of the subgroups, the following code groups the nodes and displays a tibble of the dlt2_network object:

dlt2_network |>
  activate(nodes) |>
  as_tibble() |>
  group_by(sub_group) |>
  summarise(count = n()) |>
  arrange(desc(count))

## # A tibble: 406 × 2
##    sub_group count
##        <int> <int>
##  1         1    43
##  2         2     4
##  3         3     4
##  4         4     3
##  5         5     3
##  6         6     3
##  7         7     3
##  8         8     3
##  9         9     2
## 10        10     2
## # … with 396 more rows

This output suggests the network has a typical core-periphery structure. At the core of our network are the 43 actors in subgroup 1 (43 of 492 nodes, or about 9%), with the remaining actors residing on the periphery. This is consistent with the core and periphery values given in the actual MOOC-Ed study.

3b. Egocentric Analysis: Size & Centrality

Size

Size is computed here with the local_size() function and stored as a new node variable. Since it’s much easier to inspect our data as a tibble than using the graph output, we’ll also convert our nodes to a table and arrange() in descending order by size to make it easier to see the range in values of our network:

dlt2_network <- dlt2_network |>
  activate(nodes) |>
  mutate(size = local_size())

dlt2_network |> 
  as_tibble() |>
  arrange(desc(size)) |> 
  select(uid, facilitator, size)

## # A tibble: 492 × 3
##      uid facilitator  size
##    <dbl>       <dbl> <dbl>
##  1   302           1   224
##  2   158           1   165
##  3    23           0    74
##  4   183           0    54
##  5   361           0    47
##  6   292           0    45
##  7   267           0    42
##  8   229           0    39
##  9   388           0    39
## 10    26           0    37
## # … with 482 more rows

Not surprisingly, the egos with the most alters are the course facilitators who played a very active role in this course and therefore have an outsize influence on the structure of this network.
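To see how local_size() behaves, here is a small sketch on a star graph built with create_star() (a toy graph, not the DLT 2 data). As far as I understand the default arguments, the count covers nodes within one step of each ego and includes the ego itself, so it is one larger than the number of alters.

```r
library(tidygraph)

# Toy star: node 1 in the center, tied to three outer nodes
star <- create_star(4) |>
  mutate(size = local_size())

# Neighborhood size for each node, center first
sizes <- as_tibble(star)$size
```

The center's neighborhood holds all four nodes, while each outer node's neighborhood holds only itself and the center, which mirrors how the facilitators dominate the size ranking above.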

Centrality

As we learned in our previous case study and readings, a key structural property of networks is the concept of centralization. A network that is highly centralized is one in which relations are focused on a small number of actors or even a single actor in a network, whereas ties in a decentralized network are diffuse and spread over a number of actors. As we saw above, the facilitators in our network play an outsize role in the MOOC-Ed discussions.

Degree

The following code creates two new variables for our nodes: in_degree and out_degree. We’ll set the mode = argument in the centrality_degree() function to "in" and "out" respectively:

dlt2_network <- dlt2_network |>
  activate(nodes) |>
  mutate(in_degree = centrality_degree(mode = "in"),
         out_degree = centrality_degree(mode = "out"))
  
dlt2_network |> 
  as_tibble()

## # A tibble: 492 × 18
##      uid facilitator role  experience2 experience grades location region country
##    <dbl>       <dbl> <chr>       <dbl>      <dbl> <chr>  <chr>    <chr>  <chr>  
##  1     1           0 curr…           2         17 gener… IN       Midwe… US     
##  2     2           0 other           1          3 prima… NC       South  US     
##  3     3           0 inst…           2         20 gener… US       South  US     
##  4     4           0 inst…           2         12 middle TX       South  US     
##  5     5           0 other           1          0 gener… CAN      Inter… CA     
##  6     6           0 inst…           1          7 gener… CAN      Inter… CA     
##  7     7           0 oper…           1          6 gener… RI       North… US     
##  8     8           0 curr…           2         12 middle ME       North… US     
##  9     9           0 other           2         11 gener… VA       South  US     
## 10    10           0 inst…           3         21 gener… WV       South  US     
## # … with 482 more rows, and 9 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>, strong_component <int>, sub_group <int>,
## #   size <dbl>, in_degree <dbl>, out_degree <dbl>
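As a quick illustration of the two modes, the sketch below uses three made-up replies (hypothetical data). Node 1 replies to node 2 twice, which shows that in a multigraph like ours each repeated reply counts toward degree.

```r
library(tidygraph)

# Hypothetical replies: 1 -> 2 (twice) and 2 -> 3
toy_edges <- data.frame(from = c(1L, 1L, 2L),
                        to   = c(2L, 2L, 3L))

toy_degree <- tbl_graph(edges = toy_edges, directed = TRUE) |>
  mutate(in_degree  = centrality_degree(mode = "in"),    # replies received
         out_degree = centrality_degree(mode = "out")) |> # replies sent
  as_tibble()

toy_degree
```

Node 1 ends up with an out-degree of 2 and node 2 with an in-degree of 2, even though only one pair of actors is involved in those two ties.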

Let’s take a look at the distribution of out_degree, or the number of replies to other posts, using the geom_histogram() function for creating histograms:

dlt2_network |> 
  as_tibble() |>
  ggplot() +
  geom_histogram(aes(x = out_degree))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Similar to our DLT 1 graph, most egos in the network sent very few replies (<10) to alters in the course, while a handful of actors in this network have sent 50 or more replies.

Compositional and Variance Measures

To quickly calculate summary statistics for our nodes, including compositional and variance measures for our egocentric measures, we can use the skim() function from the {skimr} package to take a quick look at the variables in our node list:

dlt2_network |> 
  as_tibble() |>
  skim()

Data summary
Name	as_tibble(dlt2_network)
Number of rows	492
Number of columns	18
_______________________
Column type frequency:
character	9
numeric	9
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
role	0	1	5	18	13
grades	0	1	4	10	5
location	1	1	2	4	62
region	0	1	4	13	6
country	2	1	2	3	19
group	0	1	2	4	4
gender	0	1	4	6	2
expert	0	1	1	1	2
connect	0	1	1	1	2

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
uid	1	246.50	142.17	1	123.75	246.5	369.25	492	▇▇▇▇▇
facilitator	1	0.00	0.06	0	0.00	0.0	0.00	1	▇▁▁▁▁
experience2	1	1.93	0.78	1	1.00	2.0	3.00	3	▇▁▇▁▆
experience	1	15.73	11.05	0	8.00	15.0	22.00	120	▇▂▁▁▁
strong_component	1	61.75	78.89	1	1.00	1.0	122.25	245	▇▁▁▁▁
sub_group	1	169.45	129.86	1	37.75	160.5	283.25	406	▇▃▃▃▃
size	1	8.22	15.03	1	2.00	4.0	9.00	224	▇▁▁▁▁
in_degree	1	5.25	20.98	0	0.00	1.0	5.00	430	▇▁▁▁▁
out_degree	1	5.25	11.99	0	0.00	1.0	6.00	166	▇▁▁▁▁

We can see, for example, that egos in our network are connected with on average 8.22 alters with a standard deviation of about 15.

Finally, the {skimr} package, like our {tidygraph} package, plays nicely with other {tidyverse} packages for data wrangling and analysis. For example, let’s select only MOOC-Ed participants who are located in the United States and calculate compositional and variance measures for size by educator’s role:

dlt2_network |> 
  as_tibble() |>
  filter(country == "US") |>
  group_by(role) |>
  select(size) |>
  skim()

## Adding missing grouping variables: `role`

Data summary
Name	select(…)
Number of rows	410
Number of columns	2
_______________________
Column type frequency:
numeric	1
________________________
Group variables	role

Variable type: numeric

skim_variable	role	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
size	admin	1	7.11	7.63	2	3.00	5.5	8.50	35	▇▂▁▁▁
size	classteaching	1	6.82	7.94	1	2.00	4.0	7.00	33	▇▁▁▁▁
size	consultant	1	10.67	13.68	2	3.00	5.0	14.00	45	▇▂▁▁▁
size	curriculum	1	7.37	9.05	1	2.00	3.0	9.00	39	▇▂▁▁▁
size	instructionaltech	1	8.20	9.82	1	2.00	4.0	11.00	54	▇▁▁▁▁
size	libmedia	1	7.89	14.05	1	1.50	4.0	10.00	74	▇▁▁▁▁
size	operations	1	8.50	6.56	2	4.25	7.5	11.75	17	▇▁▃▁▃
size	other	1	5.93	6.89	1	2.00	3.0	7.00	28	▇▁▁▁▁
size	otheredprof	1	133.33	109.97	11	88.00	165.0	194.50	224	▇▁▁▇▇
size	profdev	1	5.75	5.60	1	2.00	4.0	7.25	25	▇▃▁▁▁
size	projectmanagement	1	3.50	3.00	2	2.00	2.0	3.50	8	▇▁▁▁▂
size	specialed	1	4.22	3.53	2	2.00	3.0	4.00	13	▇▁▁▁▁
size	techinfrastructure	1	6.05	5.74	1	2.00	4.0	9.00	24	▇▃▂▁▁

As illustrated by the output above, we can see that “otheredprof” (i.e., other educational professionals including our facilitators) were connected with the most individuals on average (133), while educators in “projectmanagement” connected with the fewest individuals on average (3.5).
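If only a couple of these statistics are needed, the same grouped comparison can be sketched with group_by() and summarise() from {dplyr} instead of skim(). The node table below is made up for illustration; the role labels and sizes are hypothetical and not drawn from the DLT 2 data.

```r
library(dplyr)

# Hypothetical node table with a role and an ego-network size
toy_nodes <- data.frame(role = c("teacher", "teacher", "admin", "admin"),
                        size = c(2, 4, 10, 20))

# Mean and standard deviation of size for each role
size_by_role <- toy_nodes |>
  group_by(role) |>
  summarise(n = n(),
            mean_size = mean(size),
            sd_size = sd(size)) |>
  arrange(desc(mean_size))

size_by_role
```

Including n alongside the mean is a useful habit here, since a group with only a handful of members can produce a very large average.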

4. COMMUNICATE

To address Question 1 in the MOOC-Ed study, actors in the network were categorized into distinct, mutually exclusive groups using the core-periphery and regular equivalence functions of UCINET. The former used the CORR algorithm to separate actors that are part of a densely connected subgroup, or “core”, from those that are part of the sparsely connected periphery. Regular equivalence employs the REGE blockmodeling algorithm to partition, or group, actors in the network based on the similarity of their ties to others with similar ties. In essence, blockmodeling provides a systematic way of categorizing educators based on the ways in which they interacted with peers.

As we saw from just a basic visual inspection of our network during the Explore section, there was a small core of highly connected participants surrounded by those on the “periphery,” or edge, of the network with very few connections. In the DLT 2 course, those on the periphery made up roughly 91% of the network. Also consistent with the MOOC-Ed study, the Fall participants had an average out-degree of 5.25; in the summary statistics above, in-degree ranges from 0 to 430 and out-degree from 0 to 166. Below is a comparison of the Spring (DLT 1) and Fall (DLT 2) iterations of the data.

Data Visualization

# dlt1_network is assumed to be the Spring (DLT 1) graph built in the
# previous case study; it is not created in this document
network1 <- as.data.frame(dlt1_network)

network2 <- as.data.frame(dlt2_network)

# Match the DLT 2 column name and collapse the two administrator roles
network1 <- rename(network1, role = role1)

network1$role[network1$role == "districtadmin"] <- "admin"
network1$role[network1$role == "schooladmin"] <- "admin"

# Label each iteration so the plot can be faceted by course
network1_mod <- network1 |> 
  mutate(year = "DLT1") |> 
  relocate(year)

network2_mod <- network2 |> 
  mutate(year = "DLT2") |> 
  relocate(year)

network_comb <- rbind(network1_mod, network2_mod)

# Share of egos at each out-degree, filled by role and split by iteration
network_comb |>
  filter(role != "NULL") |>
  filter(role != "other") |>
  ggplot(aes(x = out_degree, fill = role)) +
  geom_bar(aes(y = after_stat(count / sum(count) * 100))) +
  facet_wrap(~ year) +
  coord_cartesian(xlim = c(0, 50)) +
  labs(title = "Comparing Out-degree Centrality by Role", 
       x = "# Replies Sent to Alters", y = "% by Egos (Nodes)") +
  scale_fill_discrete(name = "Professional Role") +
  scale_x_binned(breaks = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50))

Narrative

One of the driving research questions for this case study was to better understand the patterns of peer interaction and structure of peer networks that emerge over the course of this MOOC-Ed. A key factor in any successful MOOC educational experience is the quality and frequency of communication between the participants. Out-degree centrality is a metric that gives some insight into how often specific nodes, in this case education professionals, respond to people in their subgroups or cliques. The plot above shows how the two cohorts differed in responding to their networks, broken down by professional role.

While the Fall cohort had more participants, the trends remained consistent with their Spring counterparts. Most participants replied fewer than 5 times to their alters, though DLT 2 had a few more participants than DLT 1 who responded more than 25 times. The only difference I could spot in group composition was that DLT 2 had more teachers and curriculum developers on average than DLT 1, while DLT 1 had more administrators. That could be a line of inquiry to pursue, to see if teachers are more likely to communicate reciprocally with their peers than school or district administrators.

ECI 589 Social Network Analysis and Education

James Hardaway

February 27, 2022
