The data for this analysis was generated in the MOOC-Ed study we were introduced to during our case study last week.
This project focuses on research question 1 from the study:
“What are the patterns of peer interaction and the structure of peer networks that emerge over the course of a MOOC-Ed?”
Specifically, my goal is to examine engagement among participants of the Fall iteration (DLT 2) as an indicator of network structure and then compare it with engagement among Spring iteration (DLT 1) participants.
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(skimr)
library(janitor)
The first file, dlt2-edges.csv, is an edge-list that contains information about each tie, or relation between two actors in a network. In this context, a “tie” is a reply by one participant in the discussion forum to the post of another participant, or in some cases to their own post! Ties that begin and end at the same actor are called “self-loops,” and as we’ll see later, {tidygraph} has a special function to remove these self-loops from a sociogram, or network visualization.
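As a minimal sketch of what a self-loop looks like in an edge-list, the toy example below builds a three-actor network in which actor "b" replies to their own post, then drops that tie with edge_is_loop() from {tidygraph}. The node names and ties are invented for illustration and are not taken from the DLT 2 data.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy data: three actors and four replies, one of which is a self-reply
toy_nodes <- data.frame(name = c("a", "b", "c"))
toy_edges <- data.frame(
  from = c("a", "b", "b", "c"),
  to   = c("b", "a", "b", "a")
)

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                       node_key = "name", directed = TRUE)

# Keep only ties whose sender and receiver differ
toy_no_loops <- toy_graph |>
  activate(edges) |>
  filter(!edge_is_loop())

igraph::gsize(toy_graph)     # number of ties before filtering
igraph::gsize(toy_no_loops)  # number of ties after removing the self-loop
```

Filtering on the edges leaves every node in place, so an actor whose only tie was a self-reply would remain in the network as an isolate.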
The values in the first two columns of the edge-list format represent a dyad, or tie between two nodes in a network. In addition to Sender and Reciever dyad pairs, the dataset contains the following edge attributes:
Sender = Unique identifier of author of comment
Receiver = Unique identifier of identified recipient of comment
Timestamp = Time post or reply was posted
Parent = Primary category or topic of thread
Category = Subcategory or subtopic of thread
Thread_id = Unique identifier of a thread
Comment_id = Unique identifier of a comment
I’m using the read_csv() function from the {readr} package to read in the edge-list and the clean_names() function from the {janitor} package to clean up the variable names:
dlt2_ties <- read_csv("data/dlt2-edges.csv",
col_types = cols(Sender = col_character(),
Receiver = col_character(),
`Category Text` = col_skip(),
`Comment ID` = col_character(),
`Discussion ID` = col_character())) |>
clean_names()
## Warning: The following named parsers don't match the column names: Receiver,
## Category Text, Comment ID, Discussion ID
dlt2_ties
## # A tibble: 2,584 × 9
## sender reciever timestamp title category parent description comment_id
## <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 37 461 9/27/13 Greeti… Introduct… Group… Please introd… 3
## 2 434 434 9/27/13 Sherry… Introduct… Group… Please introd… 4
## 3 302 434 9/27/13 Sherry… Introduct… Group… Please introd… 5
## 4 60 434 9/27/13 Sherry… Introduct… Group… Please introd… 7
## 5 128 434 9/29/13 Sherry… Introduct… Group… Please introd… 38
## 6 158 434 9/30/13 Sherry… Introduct… Group… Please introd… 75
## 7 442 434 10/1/13 Sherry… Introduct… Group… Please introd… 193
## 8 43 245 9/27/13 Planni… Changes i… Group… What would yo… 6
## 9 426 245 9/30/13 Planni… Changes i… Group… What would yo… 146
## 10 142 245 10/1/13 Planni… Changes i… Group… What would yo… 201
## # … with 2,574 more rows, and 1 more variable: discussion_id <chr>
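The col_types = argument is meant to change the column types to character strings, since the numbers in those particular columns identify actors (Sender and Reciever) and attributes (Comment_ID and Discussion_Id) rather than quantities. Note, however, that the warning above reports that the Receiver, Category Text, Comment ID, and Discussion ID parsers do not match any column names in the file (the file spells the column “Reciever,” for example), so those parsers are ignored. As the printed tibble shows, sender was read as character while reciever and comment_id were still read as numeric.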
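The parsers named in cols() are matched to the file’s header by exact name, which is what the warning above is reporting. The sketch below uses an invented two-row edge-list passed as literal text to show the parsers taking effect when the names match the header exactly. It assumes {readr} 2.0 or later, where I() marks literal data, and the header names are illustrative only.

```r
library(readr)
library(janitor)

# Hypothetical two-row edge-list supplied as literal text rather than a file path
toy_csv <- I("Sender,Reciever,Comment ID\n37,461,3\n434,434,4\n")

toy_ties <- read_csv(
  toy_csv,
  col_types = cols(
    Sender       = col_character(),
    Reciever     = col_character(),  # spelled exactly as in the header
    `Comment ID` = col_character()   # backticks for a name containing a space
  )
) |>
  clean_names()

names(toy_ties)          # snake_case names produced by clean_names()
sapply(toy_ties, class)  # column types after parsing
```

Because the parser names are matched before clean_names() runs, they need to use the original header spelling, not the cleaned snake_case names.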
The second file contains all the nodes or actors (i.e., participants who posted to the discussion forum) as well as some of their attributes such as gender and years of experience in education.
These attribute variables are typically included in a rectangular array, or dataframe, that mimics the actor-by-attribute format that is the dominant convention in social science, i.e., rows represent cases, columns represent variables, and cells consist of values on those variables.
dlt2_nodes <- read_csv("data/dlt2-nodes.csv",
col_types = cols(UID = col_character(),
Facilitator = col_character(),
expert = col_character(),
connect = col_character())) |>
clean_names()
## Warning: The following named parsers don't match the column names: UID,
## Facilitator
dlt2_nodes
## # A tibble: 492 × 13
## uid facilitator role experience2 experience grades location region country
## <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 1 0 curr… 2 17 gener… IN Midwe… US
## 2 2 0 other 1 3 prima… NC South US
## 3 3 0 inst… 2 20 gener… US South US
## 4 4 0 inst… 2 12 middle TX South US
## 5 5 0 other 1 0 gener… CAN Inter… CA
## 6 6 0 inst… 1 7 gener… CAN Inter… CA
## 7 7 0 oper… 1 6 gener… RI North… US
## 8 8 0 curr… 2 12 middle ME North… US
## 9 9 0 other 2 11 gener… VA South US
## 10 10 0 inst… 3 21 gener… WV South US
## # … with 482 more rows, and 4 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>
Below are the node variables:
Facilitator = Identification of course facilitator (1 = instructor)
Connect = Variable for whether participants listed networking and collaboration with others as one of their course goals on the registration form
Expert = Identifier of “expert panelists” invited to course to share experience through recorded Q&A
Role1 = Professional role (e.g., teacher, librarian, administrator)
Experience = Years of experience as an educator
Grades = Works with elementary, middle, and/or high school students
Group = Initial assignment of discussion group
dlt2_network <- tbl_graph(edges = dlt2_ties,
nodes = dlt2_nodes,
node_key = "uid",
directed = TRUE)
dlt2_network
## # A tbl_graph: 492 nodes and 2584 edges
## #
## # A directed multigraph with 69 components
## #
## # Node Data: 492 × 13 (active)
## uid facilitator role experience2 experience grades location region country
## <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 1 0 curr… 2 17 gener… IN Midwe… US
## 2 2 0 other 1 3 prima… NC South US
## 3 3 0 inst… 2 20 gener… US South US
## 4 4 0 inst… 2 12 middle TX South US
## 5 5 0 other 1 0 gener… CAN Inter… CA
## 6 6 0 inst… 1 7 gener… CAN Inter… CA
## # … with 486 more rows, and 4 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>
## #
## # Edge Data: 2,584 × 9
## from to timestamp title category parent description comment_id
## <int> <int> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 37 461 9/27/13 Gree… Introdu… Group… Please int… 3
## 2 434 434 9/27/13 Sher… Introdu… Group… Please int… 4
## 3 302 434 9/27/13 Sher… Introdu… Group… Please int… 5
## # … with 2,581 more rows, and 1 more variable: discussion_id <chr>
The DLT 2 network had slightly more participants than the Spring iteration, and as a result, the network has more nodes (492) and edges (2,584). It also has many more components (69) than did DLT 1.
Recall from our output above that our “multigraph” had 69 components. Plotting the network shows the breakdown of components:
autograph(dlt2_network)
As you can see, 68 of our components are isolates in our network. Directed graphs, such as the DLT 2 network, have two different kinds of components: weak and strong. A weak component, like ours above, ignores the direction of a tie; strong components do not. Rather,
Strong components consist of nodes that are connected to one another via both directions along the path that connects them.
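To make the weak versus strong distinction concrete, here is a minimal sketch on an invented three-actor network (not the DLT 2 data) using the same components() function from {igraph}. Actors "a" and "b" reply to each other, while "c" only receives a reply.

```r
library(igraph)

# Hypothetical ties: a and b reply to each other; b replies to c, who never replies back
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "b"), to = c("b", "a", "c")),
  directed = TRUE
)

n_weak   <- components(toy, mode = "weak")$no    # direction of ties ignored
n_strong <- components(toy, mode = "strong")$no  # paths required in both directions

n_weak
n_strong
```

All three actors fall into a single weak component, but "c" cannot reach anyone, so it sits in its own strong component and the strong count is higher.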
The {igraph} package has a simple function for identifying the number of components in a network, the size of each component, and which actors belong to each. Let’s take a quick look at the “strong” components in our network:
components(dlt2_network, mode = c("strong"))
## $membership
## [1] 245 244 154 241 153 154 154 216 152 154 154 230 242 154 151 150 154 154
## [19] 154 154 149 154 154 154 148 154 147 154 154 154 146 154 154 154 243 145
## [37] 143 154 142 154 154 195 154 154 141 154 140 154 239 169 154 154 154 167
## [55] 224 139 138 137 154 154 166 154 154 136 135 134 133 154 154 154 219 154
## [73] 132 154 128 127 154 126 125 208 124 154 154 154 123 122 121 154 120 154
## [91] 119 194 154 154 154 172 213 154 118 154 227 211 193 117 116 154 154 115
## [109] 171 154 154 154 114 154 113 165 154 214 154 112 110 154 234 154 154 154
## [127] 154 154 154 154 164 154 154 154 154 109 154 108 107 106 105 154 238 104
## [145] 103 102 154 154 154 154 101 154 100 99 154 220 154 154 98 154 154 202
## [163] 97 96 95 93 154 92 154 154 154 91 190 154 90 154 154 154 188 154
## [181] 154 89 154 131 154 154 154 154 154 154 88 154 87 86 154 154 154 154
## [199] 163 154 85 225 154 84 162 187 154 154 199 83 161 154 154 154 82 154
## [217] 154 154 81 240 232 80 154 154 79 154 154 78 154 77 160 76 154 75
## [235] 154 154 159 74 154 186 154 94 73 154 154 154 72 71 70 184 154 69
## [253] 95 68 154 185 154 67 66 154 65 64 154 154 154 154 154 154 229 154
## [271] 154 154 154 154 63 62 154 61 204 154 154 60 154 154 154 59 221 154
## [289] 58 57 183 154 233 56 55 154 54 154 53 228 154 154 154 154 52 217
## [307] 129 201 154 154 158 51 50 182 49 154 154 154 130 48 154 154 181 215
## [325] 47 154 46 154 154 45 154 154 154 180 154 154 154 157 44 154 154 154
## [343] 231 154 154 43 154 154 42 203 154 41 156 40 237 236 154 154 191 39
## [361] 154 154 38 37 36 196 154 35 154 189 154 154 154 154 154 168 34 154
## [379] 154 154 154 154 154 154 33 222 154 154 209 32 192 31 30 154 29 154
## [397] 212 154 200 28 154 27 154 26 154 25 154 24 154 155 154 154 154 111
## [415] 23 198 154 22 154 154 154 154 154 170 21 154 179 154 154 154 154 20
## [433] 154 226 178 154 154 19 18 154 210 17 16 15 154 14 154 177 207 13
## [451] 218 12 154 176 154 11 154 154 154 154 144 197 154 10 223 154 9 8
## [469] 7 205 154 6 5 154 235 4 154 175 154 154 154 3 154 154 154 154
## [487] 2 1 206 174 154 173
##
## $csize
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [19] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [37] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [55] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [73] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [91] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
## [109] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [127] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [145] 1 1 1 1 1 1 1 1 1 247 1 1 1 1 1 1 1 1
## [163] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [181] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [199] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [217] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [235] 1 1 1 1 1 1 1 1 1 1 1
##
## $no
## [1] 245
There are 245 distinct strong components. To save this component description to the network:
dlt2_network <- dlt2_network |>
activate(nodes) |>
mutate(strong_component = group_components(type = "strong"))
Reviewing the nodes in the updated dlt2_network shows that this new variable has been permanently added:
as_tibble(dlt2_network)
## # A tibble: 492 × 14
## uid facilitator role experience2 experience grades location region country
## <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 1 0 curr… 2 17 gener… IN Midwe… US
## 2 2 0 other 1 3 prima… NC South US
## 3 3 0 inst… 2 20 gener… US South US
## 4 4 0 inst… 2 12 middle TX South US
## 5 5 0 other 1 0 gener… CAN Inter… CA
## 6 6 0 inst… 1 7 gener… CAN Inter… CA
## 7 7 0 oper… 1 6 gener… RI North… US
## 8 8 0 curr… 2 12 middle ME North… US
## 9 9 0 other 2 11 gener… VA South US
## 10 10 0 inst… 3 21 gener… WV South US
## # … with 482 more rows, and 5 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>, strong_component <int>
The code below creates a count of the number of nodes in each strong component:
dlt2_network |>
as_tibble() |>
group_by(strong_component) |>
summarise(count = n()) |>
arrange(desc(count))
## # A tibble: 245 × 2
## strong_component count
## <int> <int>
## 1 1 247
## 2 2 2
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
## 7 7 1
## 8 8 1
## 9 9 1
## 10 10 1
## # … with 235 more rows
Similar to our graph of DLT 1, we see this network has one large strong component with many members (n = 247). Of the remaining 244 components, one contains two nodes and the other 243 are single nodes.
To illustrate this with a sociogram, we create a new edge variable using the same activate() and mutate() functions, then filter() the edges so the graph only contains reciprocated ties:
dlt2_network |>
activate(edges) |>
mutate( reciprocated = edge_is_mutual()) |>
filter(reciprocated == TRUE) |>
autograph()
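As a minimal sketch of what edge_is_mutual() flags, the toy network below (invented actors, not the DLT 2 data) has one reciprocated pair of replies and one reply that is never returned.

```r
library(tidygraph)
library(dplyr)

# Hypothetical ties: a <-> b is reciprocated, b -> c is not
toy_nodes <- data.frame(name = c("a", "b", "c"))
toy_edges <- data.frame(from = c("a", "b", "b"), to = c("b", "a", "c"))

toy_ties <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                      node_key = "name", directed = TRUE) |>
  activate(edges) |>
  mutate(reciprocated = edge_is_mutual()) |>
  as_tibble()

# One logical value per tie, in the order the ties were supplied
toy_ties$reciprocated
```

Both directions of the a and b exchange are flagged, so filtering on this variable keeps the pair of ties rather than collapsing them into one.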
To filter out all isolates entirely and keep only the nodes in the largest strong component (component 1), use the same activate() and filter() functions:
dlt2_network |>
activate(nodes) |>
filter(strong_component == 1) |>
autograph()
The group_edge_betweenness() function groups densely connected nodes together. The betweenness centrality measure is something we will look at more closely in the next section.
Because this function can only be used with undirected networks, we will need to pipe (|>) our dlt2_network through the following functions in sequence:
morph() with the to_undirected argument will temporarily change our directed network to an undirected network, or “symmetrize” our network as discussed in Carolan (2014);
activate() will select just our nodes list;
mutate() will create a new sub_group variable using the group_edge_betweenness() function;
unmorph() will change our undirected network back to a directed network.
Run the following code to group our nodes based on their edge betweenness, then print the updated dlt2_network object to take a quick look. Because this is a bit computationally intensive, it may take a minute or so to run.
dlt2_network <- dlt2_network |>
morph(to_undirected) |>
activate(nodes) |>
mutate(sub_group = group_edge_betweenness()) |>
unmorph()
dlt2_network |>
as_tibble()
## # A tibble: 492 × 15
## uid facilitator role experience2 experience grades location region country
## <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 1 0 curr… 2 17 gener… IN Midwe… US
## 2 2 0 other 1 3 prima… NC South US
## 3 3 0 inst… 2 20 gener… US South US
## 4 4 0 inst… 2 12 middle TX South US
## 5 5 0 other 1 0 gener… CAN Inter… CA
## 6 6 0 inst… 1 7 gener… CAN Inter… CA
## 7 7 0 oper… 1 6 gener… RI North… US
## 8 8 0 curr… 2 12 middle ME North… US
## 9 9 0 other 2 11 gener… VA South US
## 10 10 0 inst… 3 21 gener… WV South US
## # … with 482 more rows, and 6 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>, strong_component <int>, sub_group <int>
As you scroll through the nodes tibble produced, you should now see at the far end a new sub_group variable that includes an ID number indicating to which densely connected cluster each node belongs.
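To see the same morph(), mutate(), unmorph() sequence on a network small enough to check by eye, here is a sketch with invented data: two triangles of actors joined by a single bridging tie. Edge betweenness should separate the triangles because every shortest path between them runs over the bridge.

```r
library(tidygraph)
library(dplyr)

# Hypothetical network: triangles a-b-c and d-e-f joined by one tie from c to d
toy_nodes <- data.frame(name = c("a", "b", "c", "d", "e", "f"))
toy_edges <- data.frame(
  from = c("a", "b", "c", "d", "e", "f", "c"),
  to   = c("b", "c", "a", "e", "f", "d", "d")
)

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                       node_key = "name", directed = TRUE) |>
  morph(to_undirected) |>                          # temporarily ignore tie direction
  activate(nodes) |>
  mutate(sub_group = group_edge_betweenness()) |>  # cluster on the undirected version
  unmorph()                                        # back to the directed network

toy_groups <- toy_graph |>
  activate(nodes) |>
  as_tibble()

# Number of actors assigned to each subgroup
table(toy_groups$sub_group)
```

The directed ties are untouched after unmorph(); only the new node variable is carried back.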
To get a count of the subgroups, the following code groups the nodes and displays a tibble of the dlt2_network object:
dlt2_network |>
activate(nodes) |>
as_tibble() |>
group_by(sub_group) |>
summarise(count = n()) |>
arrange(desc(count))
## # A tibble: 406 × 2
## sub_group count
## <int> <int>
## 1 1 43
## 2 2 4
## 3 3 4
## 4 4 3
## 5 5 3
## 6 6 3
## 7 7 3
## 8 8 3
## 9 9 2
## 10 10 2
## # … with 396 more rows
This output suggests the network has a typical core-periphery structure. At the core of our network are the 43 actors in subgroup 1 (about 9% of the 492 nodes), with the remaining actors residing on the periphery. This is consistent with the core and periphery values given in the actual MOOC-Ed study.
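Next we’ll use the local_size() function to add a size variable to our nodes, which records the size of each ego’s immediate neighborhood. Since it’s much easier to inspect our data as a tibble than using the graph output, we’ll also convert our nodes to a table and arrange() in descending order by size to make it easier to see the range of values in our network: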
dlt2_network <- dlt2_network |>
activate(nodes) |>
mutate(size = local_size())
dlt2_network |>
as_tibble() |>
arrange(desc(size)) |>
select(uid, facilitator, size)
## # A tibble: 492 × 3
## uid facilitator size
## <dbl> <dbl> <dbl>
## 1 302 1 224
## 2 158 1 165
## 3 23 0 74
## 4 183 0 54
## 5 361 0 47
## 6 292 0 45
## 7 267 0 42
## 8 229 0 39
## 9 388 0 39
## 10 26 0 37
## # … with 482 more rows
Not surprisingly, the egos with the most alters are the course facilitators who played a very active role in this course and therefore have an outsize influence on the structure of this network.
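As a minimal sketch of what local_size() counts, the invented star network below has one "hub" actor who replies to three others. The names are hypothetical and the function is called with its default arguments, as in the code above.

```r
library(tidygraph)
library(dplyr)

# Hypothetical star network: "hub" replies to three other actors
toy_nodes <- data.frame(name = c("hub", "x", "y", "z"))
toy_edges <- data.frame(from = c("hub", "hub", "hub"), to = c("x", "y", "z"))

toy_sizes <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                       node_key = "name", directed = TRUE) |>
  activate(nodes) |>
  mutate(size = local_size()) |>  # defaults: one step out, ties in either direction
  as_tibble()

toy_sizes
```

With the defaults, the neighborhood count includes the ego itself, so the number of alters is the size minus one.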
As we learned in our previous case study and readings, a key structural property of networks is the concept of centralization. A network that is highly centralized is one in which relations are focused on a small number of actors or even a single actor in a network, whereas ties in a decentralized network are diffuse and spread over a number of actors. As we saw above, the facilitators in our network play an outsize role in the MOOC-Ed discussions.
Degree
The following code creates two new variables for our nodes: in_degree and out_degree. We’ll set the mode = argument in the centrality_degree() function to "in" and "out" respectively:
dlt2_network <- dlt2_network |>
activate(nodes) |>
mutate(in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out"))
dlt2_network |>
as_tibble()
## # A tibble: 492 × 18
## uid facilitator role experience2 experience grades location region country
## <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 1 0 curr… 2 17 gener… IN Midwe… US
## 2 2 0 other 1 3 prima… NC South US
## 3 3 0 inst… 2 20 gener… US South US
## 4 4 0 inst… 2 12 middle TX South US
## 5 5 0 other 1 0 gener… CAN Inter… CA
## 6 6 0 inst… 1 7 gener… CAN Inter… CA
## 7 7 0 oper… 1 6 gener… RI North… US
## 8 8 0 curr… 2 12 middle ME North… US
## 9 9 0 other 2 11 gener… VA South US
## 10 10 0 inst… 3 21 gener… WV South US
## # … with 482 more rows, and 9 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>, strong_component <int>, sub_group <int>,
## # size <dbl>, in_degree <dbl>, out_degree <dbl>
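A small worked example may help in reading these two columns. The sketch below applies the same centrality_degree() calls to an invented three-actor network in which "a" sends two replies, "b" sends one, and "c" sends none.

```r
library(tidygraph)
library(dplyr)

# Hypothetical replies: a -> b, a -> c, b -> c
toy_nodes <- data.frame(name = c("a", "b", "c"))
toy_edges <- data.frame(from = c("a", "a", "b"), to = c("b", "c", "c"))

toy_degrees <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                         node_key = "name", directed = TRUE) |>
  activate(nodes) |>
  mutate(in_degree  = centrality_degree(mode = "in"),   # replies received
         out_degree = centrality_degree(mode = "out")) |>  # replies sent
  as_tibble()

toy_degrees
```

Every tie adds one to its sender’s out-degree and one to its receiver’s in-degree, so the two columns always have the same total.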
Let’s take a look at the distribution of out_degree, or the number of replies to other posts, by using the geom_histogram() function for creating histograms:
dlt2_network |>
as_tibble() |>
ggplot() +
geom_histogram(aes(x = out_degree))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Similar to our DLT 1 graph, most egos in the network sent very few replies (<10) to alters in the course, while a handful of actors in this network have sent 50 or more replies.
Compositional and Variance Measures
To quickly calculate summary statistics for our nodes, including compositional and variance measures for our egocentric measures, we can use the skim() function from the {skimr} package to take a quick look at the variables in our node list:
dlt2_network |>
as_tibble() |>
skim()
| Name | as_tibble(dlt2_network) |
| Number of rows | 492 |
| Number of columns | 18 |
| _______________________ | |
| Column type frequency: | |
| character | 9 |
| numeric | 9 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| role | 0 | 1 | 5 | 18 | 0 | 13 | 0 |
| grades | 0 | 1 | 4 | 10 | 0 | 5 | 0 |
| location | 1 | 1 | 2 | 4 | 0 | 62 | 0 |
| region | 0 | 1 | 4 | 13 | 0 | 6 | 0 |
| country | 2 | 1 | 2 | 3 | 0 | 19 | 0 |
| group | 0 | 1 | 2 | 4 | 0 | 4 | 0 |
| gender | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
| expert | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
| connect | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| uid | 0 | 1 | 246.50 | 142.17 | 1 | 123.75 | 246.5 | 369.25 | 492 | ▇▇▇▇▇ |
| facilitator | 0 | 1 | 0.00 | 0.06 | 0 | 0.00 | 0.0 | 0.00 | 1 | ▇▁▁▁▁ |
| experience2 | 0 | 1 | 1.93 | 0.78 | 1 | 1.00 | 2.0 | 3.00 | 3 | ▇▁▇▁▆ |
| experience | 0 | 1 | 15.73 | 11.05 | 0 | 8.00 | 15.0 | 22.00 | 120 | ▇▂▁▁▁ |
| strong_component | 0 | 1 | 61.75 | 78.89 | 1 | 1.00 | 1.0 | 122.25 | 245 | ▇▁▁▁▁ |
| sub_group | 0 | 1 | 169.45 | 129.86 | 1 | 37.75 | 160.5 | 283.25 | 406 | ▇▃▃▃▃ |
| size | 0 | 1 | 8.22 | 15.03 | 1 | 2.00 | 4.0 | 9.00 | 224 | ▇▁▁▁▁ |
| in_degree | 0 | 1 | 5.25 | 20.98 | 0 | 0.00 | 1.0 | 5.00 | 430 | ▇▁▁▁▁ |
| out_degree | 0 | 1 | 5.25 | 11.99 | 0 | 0.00 | 1.0 | 6.00 | 166 | ▇▁▁▁▁ |
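We can see, for example, that the average ego neighborhood in our network has a size of 8.22 with a standard deviation of 15. Because the minimum size is 1 even though the network contains isolates, this count appears to include the ego itself, which would put the average number of alters closer to 7.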
Finally, the {skimr} package, like our {tidygraph} package, plays nicely with other {tidyverse} packages for data wrangling and analysis. For example, let’s select only MOOC-Ed participants who are located in the United States and calculate compositional and variance measures for size by educator’s role:
dlt2_network |>
as_tibble() |>
filter(country == "US") |>
group_by(role) |>
select(size) |>
skim()
## Adding missing grouping variables: `role`
| Name | select(…) |
| Number of rows | 410 |
| Number of columns | 2 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | role |
Variable type: numeric
| skim_variable | role | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| size | admin | 0 | 1 | 7.11 | 7.63 | 2 | 3.00 | 5.5 | 8.50 | 35 | ▇▂▁▁▁ |
| size | classteaching | 0 | 1 | 6.82 | 7.94 | 1 | 2.00 | 4.0 | 7.00 | 33 | ▇▁▁▁▁ |
| size | consultant | 0 | 1 | 10.67 | 13.68 | 2 | 3.00 | 5.0 | 14.00 | 45 | ▇▂▁▁▁ |
| size | curriculum | 0 | 1 | 7.37 | 9.05 | 1 | 2.00 | 3.0 | 9.00 | 39 | ▇▂▁▁▁ |
| size | instructionaltech | 0 | 1 | 8.20 | 9.82 | 1 | 2.00 | 4.0 | 11.00 | 54 | ▇▁▁▁▁ |
| size | libmedia | 0 | 1 | 7.89 | 14.05 | 1 | 1.50 | 4.0 | 10.00 | 74 | ▇▁▁▁▁ |
| size | operations | 0 | 1 | 8.50 | 6.56 | 2 | 4.25 | 7.5 | 11.75 | 17 | ▇▁▃▁▃ |
| size | other | 0 | 1 | 5.93 | 6.89 | 1 | 2.00 | 3.0 | 7.00 | 28 | ▇▁▁▁▁ |
| size | otheredprof | 0 | 1 | 133.33 | 109.97 | 11 | 88.00 | 165.0 | 194.50 | 224 | ▇▁▁▇▇ |
| size | profdev | 0 | 1 | 5.75 | 5.60 | 1 | 2.00 | 4.0 | 7.25 | 25 | ▇▃▁▁▁ |
| size | projectmanagement | 0 | 1 | 3.50 | 3.00 | 2 | 2.00 | 2.0 | 3.50 | 8 | ▇▁▁▁▂ |
| size | specialed | 0 | 1 | 4.22 | 3.53 | 2 | 2.00 | 3.0 | 4.00 | 13 | ▇▁▁▁▁ |
| size | techinfrastructure | 0 | 1 | 6.05 | 5.74 | 1 | 2.00 | 4.0 | 9.00 | 24 | ▇▃▂▁▁ |
As illustrated by the output above, we can see that “otheredprof” (i.e., other educational professionals including our facilitators) were connected with the most individuals on average (133), while educators in “projectmanagement” connected with the fewest individuals on average (3.5).
To address Question 1, actors in the network were categorized into distinct, mutually exclusive groups using the core-periphery and regular equivalence functions of UCINET. The former used the CORR algorithm to separate actors that are part of a densely connected subgroup, or “core”, from those that are part of the sparsely connected periphery. Regular equivalence employs the REGE blockmodeling algorithm to partition, or group, actors in the network based on the similarity of their ties to others with similar ties. In essence, blockmodeling provides a systematic way of categorizing educators based on the ways in which they interacted with peers.
As we saw from a basic visual inspection of our network during the Explore section, there was a small core of highly connected participants surrounded by those on the “periphery,” or edge, of the network with very few connections. In the DLT 2 course, those on the periphery made up roughly 91% of the network (449 of 492 actors, taking the 43 actors in subgroup 1 as the core). According to the summary statistics above, Fall participants had an average out-degree of 5.25, an in-degree range of 0 to 430, and an out-degree range of 0 to 166. Below is a comparison of the Spring (DLT 1) and Fall (DLT 2) iterations of the data.
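# dlt1_network (the Spring network) is not created in this document and must already exist in the session
network1 <- as.data.frame(dlt1_network)
network2 <- as.data.frame(dlt2_network)
# Align DLT 1 with DLT 2: rename role1 to role and merge the two administrator roles
network1 <- rename(network1, role = role1)
network1$role[network1$role %in% c("districtadmin", "schooladmin")] <- "admin"
# Tag each node table with its iteration before stacking them
network1_mod <- network1 |>
mutate(year = "DLT1") |>
relocate(year)
network2_mod <- network2 |>
mutate(year = "DLT2") |>
relocate(year)
network_comb <- rbind(network1_mod, network2_mod)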
network_comb |>
filter(role != "NULL") |>
filter(role != "other") |>
ggplot(aes(x = out_degree, fill = role)) +
geom_bar(aes(y = after_stat(count / sum(count) * 100))) +
facet_wrap(~ year) +
coord_cartesian(xlim = c(0,50)) +
labs(title = "Comparing Out-degree Centrality by Role",
x = "# Replies Sent to Alters", y = "% by Egos (Nodes)") +
scale_fill_discrete(name = "Professional Role") +
scale_x_binned(breaks = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50))
One of the driving research questions for this case study was to better understand the patterns of peer interaction and structure of peer networks that emerge over the course of this MOOC-Ed. A key challenge to any successful MOOC educational experience is the quality and frequency of communication between the participants. Out-degree centrality is a metric that gives some insight into how often specific nodes, in this case education professionals, reply to others in the network. The plot above describes how the two cohorts differed in replying to their networks, broken down by professional role.
While the Fall cohort included more participants, the trends remained consistent with their Spring counterparts. Most participants replied fewer than 5 times to their alters, though the DLT 2 group had a few more participants who responded more than 25 times than DLT 1 did. The only difference I could spot in group composition was that DLT 2 had proportionally more teachers and curriculum developers than DLT 1, while DLT 1 had proportionally more administrators. That could be a line of inquiry to pursue to see if teachers are more likely to communicate reciprocally with their peers than school or district administrators.