Identify a Data Source

For the Unit 3 Independent Analysis, I selected the open educational dataset prepared by Kellogg and Edelman (2015), which was used in the Unit 3 Case Study.

Formulate a Question

My original visualizations from the Unit 3 Case Study were histograms that grouped the number of replies sent by the actors’ level of experience (0 to 3, 4 to 5, 6 to 10, 11 to 20, and 20+). From this visual I found a positive relationship between the number of replies sent and level of experience; in other words, the more experienced actors sent the most replies. My case study can be accessed here.

One of the things that I noted in my Unit 3 Case Study that could be an extension of my visual was to “conduct a similar analysis based on in-degree to see if there are similar trends for that measure”.

For this analysis, I am interested in answering the questions:

  1. How do in-degree and out-degree compare based on level of experience?
  2. How are these interactions linked to gender?

Analyze the Data

In this section, we repeat the steps from the Case Study to prepare the data for, and then expand on, the original histogram from the Communicate section.

*Note: Some of the steps below could be removed because they don’t pertain to my final visualization. However, RStudio has been very temperamental for me (error messages, needing to restart, etc.), so I’m leaving them in rather than risk causing a bigger issue. If I continue working with this dataset, the variables created in these steps also open up further avenues for exploration.

First, we will install packages and load the libraries.

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
install.packages("igraph") 
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:stats':
## 
##     filter
library(ggraph)
library(skimr)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Let’s use the read_csv() function from the {readr} package and the clean_names() function from the {janitor} package, both introduced in previous units, to read in our edge list and clean up the variable names:

dlt1_ties <- read_csv("data/dlt1-edges.csv", 
                      col_types = cols(Sender = col_character(), 
                                       Receiver = col_character(), 
                                       `Category Text` = col_skip(), 
                                       `Comment ID` = col_character(), 
                                       `Discussion ID` = col_character())) |>
  clean_names()

dlt1_ties
## # A tibble: 2,529 × 9
##    sender receiver timestamp   discussion_title discussion_cate… parent_category
##    <chr>  <chr>    <chr>       <chr>            <chr>            <chr>          
##  1 360    444      4/4/13 16:… Most important … Group N          Units 1-3 Disc…
##  2 356    444      4/4/13 18:… Most important … Group D-L        Units 1-3 Disc…
##  3 356    444      4/4/13 18:… DLT Resources—C… Group D-L        Units 1-3 Disc…
##  4 344    444      4/4/13 18:… Most important … Group O-T        Units 1-3 Disc…
##  5 392    444      4/4/13 19:… Most important … Group U-Z        Units 1-3 Disc…
##  6 219    444      4/4/13 19:… Most important … Group M          Units 1-3 Disc…
##  7 318    444      4/4/13 19:… Most important … Group M          Units 1-3 Disc…
##  8 4      444      4/4/13 19:… Most important … Group N          Units 1-3 Disc…
##  9 355    356      4/4/13 20:… DLT Resources—C… Group D-L        Units 1-3 Disc…
## 10 355    444      4/4/13 20:… Most important … Group D-L        Units 1-3 Disc…
## # … with 2,519 more rows, and 3 more variables: discussion_identifier <chr>,
## #   comment_id <chr>, discussion_id <chr>

Use the code chunk below to import the dlt1-nodes.csv attribute file and be sure to set the following variables as character data types: UID, Facilitator, expert, connect.

dlt1_nodes <- read_csv("data/dlt1-nodes.csv", 
                      col_types = cols(UID = col_character(), 
                                       Facilitator = col_character(), 
                                       expert = col_character(), 
                                       connect = col_character())) |>
  clean_names()

Now let’s inspect the node data we imported:

dlt1_nodes
## # A tibble: 445 × 13
##    uid   facilitator role1 experience experience2 grades location region country
##    <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
##  1 1     0           libm…          1 6 to 10     secon… VA       South  US     
##  2 2     0           clas…          1 6 to 10     secon… FL       South  US     
##  3 3     0           dist…          2 11 to 20    gener… PA       North… US     
##  4 4     0           clas…          2 11 to 20    middle NC       South  US     
##  5 5     0           othe…          3 20+         gener… AL       South  US     
##  6 6     0           clas…          1 4 to 5      gener… AL       South  US     
##  7 7     0           inst…          2 11 to 20    gener… SD       Midwe… US     
##  8 8     0           spec…          1 6 to 10     secon… BE       Inter… BE     
##  9 9     0           clas…          1 6 to 10     middle NC       South  US     
## 10 10    0           scho…          2 11 to 20    middle NC       South  US     
## # … with 435 more rows, and 4 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>

Run the following code to specify our ties data frame as the edges of our network, our actors data frame for the vertices of our network and their attributes, and indicate that this is indeed a directed network.

dlt1_network <- tbl_graph(edges = dlt1_ties,
                          nodes = dlt1_nodes,
                          node_key = "uid",
                          directed = TRUE)
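Because node_key = "uid" is set, {tidygraph} matches the character sender and receiver values against the uid column and stores each edge endpoint as an integer row position in the node list. The minimal sketch below uses hypothetical toy data (the toy_* names and values are illustrative only) with uids that deliberately do not match their row positions; in the DLT1 output that follows, the from/to values appear to coincide with uid only because the uids run 1 to 445 in row order.

```r
library(tibble)
library(tidygraph)

# Hypothetical toy node and edge lists; uids deliberately differ from row positions
toy_nodes <- tibble(uid = c("10", "20", "30"),
                    gender = c("F", "M", "F"))
toy_edges <- tibble(sender = c("10", "30"),
                    receiver = c("20", "20"))

# The first two edge columns are used as from/to and matched against uid
toy_network <- tbl_graph(edges = toy_edges,
                         nodes = toy_nodes,
                         node_key = "uid",
                         directed = TRUE)

# Edge endpoints are stored as integer row positions in the node list
toy_edge_list <- toy_network |>
  activate(edges) |>
  as_tibble()

toy_edge_list  # from = 1, 3 (uids "10" and "30"); to = 2, 2 (uid "20")
```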

Take a look at the output for our dlt1_network:

dlt1_network
## # A tbl_graph: 445 nodes and 2529 edges
## #
## # A directed multigraph with 4 components
## #
## # Node Data: 445 × 13 (active)
##   uid   facilitator role1 experience experience2 grades location region country
##   <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
## 1 1     0           libm…          1 6 to 10     secon… VA       South  US     
## 2 2     0           clas…          1 6 to 10     secon… FL       South  US     
## 3 3     0           dist…          2 11 to 20    gener… PA       North… US     
## 4 4     0           clas…          2 11 to 20    middle NC       South  US     
## 5 5     0           othe…          3 20+         gener… AL       South  US     
## 6 6     0           clas…          1 4 to 5      gener… AL       South  US     
## # … with 439 more rows, and 4 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>
## #
## # Edge Data: 2,529 × 9
##    from    to timestamp discussion_title discussion_cate… parent_category
##   <int> <int> <chr>     <chr>            <chr>            <chr>          
## 1   360   444 4/4/13 1… Most important … Group N          Units 1-3 Disc…
## 2   356   444 4/4/13 1… Most important … Group D-L        Units 1-3 Disc…
## 3   356   444 4/4/13 1… DLT Resources—C… Group D-L        Units 1-3 Disc…
## # … with 2,526 more rows, and 3 more variables: discussion_identifier <chr>,
## #   comment_id <chr>, discussion_id <chr>

The {igraph} package has a simple function for identifying the number of components in a network, the size of each component, and which actors belong to each. Let’s first take a quick look at the summaries for “weak” components in our network:

components(dlt1_network, mode = c("weak"))
## $membership
##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [297] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [334] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [371] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [408] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 1
## [445] 1
## 
## $csize
## [1] 442   1   1   1
## 
## $no
## [1] 4

To find the “strong” components in our network, we can pass mode = "strong" to the same components() function:

components(dlt1_network, mode = c("strong"))
## $membership
##   [1] 194 194 194 194 194 194 194 194 194 194 194 194 194 194 194 194 194 193
##  [19] 194 194 203 194 209 194 194 194 194 192 194 194 191 194 194 194 194 194
##  [37] 194 194 194 190 194 194 194 194 194 194 189 194 194 194 194 194 194 194
##  [55] 188 194 194 194 194 194 194 194 194 194 194 194 194 194 194 194 194 194
##  [73] 187 194 194 194 194 194 186 194 194 194 194 208 194 185 194 194 184 194
##  [91] 194 194 183 194 181 180 178 194 194 194 194 177 194 194 194 194 194 176
## [109] 194 194 175 194 194 194 194 194 194 194 194 174 194 194 169 168 167 166
## [127] 165 194 194 164 194 194 194 163 162 194 194 194 161 159 194 194 158 194
## [145] 173 182 157 156 155 172 171 194 206 194 194 194 194 194 194 154 194 194
## [163] 194 153 194 152 194 151 150 194 149 194 194 148 202 194 194 194 147 146
## [181] 194 201 194 194 194 145 195 194 198 194 194 194 194 194 194 144 194 194
## [199] 194 194 194 143 194 142 194 194 194 194 194 141 194 194 140 139 138 194
## [217] 194 194 194 137 194 194 194 136 135 194 194 134 133 131 130 129 128 204
## [235] 194 127 126 125 160 124 123 122 194 121 119 194 194 194 194 194 194 194
## [253] 194 194 118 194 194 179 117 115 116 194 120 114 194 194 113 194 112 194
## [271] 194 194 199 111 194 194 194 110 194 109 194 108 106 107 194 200 194 105
## [289] 104 103 102 194 194 101 194 100  99  98  97 194 194 194 194  96 194 197
## [307] 194 207  95 194  94  93  92  91  90  89 194 194 194  88 194 194 194 205
## [325] 194  87  86  85 194 170 194  84  83 196 194 194 194 194 194 194 194 194
## [343] 194 132 194 194 194  82  81 194 194  80  79  78 194 194  77 194  76  75
## [361] 194  74  73  72  71  70  69  68  67  66  65  64  63  62  61  60  59  58
## [379]  57  56  55  54  53  52  51  50  49  48  47  46  45  44  43  42  41  40
## [397]  39  38  37  36  35  34  33  32  31  30  29  28  27  26  25  24 194  23
## [415]  22  21  20  19  18  17  16 194  15  14  13  12  11  10   9   8   7 194
## [433] 194 194   6   5 194 194 194   4   3   2   1 194 194
## 
## $csize
##   [1]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [19]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [37]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [55]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [73]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
##  [91]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [109]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [127]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [145]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [163]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
## [181]   1   1   1   1   1   1   1   1   1   1   1   1   1 237   1   1   1   1
## [199]   1   1   1   1   1   1   1   1   1   1   1
## 
## $no
## [1] 209
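The gap between 4 weak components and 209 strong components comes from how each mode treats direction: a weak component ignores edge direction, while a strong component requires every member to be reachable from every other member by following replies in the direction they were sent. A minimal sketch with {igraph} on a hypothetical four-actor network (toy_replies is illustrative, not DLT1 data):

```r
library(igraph)

# Hypothetical toy reply network: A, B and C reply to each other in a cycle,
# and C also replies to D, but D never replies back
toy_replies <- graph_from_literal(A-+B, B-+C, C-+A, C-+D)

weak <- components(toy_replies, mode = "weak")
strong <- components(toy_replies, mode = "strong")

weak$no             # 1: ignoring direction, everyone is connected
strong$no           # 2: {A, B, C} can reach each other, D cannot reach anyone
sort(strong$csize)  # 1 3
```

In the DLT1 network, the 208 single-node strong components play the role of D: actors who are connected to the main group but cannot be reached back along reply paths.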

Let’s use the activate() function to single out the node list in our network and use the familiar mutate() and group_components() functions to create a new strong_component variable that indicates and saves the strong components to which each node belongs:

dlt1_network <- dlt1_network |>
  activate(nodes) |>
  mutate(strong_component = group_components(type = "strong"))

Let’s take a look at the nodes in our new dlt1_network, which should now contain our new variable. To do so, we’ll use as_tibble(), which {tidygraph} supports for network objects, to temporarily convert the active node list into a standard tibble that is easier to inspect:

as_tibble(dlt1_network)
## # A tibble: 445 × 14
##    uid   facilitator role1 experience experience2 grades location region country
##    <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
##  1 1     0           libm…          1 6 to 10     secon… VA       South  US     
##  2 2     0           clas…          1 6 to 10     secon… FL       South  US     
##  3 3     0           dist…          2 11 to 20    gener… PA       North… US     
##  4 4     0           clas…          2 11 to 20    middle NC       South  US     
##  5 5     0           othe…          3 20+         gener… AL       South  US     
##  6 6     0           clas…          1 4 to 5      gener… AL       South  US     
##  7 7     0           inst…          2 11 to 20    gener… SD       Midwe… US     
##  8 8     0           spec…          1 6 to 10     secon… BE       Inter… BE     
##  9 9     0           clas…          1 6 to 10     middle NC       South  US     
## 10 10    0           scho…          2 11 to 20    middle NC       South  US     
## # … with 435 more rows, and 5 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>, strong_component <int>

We can extend this further by creating standard table summaries with the summarise() function. Run the code below to count the number of nodes in each strong component:

dlt1_network |>
  as_tibble() |>
  group_by(strong_component) |>
  summarise(count = n()) |>
  arrange(desc(count))
## # A tibble: 209 × 2
##    strong_component count
##               <int> <int>
##  1                1   237
##  2                2     1
##  3                3     1
##  4                4     1
##  5                5     1
##  6                6     1
##  7                7     1
##  8                8     1
##  9                9     1
## 10               10     1
## # … with 199 more rows
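Note that the largest strong component is labeled 1 here, even though components() labeled it 194 earlier. The table above is consistent with group_components() renumbering components by size, largest first, which is why filtering on strong_component == 1 selects the largest one. A sketch on a hypothetical toy network, assuming that size-based numbering:

```r
library(dplyr)
library(igraph)
library(tidygraph)

# Hypothetical toy reply network: A, B and C form a cycle, C also replies to D
toy_replies <- as_tbl_graph(graph_from_literal(A-+B, B-+C, C-+A, C-+D))

toy_nodes <- toy_replies |>
  activate(nodes) |>
  mutate(strong_component = group_components(type = "strong")) |>
  as_tibble()

toy_nodes  # A, B and C (the larger component) get 1; D gets 2
```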

We can also single out the edge list with activate(), flag reciprocated ties with edge_is_mutual(), keep only those ties with filter(), and plot the result with autograph():

dlt1_network |>
  activate(edges) |>
  mutate(reciprocated = edge_is_mutual()) |>
  filter(reciprocated) |>
  autograph()

Or we could keep only the largest strong component, dropping the 208 nodes that each form a strong component on their own, using the same activate() and filter() functions:

dlt1_network |>
  activate(nodes) |>
  filter(strong_component == 1) |>
  autograph()
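The reciprocated-ties filter above relies on edge_is_mutual(), which flags an edge as mutual when a tie also runs in the opposite direction between the same two actors. A minimal sketch on a hypothetical three-actor network (toy_replies is illustrative):

```r
library(dplyr)
library(igraph)
library(tidygraph)

# Hypothetical toy network: A and B reply to each other, B also replies to C
toy_replies <- as_tbl_graph(graph_from_literal(A-+B, B-+A, B-+C))

toy_edges <- toy_replies |>
  activate(edges) |>
  mutate(reciprocated = edge_is_mutual()) |>
  as_tibble()

toy_edges  # A -> B and B -> A are TRUE; B -> C is FALSE
```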

Similar to our component analysis, the {igraph} package has a simple clique_num() function, which returns the size of the largest clique (completely connected subgroup) in a network:

clique_num(dlt1_network)
## Warning in clique_num(dlt1_network): At cliques.c:1125 :directionality of edges
## is ignored for directed graphs
## [1] 8
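Because clique_num() reports the size of the largest clique rather than how many cliques exist, the 8 above means the biggest completely connected subgroup has 8 members. To count cliques instead, {igraph} also offers count_max_cliques(); a minimal sketch on a hypothetical undirected toy graph:

```r
library(igraph)

# Hypothetical toy network: A, B, C and D are all tied to each other,
# and D is also tied to E
toy <- graph_from_literal(A-B, A-C, A-D, B-C, B-D, C-D, D-E)

clique_num(toy)         # 4: the largest clique has four members
count_max_cliques(toy)  # 2: the maximal cliques {A, B, C, D} and {D, E}
```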

The {igraph} package also has a cliques() function for identifying the members of each clique. In addition to the network you want to examine, this function lets you set the minimum and maximum number of members to include in a clique.

Let’s see whether any cliques contain at least 8 nodes:

cliques(dlt1_network, min = 8, max = NULL)
## Warning in cliques(dlt1_network, min = 8, max = NULL): At
## igraph_cliquer.c:57 :Edge directions are ignored for clique calculations
## [[1]]
## + 8/445 vertices, from c8815a5:
## [1]  11  19  24  30  44  60 444 445
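One caveat when choosing min and max: cliques() returns every complete subgraph in the size range, including ones nested inside larger cliques, whereas max_cliques() returns only cliques that cannot be extended by adding another member. Setting min = 8 works cleanly above because 8 is the largest clique size; a smaller min would also list the subsets of that 8-member clique. A sketch on a hypothetical toy graph:

```r
library(igraph)

# Hypothetical toy network: a four-member clique {A, B, C, D} plus the tie D - E
toy <- graph_from_literal(A-B, A-C, A-D, B-C, B-D, C-D, D-E)

# Every triangle counts, including the four nested inside {A, B, C, D}
length(cliques(toy, min = 3, max = 3))  # 4
# Only cliques of size 3+ that cannot be extended
length(max_cliques(toy, min = 3))       # 1
```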

Run the following code to group our nodes into subgroups based on edge betweenness, then print the updated dlt1_network object to take a quick look. Because this is computationally intensive, it may take a minute or so to run.

dlt1_network <- dlt1_network |>
  morph(to_undirected) |>
  activate(nodes) |>
  mutate(sub_group = group_edge_betweenness()) |>
  unmorph()

dlt1_network |>
  as_tibble()
## # A tibble: 445 × 15
##    uid   facilitator role1 experience experience2 grades location region country
##    <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
##  1 1     0           libm…          1 6 to 10     secon… VA       South  US     
##  2 2     0           clas…          1 6 to 10     secon… FL       South  US     
##  3 3     0           dist…          2 11 to 20    gener… PA       North… US     
##  4 4     0           clas…          2 11 to 20    middle NC       South  US     
##  5 5     0           othe…          3 20+         gener… AL       South  US     
##  6 6     0           clas…          1 4 to 5      gener… AL       South  US     
##  7 7     0           inst…          2 11 to 20    gener… SD       Midwe… US     
##  8 8     0           spec…          1 6 to 10     secon… BE       Inter… BE     
##  9 9     0           clas…          1 6 to 10     middle NC       South  US     
## 10 10    0           scho…          2 11 to 20    middle NC       South  US     
## # … with 435 more rows, and 6 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>, strong_component <int>, sub_group <int>

Finally, let’s count the number of nodes in each of these subgroups:

dlt1_network |>
  activate(nodes) |>
  as_tibble() |>
  group_by(sub_group) |>
  summarise(count = n()) |>
  arrange(desc(count))
## # A tibble: 326 × 2
##    sub_group count
##        <int> <int>
##  1         1    72
##  2         2     6
##  3         3     5
##  4         4     4
##  5         5     3
##  6         6     3
##  7         7     3
##  8         8     3
##  9         9     2
## 10        10     2
## # … with 316 more rows
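As far as I understand, group_edge_betweenness() wraps {igraph}’s cluster_edge_betweenness(), the Girvan-Newman method: the edge with the highest betweenness (the one crossed by the most shortest paths, such as a bridge between groups) is removed repeatedly, and the split with the highest modularity is kept. A minimal sketch on a hypothetical toy network of two triangles joined by one bridge:

```r
library(igraph)

# Hypothetical toy network: triangles {A, B, C} and {D, E, F} joined by C - D
toy <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-D)

# The bridge C - D has the highest edge betweenness, so it is removed first
ceb <- cluster_edge_betweenness(toy)

membership(ceb)  # A, B, C share one group; D, E, F share the other
```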

Note that if we wanted to convert our network to an undirected network permanently, rather than temporarily inside morph(), we could call the to_undirected() morpher as a stand-alone function:

dlt1_undirected <- to_undirected(dlt1_network)

dlt1_undirected
## # A tbl_graph: 445 nodes and 2529 edges
## #
## # An undirected multigraph with 4 components
## #
## # Node Data: 445 × 15 (active)
##   uid   facilitator role1 experience experience2 grades location region country
##   <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
## 1 1     0           libm…          1 6 to 10     secon… VA       South  US     
## 2 2     0           clas…          1 6 to 10     secon… FL       South  US     
## 3 3     0           dist…          2 11 to 20    gener… PA       North… US     
## 4 4     0           clas…          2 11 to 20    middle NC       South  US     
## 5 5     0           othe…          3 20+         gener… AL       South  US     
## 6 6     0           clas…          1 4 to 5      gener… AL       South  US     
## # … with 439 more rows, and 6 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>, strong_component <int>, sub_group <int>
## #
## # Edge Data: 2,529 × 9
##    from    to timestamp discussion_title discussion_cate… parent_category
##   <int> <int> <chr>     <chr>            <chr>            <chr>          
## 1   360   444 4/4/13 1… Most important … Group N          Units 1-3 Disc…
## 2   356   444 4/4/13 1… Most important … Group D-L        Units 1-3 Disc…
## 3   356   444 4/4/13 1… DLT Resources—C… Group D-L        Units 1-3 Disc…
## # … with 2,526 more rows, and 3 more variables: discussion_identifier <chr>,
## #   comment_id <chr>, discussion_id <chr>
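{tidygraph} also provides convert(), which applies a morpher such as to_undirected and returns a single tbl_graph, so it may be a more explicit way to make this change permanent. A minimal sketch on a hypothetical toy network (toy_directed is illustrative); the edge count is expected to stay the same, matching the 2,529 edges kept above:

```r
library(dplyr)
library(igraph)
library(tidygraph)

# Hypothetical toy directed network with one reciprocated pair
toy_directed <- as_tbl_graph(graph_from_literal(A-+B, B-+A, B-+C))

# Apply the to_undirected morpher and keep the result as a tbl_graph
toy_undirected <- convert(toy_directed, to_undirected)

is_directed(toy_undirected)  # FALSE
gsize(toy_undirected)        # 3: each directed edge is kept as its own edge
```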

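Next, let’s use local_size() to add a size variable to our nodes. By default, it counts the distinct actors within one step of each node, in either direction, including the node itself. Since it’s much easier to inspect our data as a tibble than as graph output, we’ll also convert our node list to a table and arrange() it in descending order by size to make it easier to see the range of values in our network: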

dlt1_network <- dlt1_network |>
  activate(nodes) |>
  mutate(size = local_size())

dlt1_network |> 
  as_tibble() |>
  arrange(desc(size)) |> 
  select(uid, facilitator, size)
## # A tibble: 445 × 3
##    uid   facilitator  size
##    <chr> <chr>       <dbl>
##  1 444   1             295
##  2 445   1             160
##  3 44    0              61
##  4 11    0              51
##  5 7     0              42
##  6 30    0              42
##  7 19    0              39
##  8 60    0              39
##  9 36    0              37
## 10 432   0              36
## # … with 435 more rows
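To see how size differs from a simple count of ties, the sketch below uses {igraph}’s ego_size(), which I assume is the function local_size() is built on, on a hypothetical reply log in which A replies to B three times:

```r
library(igraph)

# Hypothetical toy reply log: A replies to B three times, C replies to A once
toy <- graph_from_edgelist(rbind(c("A", "B"),
                                 c("A", "B"),
                                 c("A", "B"),
                                 c("C", "A")),
                           directed = TRUE)

# Degree counts every edge, so repeated replies add up
degree(toy, v = "A", mode = "all")                   # 4
# Neighbourhood size counts distinct actors one step away, plus A itself
ego_size(toy, order = 1, nodes = "A", mode = "all")  # 3
```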

Run the following code to create two new variables for our nodes: in_degree (the number of replies each actor received) and out_degree (the number of replies each actor sent). We’ll set the mode = argument in the centrality_degree() function to "in" and "out", respectively.

dlt1_network <- dlt1_network |>
  activate(nodes) |>
  mutate(in_degree = centrality_degree(mode = "in"),
         out_degree = centrality_degree(mode = "out"))
  
dlt1_network |> 
  as_tibble()
## # A tibble: 445 × 18
##    uid   facilitator role1 experience experience2 grades location region country
##    <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
##  1 1     0           libm…          1 6 to 10     secon… VA       South  US     
##  2 2     0           clas…          1 6 to 10     secon… FL       South  US     
##  3 3     0           dist…          2 11 to 20    gener… PA       North… US     
##  4 4     0           clas…          2 11 to 20    middle NC       South  US     
##  5 5     0           othe…          3 20+         gener… AL       South  US     
##  6 6     0           clas…          1 4 to 5      gener… AL       South  US     
##  7 7     0           inst…          2 11 to 20    gener… SD       Midwe… US     
##  8 8     0           spec…          1 6 to 10     secon… BE       Inter… BE     
##  9 9     0           clas…          1 6 to 10     middle NC       South  US     
## 10 10    0           scho…          2 11 to 20    middle NC       South  US     
## # … with 435 more rows, and 9 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>, strong_component <int>, sub_group <int>,
## #   size <dbl>, in_degree <dbl>, out_degree <dbl>
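Because the network is a multigraph, each repeated reply between the same two actors counts toward degree. A sketch on a hypothetical toy reply log (toy_edges is illustrative) shows the in- and out-degree values this produces:

```r
library(dplyr)
library(igraph)
library(tidygraph)

# Hypothetical toy reply log (sender -> receiver): A replies to B three times,
# and C replies to A once
toy_edges <- data.frame(from = c("A", "A", "A", "C"),
                        to   = c("B", "B", "B", "A"))

toy_degrees <- graph_from_data_frame(toy_edges, directed = TRUE) |>
  as_tbl_graph() |>
  activate(nodes) |>
  mutate(in_degree = centrality_degree(mode = "in"),     # replies received
         out_degree = centrality_degree(mode = "out")) |> # replies sent
  as_tibble()

toy_degrees  # A: out 3, in 1; B: out 0, in 3; C: out 1, in 0
```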

Create a Data Product

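# Extract the node table (with the degree columns) as a plain tibble for plotting,
# without overwriting the network object
dlt1_nodes_tbl <- dlt1_network |>
  activate(nodes) |>
  as_tibble()

# Total replies sent (out-degree) per gender, one panel per experience level
ggplot(dlt1_nodes_tbl, aes(x = out_degree, y = gender, fill = experience2)) +
  geom_col() +
  facet_wrap(~experience2) +
  labs(title = "Replies Sent by Experience Level and Gender",
       x = "Number of Ties Sent", y = "Gender")

# Same layout for replies received (in-degree)
ggplot(dlt1_nodes_tbl, aes(x = in_degree, y = gender, fill = experience2)) +
  geom_col() +
  facet_wrap(~experience2) +
  labs(title = "Replies Received by Experience Level and Gender",
       x = "Number of Ties Received", y = "Gender")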

Using my original code from the Unit 3 Case Study, I made a few tweaks and created an additional graph to compare and contrast in-degree and out-degree. One difference from my original histogram for out-degree is that the number of posts is broken up by gender. By doing this, we can see that the majority of replies were sent by women. Women also sent more replies than men at every experience level, and the same is true for the number of replies received. However, replies received are split much more evenly between men and women than replies sent.

I ran into some of the same aesthetic “problems” that I did in my previous analysis. One thing I tried to research was changing the order of the panels so that the less experienced actors appear first and the more experienced actors last. After experimenting, I still wasn’t able to find a solution that worked. I also tried to exclude the “NULL” category, which, after looking through the dataset, appears to contain only one individual. I tried to replicate a few examples I found online; they didn’t produce any error messages, but they didn’t resolve the problem either.
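One possible approach to both problems, sketched on hypothetical toy rows rather than the real node table: drop the “NULL” category with filter(), then turn experience2 into a factor whose levels are listed from least to most experienced, since ggplot2 orders facet panels and legend entries by factor level. The level labels are taken from the categories listed earlier and are an assumption about how they are spelled in the data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy rows standing in for the node table (NOT the DLT1 data)
toy_nodes <- tibble::tibble(
  gender      = c("F", "M", "F", "M", "F"),
  experience2 = c("20+", "0 to 3", "6 to 10", "NULL", "4 to 5"),
  out_degree  = c(5, 2, 3, 1, 4)
)

# Assumed labels for experience2, ordered from least to most experienced
experience_levels <- c("0 to 3", "4 to 5", "6 to 10", "11 to 20", "20+")

plot_data <- toy_nodes |>
  filter(experience2 != "NULL") |>  # drops the "NULL" row (and any NA rows)
  mutate(experience2 = factor(experience2, levels = experience_levels))

# Facet panels and the fill legend now follow the factor level order
sent_plot <- ggplot(plot_data, aes(x = out_degree, y = gender, fill = experience2)) +
  geom_col() +
  facet_wrap(~experience2) +
  labs(title = "Replies Sent by Experience Level and Gender",
       x = "Number of Ties Sent", y = "Gender")
```

With the real data, the same filter() and mutate() steps could be applied to the node tibble before the ggplot() calls. Any experience2 value that is spelled differently from the assumed levels would become NA, so it is worth checking unique(experience2) first.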

For future analysis, I think it would be interesting to show visually how the cliques and subgroups break down, to see which actors (aside from the facilitators) are most likely to consistently respond to others or receive responses. It would also be interesting to explore the discussion posts by topic and see how topic relates to the number of replies.

Knit your data product to a desired output format. This may be a simple HTML, PDF, or MS Word file; or something more complex like HTML5 slides, a Tufte-style handout, dashboard, or website. Then create a new forum post below to share your analysis. Include in the post your published web link (e.g., via RPubs or GitHub) or attached file (e.g. HTML or PDF) and add a short reflection including one thing you learned and one thing you’d like to explore further.