Introduction

This final project analysis is based on data collected from 2 cohorts of participants who enrolled the 6-week course “Planning for the Digital Learning Transition in K-12 Schools” in 2013. This course was designed and implemented by the Friday Institute at the North Carolina State University and aimed to help the design and implementation process of the K-12 digital learning initiates (Kellogg, Booth & Oliver, 2014).

  1. Data source. The data sources include 1) the csv file with all participants’ attributes, such as ID, role, experience and location; 2) the csv file with all participants’ discussion data within the course online forum for both cohorts. The first cohort is labeled as “DLT_1” and the second cohort is labeled as “DLT_2”.

  2. Guided questions. The guided questions for this final project are from my previous case study analysis on the DLT_2 data set. In that case study, I was interested in seeing how participants from the instructional technology background engaged in the discussion as they may involve in the planning and implementation process of digital learning initiates. The preliminary results of that case stud0y showed that the “instructionaltech” group had a higher betweenness which means that their options/posts might lead to more discussions among participants. In this final project, I aimed to examine how the “instructionaltech” and “gender” factors affect the discussions among participants in both cohorts. This final study was guided by two questions:

    1. in what ways did participants of the DLT 1 and DLT 2 networks form interactions? And
    2. what are the participants’ attributes that significantly predict their interactions in the DLT1 and 2 courses?
  3. Data analysis. This section contains five sessions starting from data wrangling to modeling. Each session provides a description and the matched code blocks to present the details of each step of the analysis.

    1. Load libraries: these packages are needed for this analysis.
    library(tidyverse)
    ## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
    ## ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
    ## ✔ tibble  3.1.6     ✔ dplyr   1.0.9
    ## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
    ## ✔ readr   2.1.2     ✔ forcats 0.5.1
    ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ## ✖ dplyr::filter() masks stats::filter()
    ## ✖ dplyr::lag()    masks stats::lag()
    library(igraph)
    ## 
    ## Attaching package: 'igraph'
    ## The following objects are masked from 'package:dplyr':
    ## 
    ##     as_data_frame, groups, union
    ## The following objects are masked from 'package:purrr':
    ## 
    ##     compose, simplify
    ## The following object is masked from 'package:tidyr':
    ## 
    ##     crossing
    ## The following object is masked from 'package:tibble':
    ## 
    ##     as_data_frame
    ## The following objects are masked from 'package:stats':
    ## 
    ##     decompose, spectrum
    ## The following object is masked from 'package:base':
    ## 
    ##     union
    library(tidygraph)
    ## 
    ## Attaching package: 'tidygraph'
    ## The following object is masked from 'package:igraph':
    ## 
    ##     groups
    ## The following object is masked from 'package:stats':
    ## 
    ##     filter
    library(ggraph)
    library(statnet)
    ## Loading required package: tergm
    ## Loading required package: ergm
    ## Loading required package: network
    ## 
    ## 'network' 1.17.1 (2021-06-12), part of the Statnet Project
    ## * 'news(package="network")' for changes since last version
    ## * 'citation("network")' for citation information
    ## * 'https://statnet.org' for help, support, and other information
    ## 
    ## Attaching package: 'network'
    ## The following objects are masked from 'package:igraph':
    ## 
    ##     %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
    ##     get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
    ##     is.directed, list.edge.attributes, list.vertex.attributes,
    ##     set.edge.attribute, set.vertex.attribute
    ## 
    ## 'ergm' 4.1.2 (2021-07-26), part of the Statnet Project
    ## * 'news(package="ergm")' for changes since last version
    ## * 'citation("ergm")' for citation information
    ## * 'https://statnet.org' for help, support, and other information
    ## 'ergm' 4 is a major update that introduces some backwards-incompatible
    ## changes. Please type 'news(package="ergm")' for a list of major
    ## changes.
    ## Loading required package: networkDynamic
    ## 
    ## 'networkDynamic' 0.11.1 (2022-04-04), part of the Statnet Project
    ## * 'news(package="networkDynamic")' for changes since last version
    ## * 'citation("networkDynamic")' for citation information
    ## * 'https://statnet.org' for help, support, and other information
    ## Registered S3 method overwritten by 'tergm':
    ##   method                   from
    ##   simulate_formula.network ergm
    ## 
    ## 'tergm' 4.0.2 (2021-07-28), part of the Statnet Project
    ## * 'news(package="tergm")' for changes since last version
    ## * 'citation("tergm")' for citation information
    ## * 'https://statnet.org' for help, support, and other information
    ## 
    ## Attaching package: 'tergm'
    ## The following object is masked from 'package:ergm':
    ## 
    ##     snctrl
    ## Loading required package: ergm.count
    ## 
    ## 'ergm.count' 4.0.2 (2021-06-18), part of the Statnet Project
    ## * 'news(package="ergm.count")' for changes since last version
    ## * 'citation("ergm.count")' for citation information
    ## * 'https://statnet.org' for help, support, and other information
    ## Loading required package: sna
    ## Loading required package: statnet.common
    ## 
    ## Attaching package: 'statnet.common'
    ## The following object is masked from 'package:ergm':
    ## 
    ##     snctrl
    ## The following objects are masked from 'package:base':
    ## 
    ##     attr, order
    ## sna: Tools for Social Network Analysis
    ## Version 2.6 created on 2020-10-5.
    ## copyright (c) 2005, Carter T. Butts, University of California-Irvine
    ##  For citation information, type citation("sna").
    ##  Type help(package="sna") to get started.
    ## 
    ## Attaching package: 'sna'
    ## The following objects are masked from 'package:igraph':
    ## 
    ##     betweenness, bonpow, closeness, components, degree, dyad.census,
    ##     evcent, hierarchy, is.connected, neighborhood, triad.census
    ## Loading required package: tsna
    ## 
    ## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
    ## * 'news(package="statnet")' for changes since last version
    ## * 'citation("statnet")' for citation information
    ## * 'https://statnet.org' for help, support, and other information
    library(network)
    library(gephi)
    library(texreg)
    ## Version:  1.38.6
    ## Date:     2022-04-06
    ## Author:   Philip Leifeld (University of Essex)
    ## 
    ## Consider submitting praise using the praise or praise_interactive functions.
    ## Please cite the JSS article in your publications -- see citation("texreg").
    ## 
    ## Attaching package: 'texreg'
    ## The following object is masked from 'package:tidyr':
    ## 
    ##     extract
    library(btergm)
    ## Registered S3 methods overwritten by 'btergm':
    ##   method    from
    ##   plot.gof  ergm
    ##   print.gof ergm
    ## Package:  btergm
    ## Version:  1.10.6
    ## Date:     2022-04-01
    ## Authors:  Philip Leifeld (University of Essex)
    ##           Skyler J. Cranmer (The Ohio State University)
    ##           Bruce A. Desmarais (Pennsylvania State University)
    ## 
    ## Attaching package: 'btergm'
    ## The following object is masked from 'package:ergm':
    ## 
    ##     gof
    library(janitor)
    ## 
    ## Attaching package: 'janitor'
    ## The following objects are masked from 'package:stats':
    ## 
    ##     chisq.test, fisher.test
    1. Import dataset for DLT_1: there are two data files used in this analysis, the edge and the node list. Based on the results from my previous case studies and for the purpose of this work, I removed the facilitators and focused more on the interactions among the participants.
    #DLT_1 Ties, removing the facilitators
    dlt1_ties <- read_csv("data/dlt1-edges.csv", 
                      col_types = cols(Sender = col_character(), 
                                       Receiver = col_character(), 
                                       `Category Text` = col_skip(), 
                                       `Comment ID` = col_character(), 
                                       `Discussion ID` = col_character())) %>% clean_names()
    dlt1_ties<-dlt1_ties[!(dlt1_ties$sender==444 | dlt1_ties$sender==445),]
    dlt1_ties<-dlt1_ties[!(dlt1_ties$receiver==444 | dlt1_ties$receiver==445),]
    dlt1_ties <- dlt1_ties[, c('sender', 'receiver')]
    
    #DLT_1 Nodes, removing facilitators
    dlt1_actors <- read_csv("data/dlt1-nodes.csv", 
                        col_types = cols(UID = col_character(), 
                                         Facilitator = col_character(), 
                                         connect = col_character(), 
                                         expert = col_character()))  %>% 
      clean_names()
    dlt1_actors<-dlt1_actors[!(dlt1_actors$uid==444 | dlt1_actors$uid==445),]
    1. Data wrangle for DLT_1: In addition to removing the “facilitator” id, I only focused on the groups that have the “strongest components” from the DLT 1 and DLT 2 networks. I used the below code to demonstrate how to define such groups. I added the last two lines of code (without running the code) to show how I extracted the strongest network into a new dataset. I will show how to use the new dataset to perform the ERGMs.
    #DLT_1 Graph Object
    dlt1_network <- tbl_graph(edges = dlt1_ties,
                          nodes = dlt1_actors,
                          node_key = "uid",
                          directed = TRUE)
    dlt1_network
    ## # A tbl_graph: 443 nodes and 1644 edges
    ## #
    ## # A directed multigraph with 78 components
    ## #
    ## # Node Data: 443 × 13 (active)
    ##   uid   facilitator role1 experience experience2 grades location region country
    ##   <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
    ## 1 1     0           libm…          1 6 to 10     secon… VA       South  US     
    ## 2 2     0           clas…          1 6 to 10     secon… FL       South  US     
    ## 3 3     0           dist…          2 11 to 20    gener… PA       North… US     
    ## 4 4     0           clas…          2 11 to 20    middle NC       South  US     
    ## 5 5     0           othe…          3 20+         gener… AL       South  US     
    ## 6 6     0           clas…          1 4 to 5      gener… AL       South  US     
    ## # … with 437 more rows, and 4 more variables: group <chr>, gender <chr>,
    ## #   expert <chr>, connect <chr>
    ## #
    ## # Edge Data: 1,644 × 2
    ##    from    to
    ##   <int> <int>
    ## 1   355   356
    ## 2    19   310
    ## 3   216    19
    ## # … with 1,641 more rows
    #DLT_1 Group Analysis
    autograph(dlt1_network)

    dlt1_network <- dlt1_network  %>% 
      activate(nodes)  %>% 
      mutate(strong_component = group_components(type = "strong"))
    as_tibble(dlt1_network)
    ## # A tibble: 443 × 14
    ##    uid   facilitator role1 experience experience2 grades location region country
    ##    <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
    ##  1 1     0           libm…          1 6 to 10     secon… VA       South  US     
    ##  2 2     0           clas…          1 6 to 10     secon… FL       South  US     
    ##  3 3     0           dist…          2 11 to 20    gener… PA       North… US     
    ##  4 4     0           clas…          2 11 to 20    middle NC       South  US     
    ##  5 5     0           othe…          3 20+         gener… AL       South  US     
    ##  6 6     0           clas…          1 4 to 5      gener… AL       South  US     
    ##  7 7     0           inst…          2 11 to 20    gener… SD       Midwe… US     
    ##  8 8     0           spec…          1 6 to 10     secon… BE       Inter… BE     
    ##  9 9     0           clas…          1 6 to 10     middle NC       South  US     
    ## 10 10    0           scho…          2 11 to 20    middle NC       South  US     
    ## # … with 433 more rows, and 5 more variables: group <chr>, gender <chr>,
    ## #   expert <chr>, connect <chr>, strong_component <int>
    dlt1_network |>
      as_tibble() |>
      group_by(strong_component) |>
      summarise(count = n()) |>
      arrange(desc(count))
    ## # A tibble: 266 × 2
    ##    strong_component count
    ##               <int> <int>
    ##  1                1   177
    ##  2                2     2
    ##  3                3     1
    ##  4                4     1
    ##  5                5     1
    ##  6                6     1
    ##  7                7     1
    ##  8                8     1
    ##  9                9     1
    ## 10               10     1
    ## # … with 256 more rows
    #DLT1_Strongest Component
    strong1 <- dlt1_network |>
      activate(nodes) |>
      filter(strong_component == 1)
    class(strong1)
    ## [1] "tbl_graph" "igraph"
    #strong1 %>% gephi_write_nodes("nodes_subset1.csv")
    #strong1 %>% gephi_write_edges("nodes_subset1.csv")
    1. Data analysis (mathematical aspect) for DLT_1: I looked into three mathematical aspects of the DLT_1 network which are: size, centrality, betweenness and closeness. Form the results shown below, “uid 44” had the largest size as well as the highest betweenness and closeness.
    #size in the strongest network
    strong1 <- strong1 |>
      activate(nodes) |>
      mutate(size = local_size())
    
    strong1 |> 
      as_tibble() |>
      arrange(desc(size)) |> 
      select(uid, role1,size)
    ## # A tibble: 177 × 3
    ##    uid   role1              size
    ##    <chr> <chr>             <dbl>
    ##  1 44    instructionaltech    51
    ##  2 11    other                43
    ##  3 30    schooladmin          36
    ##  4 7     instructionaltech    33
    ##  5 60    classteaching        33
    ##  6 432   instructionaltech    33
    ##  7 19    otheredprof          29
    ##  8 36    classteaching        29
    ##  9 1     libmedia             28
    ## 10 24    instructionaltech    27
    ## # … with 167 more rows
    #centrality_degree
    strong1 <- strong1 |>
      activate(nodes) |>
      mutate(in_degree = centrality_degree(mode = "in"),
         out_degree = centrality_degree(mode = "out"),
         all_degree = centrality_degree(mode = "all"))
    
    strong1 |> 
      as_tibble()
    ## # A tibble: 177 × 18
    ##    uid   facilitator role1 experience experience2 grades location region country
    ##    <chr> <chr>       <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
    ##  1 1     0           libm…          1 6 to 10     secon… VA       South  US     
    ##  2 2     0           clas…          1 6 to 10     secon… FL       South  US     
    ##  3 3     0           dist…          2 11 to 20    gener… PA       North… US     
    ##  4 4     0           clas…          2 11 to 20    middle NC       South  US     
    ##  5 5     0           othe…          3 20+         gener… AL       South  US     
    ##  6 6     0           clas…          1 4 to 5      gener… AL       South  US     
    ##  7 7     0           inst…          2 11 to 20    gener… SD       Midwe… US     
    ##  8 8     0           spec…          1 6 to 10     secon… BE       Inter… BE     
    ##  9 9     0           clas…          1 6 to 10     middle NC       South  US     
    ## 10 10    0           scho…          2 11 to 20    middle NC       South  US     
    ## # … with 167 more rows, and 9 more variables: group <chr>, gender <chr>,
    ## #   expert <chr>, connect <chr>, strong_component <int>, size <dbl>,
    ## #   in_degree <dbl>, out_degree <dbl>, all_degree <dbl>
    strong1 |> 
      as_tibble() |>
      ggplot() +
      geom_histogram(aes(x = all_degree))
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

    #closeness and betweenness
    strong1 <- strong1 |>
      activate(nodes) |>
      mutate(close_ness = centrality_closeness(),
         between_ness = centrality_betweenness())
    ## Warning in betweenness(graph = graph, v = V(graph), directed = directed, :
    ## 'nobigint' is deprecated since igraph 1.3 and will be removed in igraph 1.4
    strong1 |> 
      as_tibble() |>
      arrange(desc(close_ness)) |> 
      select(uid, role1, between_ness,close_ness)
    ## # A tibble: 177 × 4
    ##    uid   role1             between_ness close_ness
    ##    <chr> <chr>                    <dbl>      <dbl>
    ##  1 44    instructionaltech        6643.    0.00262
    ##  2 432   instructionaltech        2572.    0.00258
    ##  3 30    schooladmin              3185.    0.00247
    ##  4 60    classteaching            2438.    0.00246
    ##  5 11    other                    2938.    0.00243
    ##  6 17    libmedia                  583.    0.00227
    ##  7 19    otheredprof              5793.    0.00227
    ##  8 24    instructionaltech        1411.    0.00226
    ##  9 29    schooladmin              1516.    0.00226
    ## 10 68    libmedia                  564.    0.00225
    ## # … with 167 more rows
    1. Data analysis (statistical aspect) for DLT_1: In session 2, I showed how to extract the edges and nodes of the strongest components. In this session, I demonstrated how to create a network object which requires us to identify and remove “loops” from the edge list. Moreover, I created dummy variables to present “role as an instructional technologist” and “male participant”. The steps of how to create a network object and how to conduct the ERGMs analysis are shown below:
    #DLT_Network Object
    #nodes and edges
    nodes <- read_csv("nodes_subset.csv")
    ## Rows: 177 Columns: 14
    ## ── Column specification ────────────────────────────────────────────────────────
    ## Delimiter: ","
    ## chr (10): role1, experience2, grades, location, region, country, group, gend...
    ## dbl  (4): Id, facilitator, experience, strong_component
    ## 
    ## ℹ Use `spec()` to retrieve the full column specification for this data.
    ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    nodes$male <- ifelse(nodes$gender == 'male', 1, 0) 
    nodes$role_ins <- ifelse(nodes$role1 == 'instructionaltech',1,0) 
    
    edges <- read_csv("edges_subset.csv")
    ## Rows: 1187 Columns: 2
    ## ── Column specification ────────────────────────────────────────────────────────
    ## Delimiter: ","
    ## dbl (2): Source, Target
    ## 
    ## ℹ Use `spec()` to retrieve the full column specification for this data.
    ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    edges <- edges %>%  
      filter(!(Source == Target)) 
    edges <- unique(edges[c("Source", "Target")])  
    
    DLT1_network <- network(edges, vertex.attr = nodes, matrix.type = "edgelist", ignore.eval = FALSE)
    
    #ERGMs_DLT1
    summary(DLT1_network ~ edges + mutual) #983 edges and 137 mutual
    ##  edges mutual 
    ##    983    137
    set.seed(589)
    ergm_mod_1 <-ergm(DLT1_network ~ edges + mutual)
    ## Starting maximum pseudolikelihood estimation (MPLE):
    ## Evaluating the predictor and response matrix.
    ## Maximizing the pseudolikelihood.
    ## Finished MPLE.
    ## Starting Monte Carlo maximum likelihood estimation (MCMLE):
    ## Iteration 1 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 2.4781.
    ## Estimating equations are not within tolerance region.
    ## Iteration 2 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.1041.
    ## Estimating equations are not within tolerance region.
    ## Iteration 3 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.1273.
    ## Estimating equations are not within tolerance region.
    ## Iteration 4 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0073.
    ## Convergence test p-value: 0.2841. Not converged with 99% confidence; increasing sample size.
    ## Iteration 5 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0458.
    ## Convergence test p-value: 0.0875. Not converged with 99% confidence; increasing sample size.
    ## Iteration 6 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0031.
    ## Convergence test p-value: 0.0053. Converged with 99% confidence.
    ## Finished MCMLE.
    ## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
    ## Bridging between the dyad-independent submodel and the full model...
    ## Setting up bridge sampling...
    ## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
    ## Bridging finished.
    ## This model was fit using MCMC.  To examine model diagnostics and check
    ## for degeneracy, use the mcmc.diagnostics() function.
    summary(ergm_mod_1) #This shows that there are more mutual links in the DLT 1 network than what would expect 
    ## Call:
    ## ergm(formula = DLT1_network ~ edges + mutual)
    ## 
    ## Monte Carlo Maximum Likelihood Results:
    ## 
    ##        Estimate Std. Error MCMC % z value Pr(>|z|)    
    ## edges  -3.72956    0.03912      0  -95.34   <1e-04 ***
    ## mutual  2.78003    0.11687      0   23.79   <1e-04 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ##      Null Deviance: 43186  on 31152  degrees of freedom
    ##  Residual Deviance:  8307  on 31150  degrees of freedom
    ##  
    ## AIC: 8311  BIC: 8327  (Smaller is better. MC Std. Err. = 3.235)
    #from a random network with 983 edges. 
    
    ergm_2 <- ergm(DLT1_network ~ edges +
                 mutual +
                 nodefactor('male'))
    ## Starting maximum pseudolikelihood estimation (MPLE):
    ## Evaluating the predictor and response matrix.
    ## Maximizing the pseudolikelihood.
    ## Finished MPLE.
    ## Starting Monte Carlo maximum likelihood estimation (MCMLE):
    ## Iteration 1 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.6891.
    ## Estimating equations are not within tolerance region.
    ## Iteration 2 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.1351.
    ## Estimating equations are not within tolerance region.
    ## Iteration 3 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0964.
    ## Convergence test p-value: 0.4810. Not converged with 99% confidence; increasing sample size.
    ## Iteration 4 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0573.
    ## Convergence test p-value: 0.1041. Not converged with 99% confidence; increasing sample size.
    ## Iteration 5 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0050.
    ## Convergence test p-value: 0.0293. Not converged with 99% confidence; increasing sample size.
    ## Iteration 6 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0152.
    ## Convergence test p-value: < 0.0001. Converged with 99% confidence.
    ## Finished MCMLE.
    ## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
    ## Bridging between the dyad-independent submodel and the full model...
    ## Setting up bridge sampling...
    ## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
    ## Bridging finished.
    ## This model was fit using MCMC.  To examine model diagnostics and check
    ## for degeneracy, use the mcmc.diagnostics() function.
    summary(ergm_2)#Results suggest that there is not a statistically significant difference bewteen male and female
    ## Call:
    ## ergm(formula = DLT1_network ~ edges + mutual + nodefactor("male"))
    ## 
    ## Monte Carlo Maximum Likelihood Results:
    ## 
    ##                    Estimate Std. Error MCMC % z value Pr(>|z|)    
    ## edges             -3.728299   0.050491      0 -73.841   <1e-04 ***
    ## mutual             2.789228   0.115204      0  24.211   <1e-04 ***
    ## nodefactor.male.1 -0.000395   0.042684      0  -0.009    0.993    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ##      Null Deviance: 43186  on 31152  degrees of freedom
    ##  Residual Deviance:  8312  on 31149  degrees of freedom
    ##  
    ## AIC: 8318  BIC: 8343  (Smaller is better. MC Std. Err. = 1.746)
    #participants form discussions in the DLT 1 network. 
    
    ergm_3 <- ergm(DLT1_network ~ edges +
                 mutual +
                 nodefactor('role_ins') +
                 nodefactor('male'))
    ## Starting maximum pseudolikelihood estimation (MPLE):
    ## Evaluating the predictor and response matrix.
    ## Maximizing the pseudolikelihood.
    ## Finished MPLE.
    ## Starting Monte Carlo maximum likelihood estimation (MCMLE):
    ## Iteration 1 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.2830.
    ## Estimating equations are not within tolerance region.
    ## Iteration 2 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0998.
    ## Convergence test p-value: 0.5037. Not converged with 99% confidence; increasing sample size.
    ## Iteration 3 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0117.
    ## Convergence test p-value: < 0.0001. Converged with 99% confidence.
    ## Finished MCMLE.
    ## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
    ## Bridging between the dyad-independent submodel and the full model...
    ## Setting up bridge sampling...
    ## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
    ## Bridging finished.
    ## This model was fit using MCMC.  To examine model diagnostics and check
    ## for degeneracy, use the mcmc.diagnostics() function.
    summary(ergm_3)
    ## Call:
    ## ergm(formula = DLT1_network ~ edges + mutual + nodefactor("role_ins") + 
    ##     nodefactor("male"))
    ## 
    ## Monte Carlo Maximum Likelihood Results:
    ## 
    ##                        Estimate Std. Error MCMC % z value Pr(>|z|)    
    ## edges                 -3.625840   0.053677      0 -67.549   <1e-04 ***
    ## mutual                 2.764360   0.111514      0  24.789   <1e-04 ***
    ## nodefactor.role_ins.1 -0.234504   0.052932      0  -4.430   <1e-04 ***
    ## nodefactor.male.1     -0.007545   0.043283      0  -0.174    0.862    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ##      Null Deviance: 43186  on 31152  degrees of freedom
    ##  Residual Deviance:  8285  on 31148  degrees of freedom
    ##  
    ## AIC: 8293  BIC: 8327  (Smaller is better. MC Std. Err. = 1.781)
    ergm_3_gof <- btergm::gof(ergm_3, nsim = 50)
    ## 
    ## Starting GOF assessment on a single computing core....
    ## 
    ## No 'target' network(s) provided. Using networks on the left-hand side of the model formula as observed networks.
    ## 
    ## Simulating 50 networks from the following formula:
    ##  DLT1_network ~ edges + mutual + nodefactor("role_ins") + nodefactor("male") 
    ## 
    ## One network from which simulations are drawn was provided.
    ## 
    ## Processing statistic: Dyad-wise shared partners
    ## Processing statistic: Edge-wise shared partners
    ## Processing statistic: Degree
    ## Processing statistic: Indegree
    ## Processing statistic: Geodesic distances
    ## Processing statistic: Tie prediction
    ## Processing statistic: Modularity (walktrap)
    plot(ergm_3_gof)

    1. Data wrangle and analysis for DLT_2: I showed detailed steps from session 1 to 4 on the data analysis process for the DLT_1 cohort so I did not include all the steps here for DLT_2 to save some space. I only provided ERGMs part for DLT_2 cohort in this session.
    #DLT2_Network Object
    #nodes and edges
    nodes2 <- read_csv("nodes_subset2.csv")
    ## Rows: 190 Columns: 20
    ## ── Column specification ────────────────────────────────────────────────────────
    ## Delimiter: ","
    ## chr  (7): role, grades, location, region, country, group, gender
    ## dbl (13): Id, facilitator, experience2, experience, expert, connect, strong_...
    ## 
    ## ℹ Use `spec()` to retrieve the full column specification for this data.
    ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    nodes2$male <- ifelse(nodes2$gender == 'male', 1, 0) 
    nodes2$role_ins <- ifelse(nodes2$role == 'instructionaltech',1,0) 
    
    edges <- read_csv("edges_subset2.csv")
    ## Rows: 1370 Columns: 2
    ## ── Column specification ────────────────────────────────────────────────────────
    ## Delimiter: ","
    ## dbl (2): Source, Target
    ## 
    ## ℹ Use `spec()` to retrieve the full column specification for this data.
    ## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
    edges <- edges %>%  
      filter(!(Source == Target)) 
    edges <- unique(edges[c("Source", "Target")])  
    
    DLT2_network <- network(edges, vertex.attr = nodes, matrix.type = "edgelist", ignore.eval = FALSE)
    
    #ERGMs_DLT2
    summary(DLT2_network ~ edges + mutual) #1176 edges and 201 mutual
    ##  edges mutual 
    ##   1176    201
    set.seed(589)
    ergm_mod_2 <-ergm(DLT2_network ~ edges + mutual)
    ## Starting maximum pseudolikelihood estimation (MPLE):
    ## Evaluating the predictor and response matrix.
    ## Maximizing the pseudolikelihood.
    ## Finished MPLE.
    ## Starting Monte Carlo maximum likelihood estimation (MCMLE):
    ## Iteration 1 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 1.5239.
    ## Estimating equations are not within tolerance region.
    ## Iteration 2 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.1742.
    ## Estimating equations are not within tolerance region.
    ## Iteration 3 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0490.
    ## Convergence test p-value: 0.3563. Not converged with 99% confidence; increasing sample size.
    ## Iteration 4 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0824.
    ## Convergence test p-value: 0.1820. Not converged with 99% confidence; increasing sample size.
    ## Iteration 5 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0103.
    ## Convergence test p-value: 0.0053. Converged with 99% confidence.
    ## Finished MCMLE.
    ## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
    ## Bridging between the dyad-independent submodel and the full model...
    ## Setting up bridge sampling...
    ## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
    ## Bridging finished.
    ## This model was fit using MCMC.  To examine model diagnostics and check
    ## for degeneracy, use the mcmc.diagnostics() function.
    summary(ergm_mod_2) #This shows that there are more mutual links in the DLT 2 network than what would expect 
    ## Call:
    ## ergm(formula = DLT2_network ~ edges + mutual)
    ## 
    ## Monte Carlo Maximum Likelihood Results:
    ## 
    ##        Estimate Std. Error MCMC % z value Pr(>|z|)    
    ## edges  -3.78150    0.03603      0 -104.97   <1e-04 ***
    ## mutual  3.12469    0.11015      0   28.37   <1e-04 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ##      Null Deviance: 49782  on 35910  degrees of freedom
    ##  Residual Deviance:  9647  on 35908  degrees of freedom
    ##  
    ## AIC: 9651  BIC: 9668  (Smaller is better. MC Std. Err. = 3.428)
    #from a random network with 1176 edges. 
    
    ergm_4 <- ergm(DLT2_network ~ edges +
                 mutual +
                 nodefactor('male'))
    ## Starting maximum pseudolikelihood estimation (MPLE):
    ## Evaluating the predictor and response matrix.
    ## Maximizing the pseudolikelihood.
    ## Finished MPLE.
    ## Starting Monte Carlo maximum likelihood estimation (MCMLE):
    ## Iteration 1 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 1.4331.
    ## Estimating equations are not within tolerance region.
    ## Iteration 2 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0359.
    ## Convergence test p-value: 0.1534. Not converged with 99% confidence; increasing sample size.
    ## Iteration 3 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0527.
    ## Convergence test p-value: 0.0486. Not converged with 99% confidence; increasing sample size.
    ## Iteration 4 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0689.
    ## Convergence test p-value: 0.0959. Not converged with 99% confidence; increasing sample size.
    ## Iteration 5 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0113.
    ## Convergence test p-value: 0.0001. Converged with 99% confidence.
    ## Finished MCMLE.
    ## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
    ## Bridging between the dyad-independent submodel and the full model...
    ## Setting up bridge sampling...
    ## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
    ## Bridging finished.
    ## This model was fit using MCMC.  To examine model diagnostics and check
    ## for degeneracy, use the mcmc.diagnostics() function.
    summary(ergm_4)#Results suggest that there is a statistically significant difference between male and female
    ## Call:
    ## ergm(formula = DLT2_network ~ edges + mutual + nodefactor("male"))
    ## 
    ## Monte Carlo Maximum Likelihood Results:
    ## 
    ##                   Estimate Std. Error MCMC % z value Pr(>|z|)    
    ## edges             -3.65305    0.04533      0 -80.586   <1e-04 ***
    ## mutual             3.11277    0.09993      0  31.150   <1e-04 ***
    ## nodefactor.male.1 -0.19373    0.04080      0  -4.748   <1e-04 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ##      Null Deviance: 49782  on 35910  degrees of freedom
    ##  Residual Deviance:  9618  on 35907  degrees of freedom
    ##  
    ## AIC: 9624  BIC: 9650  (Smaller is better. MC Std. Err. = 3.097)
    #participants form discussions in the DLT 2 network. The probability for male participants of forming a discussion thread 
    #is around 0.2 lower than female participants.
    
    ergm_5 <- ergm(DLT2_network ~ edges +
                 mutual +
                 nodefactor('role_ins') +
                 nodefactor('male'))
    ## Starting maximum pseudolikelihood estimation (MPLE):
    ## Evaluating the predictor and response matrix.
    ## Maximizing the pseudolikelihood.
    ## Finished MPLE.
    ## Starting Monte Carlo maximum likelihood estimation (MCMLE):
    ## Iteration 1 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.4537.
    ## Estimating equations are not within tolerance region.
    ## Iteration 2 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0718.
    ## Convergence test p-value: 0.2546. Not converged with 99% confidence; increasing sample size.
    ## Iteration 3 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0057.
    ## Convergence test p-value: 0.0183. Not converged with 99% confidence; increasing sample size.
    ## Iteration 4 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0085.
    ## Convergence test p-value: 0.0126. Not converged with 99% confidence; increasing sample size.
    ## Iteration 5 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0165.
    ## Convergence test p-value: 0.1839. Not converged with 99% confidence; increasing sample size.
    ## Iteration 6 of at most 60:
    ## Optimizing with step length 1.0000.
    ## The log-likelihood improved by 0.0104.
    ## Convergence test p-value: 0.0048. Converged with 99% confidence.
    ## Finished MCMLE.
    ## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
    ## Bridging between the dyad-independent submodel and the full model...
    ## Setting up bridge sampling...
    ## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
    ## Bridging finished.
    ## This model was fit using MCMC.  To examine model diagnostics and check
    ## for degeneracy, use the mcmc.diagnostics() function.
    summary(ergm_5)
    ## Call:
    ## ergm(formula = DLT2_network ~ edges + mutual + nodefactor("role_ins") + 
    ##     nodefactor("male"))
    ## 
    ## Monte Carlo Maximum Likelihood Results:
    ## 
    ##                       Estimate Std. Error MCMC % z value Pr(>|z|)    
    ## edges                 -3.53303    0.04950      0 -71.371   <1e-04 ***
    ## mutual                 3.08466    0.10160      0  30.361   <1e-04 ***
    ## nodefactor.role_ins.1 -0.30424    0.05234      0  -5.813   <1e-04 ***
    ## nodefactor.male.1     -0.19941    0.03878      0  -5.142   <1e-04 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ##      Null Deviance: 49782  on 35910  degrees of freedom
    ##  Residual Deviance:  9582  on 35906  degrees of freedom
    ##  
    ## AIC: 9590  BIC: 9624  (Smaller is better. MC Std. Err. = 2.431)
    ergm_5_gof <- btergm::gof(ergm_5, nsim = 50)
    ## 
    ## Starting GOF assessment on a single computing core....
    ## 
    ## No 'target' network(s) provided. Using networks on the left-hand side of the model formula as observed networks.
    ## 
    ## Simulating 50 networks from the following formula:
    ##  DLT2_network ~ edges + mutual + nodefactor("role_ins") + nodefactor("male") 
    ## 
    ## One network from which simulations are drawn was provided.
    ## 
    ## Processing statistic: Dyad-wise shared partners
    ## Processing statistic: Edge-wise shared partners
    ## Processing statistic: Degree
    ## Processing statistic: Indegree
    ## Processing statistic: Geodesic distances
    ## Processing statistic: Tie prediction
    ## Processing statistic: Modularity (walktrap)
    plot(ergm_5_gof)

  4. Data visualization. Based on the results from Section 3 Data Analysis, we can learn that in both cohorts, the “instrucitonaltech” group had the lower total degrees and lower betweenness and closeness compared with the other groups. The figure below shows the degree distributions among the two role groups in the two cohorts.

In addition, the ERGMs results show that, for the DLT_1 cohort, the “instrucitonaltech” role is a statistically significant predictor for participants’ online interactions which is shown in the table below.

```r
 DLT_1 <- screenreg(list(ergm_2,ergm_3))
 DLT_1
```

```
## 
## =================================================
##                        Model 1       Model 2     
## -------------------------------------------------
## edges                     -3.73 ***     -3.63 ***
##                           (0.05)        (0.05)   
## mutual                     2.79 ***      2.76 ***
##                           (0.12)        (0.11)   
## nodefactor.male.1         -0.00         -0.01    
##                           (0.04)        (0.04)   
## nodefactor.role_ins.1                   -0.23 ***
##                                         (0.05)   
## -------------------------------------------------
## AIC                     8318.05       8293.23    
## BIC                     8343.09       8326.62    
## Log Likelihood         -4156.03      -4142.62    
## =================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```

As for the DLT_2 cohort, “gender” and “role” are both statistically significant predictors for participants’ online interactions

```r
 DLT_2 <- screenreg(list(ergm_4,ergm_5))
 DLT_2
```

```
## 
## =================================================
##                        Model 1       Model 2     
## -------------------------------------------------
## edges                     -3.65 ***     -3.53 ***
##                           (0.05)        (0.05)   
## mutual                     3.11 ***      3.08 ***
##                           (0.10)        (0.10)   
## nodefactor.male.1         -0.19 ***     -0.20 ***
##                           (0.04)        (0.04)   
## nodefactor.role_ins.1                   -0.30 ***
##                                         (0.05)   
## -------------------------------------------------
## AIC                     9624.27       9590.07    
## BIC                     9649.73       9624.02    
## Log Likelihood         -4809.13      -4791.03    
## =================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```
  1. Conclusion. In summary, my final project examined two guided questions which are: in what ways participants of the DLT 1 and DLT 2 networks formed interactions and 2) what participants’ attributes that significantly predicted their interactions in the DLT1 and 2 courses. In particular, this project looks at the strongest component in each cohort and further explores how “gender” and “role” relate to participants’ online interactions. For the DLT_1 cohort, the strongest component consists of 177 counts (983 edges and 137 mutual) while the DLT_2 cohort has 190 counts (1176 edges and 201 mutual). In both cohorts, the “instructionaltech” background participant(s) had the largest/second largest sizes. The network measures show that the participants were forming more mutual links in both cohorts. However, participants from the “instructionaltech” role tended to have a negative effect on the online interactions in both cohorts and the “gender” factor of being a male participant had a negative effect only in the DLT_2 cohort. According to the Goodness of Fit graphs, I can see that the current models may need to consider including some other factors to further explore the relationships between participants’ attributes on influencing their online interactions in the studied online courses.

  2. References.

    Kellogg, S., Booth, S., & Oliver, K. (2014). A social network perspective on peer supported learning in MOOCs for educatorsInternational Review of Research in Open and Distributed Learning15(5), 263-289.