This final project analysis is based on data collected from 2 cohorts of participants who enrolled the 6-week course “Planning for the Digital Learning Transition in K-12 Schools” in 2013. This course was designed and implemented by the Friday Institute at the North Carolina State University and aimed to help the design and implementation process of the K-12 digital learning initiates (Kellogg, Booth & Oliver, 2014).
Data source. The data sources include 1) the csv file with all participants’ attributes, such as ID, role, experience and location; 2) the csv file with all participants’ discussion data within the course online forum for both cohorts. The first cohort is labeled as “DLT_1” and the second cohort is labeled as “DLT_2”.
Guided questions. The guided questions for this final project are from my previous case study analysis on the DLT_2 data set. In that case study, I was interested in seeing how participants from the instructional technology background engaged in the discussion as they may involve in the planning and implementation process of digital learning initiates. The preliminary results of that case stud0y showed that the “instructionaltech” group had a higher betweenness which means that their options/posts might lead to more discussions among participants. In this final project, I aimed to examine how the “instructionaltech” and “gender” factors affect the discussions among participants in both cohorts. This final study was guided by two questions:
Data analysis. This section contains five sessions starting from data wrangling to modeling. Each session provides a description and the matched code blocks to present the details of each step of the analysis.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.5 ✔ purrr 0.3.4
## ✔ tibble 3.1.6 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
##
## compose, simplify
## The following object is masked from 'package:tidyr':
##
## crossing
## The following object is masked from 'package:tibble':
##
## as_data_frame
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
library(tidygraph)
##
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
##
## groups
## The following object is masked from 'package:stats':
##
## filter
library(ggraph)
library(statnet)
## Loading required package: tergm
## Loading required package: ergm
## Loading required package: network
##
## 'network' 1.17.1 (2021-06-12), part of the Statnet Project
## * 'news(package="network")' for changes since last version
## * 'citation("network")' for citation information
## * 'https://statnet.org' for help, support, and other information
##
## Attaching package: 'network'
## The following objects are masked from 'package:igraph':
##
## %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
## get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
## is.directed, list.edge.attributes, list.vertex.attributes,
## set.edge.attribute, set.vertex.attribute
##
## 'ergm' 4.1.2 (2021-07-26), part of the Statnet Project
## * 'news(package="ergm")' for changes since last version
## * 'citation("ergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
## 'ergm' 4 is a major update that introduces some backwards-incompatible
## changes. Please type 'news(package="ergm")' for a list of major
## changes.
## Loading required package: networkDynamic
##
## 'networkDynamic' 0.11.1 (2022-04-04), part of the Statnet Project
## * 'news(package="networkDynamic")' for changes since last version
## * 'citation("networkDynamic")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Registered S3 method overwritten by 'tergm':
## method from
## simulate_formula.network ergm
##
## 'tergm' 4.0.2 (2021-07-28), part of the Statnet Project
## * 'news(package="tergm")' for changes since last version
## * 'citation("tergm")' for citation information
## * 'https://statnet.org' for help, support, and other information
##
## Attaching package: 'tergm'
## The following object is masked from 'package:ergm':
##
## snctrl
## Loading required package: ergm.count
##
## 'ergm.count' 4.0.2 (2021-06-18), part of the Statnet Project
## * 'news(package="ergm.count")' for changes since last version
## * 'citation("ergm.count")' for citation information
## * 'https://statnet.org' for help, support, and other information
## Loading required package: sna
## Loading required package: statnet.common
##
## Attaching package: 'statnet.common'
## The following object is masked from 'package:ergm':
##
## snctrl
## The following objects are masked from 'package:base':
##
## attr, order
## sna: Tools for Social Network Analysis
## Version 2.6 created on 2020-10-5.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
## For citation information, type citation("sna").
## Type help(package="sna") to get started.
##
## Attaching package: 'sna'
## The following objects are masked from 'package:igraph':
##
## betweenness, bonpow, closeness, components, degree, dyad.census,
## evcent, hierarchy, is.connected, neighborhood, triad.census
## Loading required package: tsna
##
## 'statnet' 2019.6 (2019-06-13), part of the Statnet Project
## * 'news(package="statnet")' for changes since last version
## * 'citation("statnet")' for citation information
## * 'https://statnet.org' for help, support, and other information
library(network)
library(gephi)
library(texreg)
## Version: 1.38.6
## Date: 2022-04-06
## Author: Philip Leifeld (University of Essex)
##
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
##
## Attaching package: 'texreg'
## The following object is masked from 'package:tidyr':
##
## extract
library(btergm)
## Registered S3 methods overwritten by 'btergm':
## method from
## plot.gof ergm
## print.gof ergm
## Package: btergm
## Version: 1.10.6
## Date: 2022-04-01
## Authors: Philip Leifeld (University of Essex)
## Skyler J. Cranmer (The Ohio State University)
## Bruce A. Desmarais (Pennsylvania State University)
##
## Attaching package: 'btergm'
## The following object is masked from 'package:ergm':
##
## gof
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
#DLT_1 Ties, removing the facilitators
dlt1_ties <- read_csv("data/dlt1-edges.csv",
col_types = cols(Sender = col_character(),
Receiver = col_character(),
`Category Text` = col_skip(),
`Comment ID` = col_character(),
`Discussion ID` = col_character())) %>% clean_names()
dlt1_ties<-dlt1_ties[!(dlt1_ties$sender==444 | dlt1_ties$sender==445),]
dlt1_ties<-dlt1_ties[!(dlt1_ties$receiver==444 | dlt1_ties$receiver==445),]
dlt1_ties <- dlt1_ties[, c('sender', 'receiver')]
#DLT_1 Nodes, removing facilitators
dlt1_actors <- read_csv("data/dlt1-nodes.csv",
col_types = cols(UID = col_character(),
Facilitator = col_character(),
connect = col_character(),
expert = col_character())) %>%
clean_names()
dlt1_actors<-dlt1_actors[!(dlt1_actors$uid==444 | dlt1_actors$uid==445),]
#DLT_1 Graph Object
dlt1_network <- tbl_graph(edges = dlt1_ties,
nodes = dlt1_actors,
node_key = "uid",
directed = TRUE)
dlt1_network
## # A tbl_graph: 443 nodes and 1644 edges
## #
## # A directed multigraph with 78 components
## #
## # Node Data: 443 × 13 (active)
## uid facilitator role1 experience experience2 grades location region country
## <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 1 0 libm… 1 6 to 10 secon… VA South US
## 2 2 0 clas… 1 6 to 10 secon… FL South US
## 3 3 0 dist… 2 11 to 20 gener… PA North… US
## 4 4 0 clas… 2 11 to 20 middle NC South US
## 5 5 0 othe… 3 20+ gener… AL South US
## 6 6 0 clas… 1 4 to 5 gener… AL South US
## # … with 437 more rows, and 4 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>
## #
## # Edge Data: 1,644 × 2
## from to
## <int> <int>
## 1 355 356
## 2 19 310
## 3 216 19
## # … with 1,641 more rows
#DLT_1 Group Analysis
autograph(dlt1_network)
dlt1_network <- dlt1_network %>%
activate(nodes) %>%
mutate(strong_component = group_components(type = "strong"))
as_tibble(dlt1_network)
## # A tibble: 443 × 14
## uid facilitator role1 experience experience2 grades location region country
## <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 1 0 libm… 1 6 to 10 secon… VA South US
## 2 2 0 clas… 1 6 to 10 secon… FL South US
## 3 3 0 dist… 2 11 to 20 gener… PA North… US
## 4 4 0 clas… 2 11 to 20 middle NC South US
## 5 5 0 othe… 3 20+ gener… AL South US
## 6 6 0 clas… 1 4 to 5 gener… AL South US
## 7 7 0 inst… 2 11 to 20 gener… SD Midwe… US
## 8 8 0 spec… 1 6 to 10 secon… BE Inter… BE
## 9 9 0 clas… 1 6 to 10 middle NC South US
## 10 10 0 scho… 2 11 to 20 middle NC South US
## # … with 433 more rows, and 5 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>, strong_component <int>
dlt1_network |>
as_tibble() |>
group_by(strong_component) |>
summarise(count = n()) |>
arrange(desc(count))
## # A tibble: 266 × 2
## strong_component count
## <int> <int>
## 1 1 177
## 2 2 2
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
## 7 7 1
## 8 8 1
## 9 9 1
## 10 10 1
## # … with 256 more rows
#DLT1_Strongest Component
strong1 <- dlt1_network |>
activate(nodes) |>
filter(strong_component == 1)
class(strong1)
## [1] "tbl_graph" "igraph"
#strong1 %>% gephi_write_nodes("nodes_subset1.csv")
#strong1 %>% gephi_write_edges("nodes_subset1.csv")
#size in the strongest network
strong1 <- strong1 |>
activate(nodes) |>
mutate(size = local_size())
strong1 |>
as_tibble() |>
arrange(desc(size)) |>
select(uid, role1,size)
## # A tibble: 177 × 3
## uid role1 size
## <chr> <chr> <dbl>
## 1 44 instructionaltech 51
## 2 11 other 43
## 3 30 schooladmin 36
## 4 7 instructionaltech 33
## 5 60 classteaching 33
## 6 432 instructionaltech 33
## 7 19 otheredprof 29
## 8 36 classteaching 29
## 9 1 libmedia 28
## 10 24 instructionaltech 27
## # … with 167 more rows
#centrality_degree
strong1 <- strong1 |>
activate(nodes) |>
mutate(in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out"),
all_degree = centrality_degree(mode = "all"))
strong1 |>
as_tibble()
## # A tibble: 177 × 18
## uid facilitator role1 experience experience2 grades location region country
## <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 1 0 libm… 1 6 to 10 secon… VA South US
## 2 2 0 clas… 1 6 to 10 secon… FL South US
## 3 3 0 dist… 2 11 to 20 gener… PA North… US
## 4 4 0 clas… 2 11 to 20 middle NC South US
## 5 5 0 othe… 3 20+ gener… AL South US
## 6 6 0 clas… 1 4 to 5 gener… AL South US
## 7 7 0 inst… 2 11 to 20 gener… SD Midwe… US
## 8 8 0 spec… 1 6 to 10 secon… BE Inter… BE
## 9 9 0 clas… 1 6 to 10 middle NC South US
## 10 10 0 scho… 2 11 to 20 middle NC South US
## # … with 167 more rows, and 9 more variables: group <chr>, gender <chr>,
## # expert <chr>, connect <chr>, strong_component <int>, size <dbl>,
## # in_degree <dbl>, out_degree <dbl>, all_degree <dbl>
strong1 |>
as_tibble() |>
ggplot() +
geom_histogram(aes(x = all_degree))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#closeness and betweenness
strong1 <- strong1 |>
activate(nodes) |>
mutate(close_ness = centrality_closeness(),
between_ness = centrality_betweenness())
## Warning in betweenness(graph = graph, v = V(graph), directed = directed, :
## 'nobigint' is deprecated since igraph 1.3 and will be removed in igraph 1.4
strong1 |>
as_tibble() |>
arrange(desc(close_ness)) |>
select(uid, role1, between_ness,close_ness)
## # A tibble: 177 × 4
## uid role1 between_ness close_ness
## <chr> <chr> <dbl> <dbl>
## 1 44 instructionaltech 6643. 0.00262
## 2 432 instructionaltech 2572. 0.00258
## 3 30 schooladmin 3185. 0.00247
## 4 60 classteaching 2438. 0.00246
## 5 11 other 2938. 0.00243
## 6 17 libmedia 583. 0.00227
## 7 19 otheredprof 5793. 0.00227
## 8 24 instructionaltech 1411. 0.00226
## 9 29 schooladmin 1516. 0.00226
## 10 68 libmedia 564. 0.00225
## # … with 167 more rows
#DLT_Network Object
#nodes and edges
nodes <- read_csv("nodes_subset.csv")
## Rows: 177 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): role1, experience2, grades, location, region, country, group, gend...
## dbl (4): Id, facilitator, experience, strong_component
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nodes$male <- ifelse(nodes$gender == 'male', 1, 0)
nodes$role_ins <- ifelse(nodes$role1 == 'instructionaltech',1,0)
edges <- read_csv("edges_subset.csv")
## Rows: 1187 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Source, Target
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
edges <- edges %>%
filter(!(Source == Target))
edges <- unique(edges[c("Source", "Target")])
DLT1_network <- network(edges, vertex.attr = nodes, matrix.type = "edgelist", ignore.eval = FALSE)
#ERGMs_DLT1
summary(DLT1_network ~ edges + mutual) #983 edges and 137 mutual
## edges mutual
## 983 137
set.seed(589)
ergm_mod_1 <-ergm(DLT1_network ~ edges + mutual)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 2.4781.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1041.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1273.
## Estimating equations are not within tolerance region.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0073.
## Convergence test p-value: 0.2841. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0458.
## Convergence test p-value: 0.0875. Not converged with 99% confidence; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0031.
## Convergence test p-value: 0.0053. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_mod_1) #This shows that there are more mutual links in the DLT 1 network than what would expect
## Call:
## ergm(formula = DLT1_network ~ edges + mutual)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.72956 0.03912 0 -95.34 <1e-04 ***
## mutual 2.78003 0.11687 0 23.79 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 43186 on 31152 degrees of freedom
## Residual Deviance: 8307 on 31150 degrees of freedom
##
## AIC: 8311 BIC: 8327 (Smaller is better. MC Std. Err. = 3.235)
#from a random network with 983 edges.
ergm_2 <- ergm(DLT1_network ~ edges +
mutual +
nodefactor('male'))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.6891.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1351.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0964.
## Convergence test p-value: 0.4810. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0573.
## Convergence test p-value: 0.1041. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0050.
## Convergence test p-value: 0.0293. Not converged with 99% confidence; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0152.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_2)#Results suggest that there is not a statistically significant difference bewteen male and female
## Call:
## ergm(formula = DLT1_network ~ edges + mutual + nodefactor("male"))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.728299 0.050491 0 -73.841 <1e-04 ***
## mutual 2.789228 0.115204 0 24.211 <1e-04 ***
## nodefactor.male.1 -0.000395 0.042684 0 -0.009 0.993
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 43186 on 31152 degrees of freedom
## Residual Deviance: 8312 on 31149 degrees of freedom
##
## AIC: 8318 BIC: 8343 (Smaller is better. MC Std. Err. = 1.746)
#participants form discussions in the DLT 1 network.
ergm_3 <- ergm(DLT1_network ~ edges +
mutual +
nodefactor('role_ins') +
nodefactor('male'))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.2830.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0998.
## Convergence test p-value: 0.5037. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0117.
## Convergence test p-value: < 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_3)
## Call:
## ergm(formula = DLT1_network ~ edges + mutual + nodefactor("role_ins") +
## nodefactor("male"))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.625840 0.053677 0 -67.549 <1e-04 ***
## mutual 2.764360 0.111514 0 24.789 <1e-04 ***
## nodefactor.role_ins.1 -0.234504 0.052932 0 -4.430 <1e-04 ***
## nodefactor.male.1 -0.007545 0.043283 0 -0.174 0.862
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 43186 on 31152 degrees of freedom
## Residual Deviance: 8285 on 31148 degrees of freedom
##
## AIC: 8293 BIC: 8327 (Smaller is better. MC Std. Err. = 1.781)
ergm_3_gof <- btergm::gof(ergm_3, nsim = 50)
##
## Starting GOF assessment on a single computing core....
##
## No 'target' network(s) provided. Using networks on the left-hand side of the model formula as observed networks.
##
## Simulating 50 networks from the following formula:
## DLT1_network ~ edges + mutual + nodefactor("role_ins") + nodefactor("male")
##
## One network from which simulations are drawn was provided.
##
## Processing statistic: Dyad-wise shared partners
## Processing statistic: Edge-wise shared partners
## Processing statistic: Degree
## Processing statistic: Indegree
## Processing statistic: Geodesic distances
## Processing statistic: Tie prediction
## Processing statistic: Modularity (walktrap)
plot(ergm_3_gof)
#DLT2_Network Object
#nodes and edges
nodes2 <- read_csv("nodes_subset2.csv")
## Rows: 190 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): role, grades, location, region, country, group, gender
## dbl (13): Id, facilitator, experience2, experience, expert, connect, strong_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nodes2$male <- ifelse(nodes2$gender == 'male', 1, 0)
nodes2$role_ins <- ifelse(nodes2$role == 'instructionaltech',1,0)
edges <- read_csv("edges_subset2.csv")
## Rows: 1370 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Source, Target
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
edges <- edges %>%
filter(!(Source == Target))
edges <- unique(edges[c("Source", "Target")])
DLT2_network <- network(edges, vertex.attr = nodes, matrix.type = "edgelist", ignore.eval = FALSE)
#ERGMs_DLT2
summary(DLT2_network ~ edges + mutual) #1176 edges and 201 mutual
## edges mutual
## 1176 201
set.seed(589)
ergm_mod_2 <-ergm(DLT2_network ~ edges + mutual)
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.5239.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.1742.
## Estimating equations are not within tolerance region.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0490.
## Convergence test p-value: 0.3563. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0824.
## Convergence test p-value: 0.1820. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0103.
## Convergence test p-value: 0.0053. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_mod_2) #This shows that there are more mutual links in the DLT 2 network than what would expect
## Call:
## ergm(formula = DLT2_network ~ edges + mutual)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.78150 0.03603 0 -104.97 <1e-04 ***
## mutual 3.12469 0.11015 0 28.37 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 49782 on 35910 degrees of freedom
## Residual Deviance: 9647 on 35908 degrees of freedom
##
## AIC: 9651 BIC: 9668 (Smaller is better. MC Std. Err. = 3.428)
#from a random network with 1176 edges.
ergm_4 <- ergm(DLT2_network ~ edges +
mutual +
nodefactor('male'))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 1.4331.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0359.
## Convergence test p-value: 0.1534. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0527.
## Convergence test p-value: 0.0486. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0689.
## Convergence test p-value: 0.0959. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0113.
## Convergence test p-value: 0.0001. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_4)#Results suggest that there is a statistically significant difference between male and female
## Call:
## ergm(formula = DLT2_network ~ edges + mutual + nodefactor("male"))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.65305 0.04533 0 -80.586 <1e-04 ***
## mutual 3.11277 0.09993 0 31.150 <1e-04 ***
## nodefactor.male.1 -0.19373 0.04080 0 -4.748 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 49782 on 35910 degrees of freedom
## Residual Deviance: 9618 on 35907 degrees of freedom
##
## AIC: 9624 BIC: 9650 (Smaller is better. MC Std. Err. = 3.097)
#participants form discussions in the DLT 2 network. The probability for male participants of forming a discussion thread
#is around 0.2 lower than female participants.
ergm_5 <- ergm(DLT2_network ~ edges +
mutual +
nodefactor('role_ins') +
nodefactor('male'))
## Starting maximum pseudolikelihood estimation (MPLE):
## Evaluating the predictor and response matrix.
## Maximizing the pseudolikelihood.
## Finished MPLE.
## Starting Monte Carlo maximum likelihood estimation (MCMLE):
## Iteration 1 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.4537.
## Estimating equations are not within tolerance region.
## Iteration 2 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0718.
## Convergence test p-value: 0.2546. Not converged with 99% confidence; increasing sample size.
## Iteration 3 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0057.
## Convergence test p-value: 0.0183. Not converged with 99% confidence; increasing sample size.
## Iteration 4 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0085.
## Convergence test p-value: 0.0126. Not converged with 99% confidence; increasing sample size.
## Iteration 5 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0165.
## Convergence test p-value: 0.1839. Not converged with 99% confidence; increasing sample size.
## Iteration 6 of at most 60:
## Optimizing with step length 1.0000.
## The log-likelihood improved by 0.0104.
## Convergence test p-value: 0.0048. Converged with 99% confidence.
## Finished MCMLE.
## Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
## Bridging between the dyad-independent submodel and the full model...
## Setting up bridge sampling...
## Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
## Bridging finished.
## This model was fit using MCMC. To examine model diagnostics and check
## for degeneracy, use the mcmc.diagnostics() function.
summary(ergm_5)
## Call:
## ergm(formula = DLT2_network ~ edges + mutual + nodefactor("role_ins") +
## nodefactor("male"))
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.53303 0.04950 0 -71.371 <1e-04 ***
## mutual 3.08466 0.10160 0 30.361 <1e-04 ***
## nodefactor.role_ins.1 -0.30424 0.05234 0 -5.813 <1e-04 ***
## nodefactor.male.1 -0.19941 0.03878 0 -5.142 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 49782 on 35910 degrees of freedom
## Residual Deviance: 9582 on 35906 degrees of freedom
##
## AIC: 9590 BIC: 9624 (Smaller is better. MC Std. Err. = 2.431)
ergm_5_gof <- btergm::gof(ergm_5, nsim = 50)
##
## Starting GOF assessment on a single computing core....
##
## No 'target' network(s) provided. Using networks on the left-hand side of the model formula as observed networks.
##
## Simulating 50 networks from the following formula:
## DLT2_network ~ edges + mutual + nodefactor("role_ins") + nodefactor("male")
##
## One network from which simulations are drawn was provided.
##
## Processing statistic: Dyad-wise shared partners
## Processing statistic: Edge-wise shared partners
## Processing statistic: Degree
## Processing statistic: Indegree
## Processing statistic: Geodesic distances
## Processing statistic: Tie prediction
## Processing statistic: Modularity (walktrap)
plot(ergm_5_gof)
Data visualization. Based on the results from Section 3 Data Analysis, we can learn that in both cohorts, the “instrucitonaltech” group had the lower total degrees and lower betweenness and closeness compared with the other groups. The figure below shows the degree distributions among the two role groups in the two cohorts.
In addition, the ERGMs results show that, for the DLT_1 cohort, the “instrucitonaltech” role is a statistically significant predictor for participants’ online interactions which is shown in the table below.
```r
DLT_1 <- screenreg(list(ergm_2,ergm_3))
DLT_1
```
```
##
## =================================================
## Model 1 Model 2
## -------------------------------------------------
## edges -3.73 *** -3.63 ***
## (0.05) (0.05)
## mutual 2.79 *** 2.76 ***
## (0.12) (0.11)
## nodefactor.male.1 -0.00 -0.01
## (0.04) (0.04)
## nodefactor.role_ins.1 -0.23 ***
## (0.05)
## -------------------------------------------------
## AIC 8318.05 8293.23
## BIC 8343.09 8326.62
## Log Likelihood -4156.03 -4142.62
## =================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```
As for the DLT_2 cohort, “gender” and “role” are both statistically significant predictors for participants’ online interactions
```r
DLT_2 <- screenreg(list(ergm_4,ergm_5))
DLT_2
```
```
##
## =================================================
## Model 1 Model 2
## -------------------------------------------------
## edges -3.65 *** -3.53 ***
## (0.05) (0.05)
## mutual 3.11 *** 3.08 ***
## (0.10) (0.10)
## nodefactor.male.1 -0.19 *** -0.20 ***
## (0.04) (0.04)
## nodefactor.role_ins.1 -0.30 ***
## (0.05)
## -------------------------------------------------
## AIC 9624.27 9590.07
## BIC 9649.73 9624.02
## Log Likelihood -4809.13 -4791.03
## =================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
```
Conclusion. In summary, my final project examined two guided questions which are: in what ways participants of the DLT 1 and DLT 2 networks formed interactions and 2) what participants’ attributes that significantly predicted their interactions in the DLT1 and 2 courses. In particular, this project looks at the strongest component in each cohort and further explores how “gender” and “role” relate to participants’ online interactions. For the DLT_1 cohort, the strongest component consists of 177 counts (983 edges and 137 mutual) while the DLT_2 cohort has 190 counts (1176 edges and 201 mutual). In both cohorts, the “instructionaltech” background participant(s) had the largest/second largest sizes. The network measures show that the participants were forming more mutual links in both cohorts. However, participants from the “instructionaltech” role tended to have a negative effect on the online interactions in both cohorts and the “gender” factor of being a male participant had a negative effect only in the DLT_2 cohort. According to the Goodness of Fit graphs, I can see that the current models may need to consider including some other factors to further explore the relationships between participants’ attributes on influencing their online interactions in the studied online courses.
References.
Kellogg, S., Booth, S., & Oliver, K. (2014). A social network perspective on peer supported learning in MOOCs for educators. International Review of Research in Open and Distributed Learning, 15(5), 263-289.