The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts: 

Part I: Reflect and Plan

Use the institutional library (e.g. NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies social network analysis to an educational context or topic of interest. More specifically, locate a network study that makes use of sociograms to visualize relational data. You are also welcome to select one of the research papers listed in the essential readings that may have piqued your interest.

  1. Provide an APA citation for your selected study.

    • Dobbie, F., Reith, G., & McConville, S. (2018). Utilising social network research in the qualitative exploration of gamblers’ social relationships. Qualitative Research, 18(2), 207–223.
  2. Who are the network’s actors and how are they represented visually?

    • The study used egocentric sociograms with the gamblers as the actors. As part of qualitative interviews, researchers directed each participant (the actor) to create a sociogram with the actor in a circle at the center, surrounded by progressively larger circles. Closest friends and family were placed in the first ring.
  3. What ties connect these actors and how are they represented visually?

    • The actors are connected to others with three different colored dots for three different types of relationships they have with people in their lives: red dots for people affected by their gambling, green for people who helped them overcome a gambling problem, and yellow for people who weren’t aware of their gambling problem.
  4. Why were these relations of interest to the researcher?

    • The authors wanted to do social network research but, technically, didn’t do social network analysis, as they did not identify measures of the relationships. They were interested in exploring the use of a sociogram as a way of collecting data that would also provide a visualization.
  5. Finally, what makes this collection of actors a social network?

    • This study describes egocentric social networks. The actors are not connected to each other; rather, each actor is presented with their own social network of people, along with data on how each relationship relates to their gambling situation.

Draft a research question for a population you may be interested in studying, or that would be of interest to educational researchers, and that would require the collection of relational data and answer the following questions:

  1. What relational data would need to be collected?

    • Background data for learners, including the institution of their undergraduate degree, degree discipline, number of credits in STEM areas, lessons attended (our data), work completed (our data), live instructor(s), Diagnostic MCAT score, and Full Length MCAT scores.
  2. For what reason would relational data need to be collected in order to address this question?

    • I want to explore connections and influences among learners when they first engage with Blueprint Prep, the paths they take with us, and their MCAT outcomes. This covers where they are coming from when they first engage with Blueprint Prep (state, institutions), their MCAT diagnostic scores (with interest in variations by subscore), and the courses they attend, instructors they have, and activities they complete as part of their MCAT preparation.
  3. Explain the analytical level at which these data would need to be collected and analyzed.

    • Their background information needs to be collected via intake questionnaire when they first enroll in an MCAT prep course. The other data come from our learning management system. Retrieval of these data will come from the data engineers.
  4. How does this differ from the ways in which individual or group behavior is typically conceptualized and modeled in conventional educational research?

    • Right now we are only able to look at descriptive data for learners and to examine variables one at a time. A relational approach will allow us to look at how attendance, assignment completion, and attrition may be connected and influence one another, and how they may relate to learning gains in the content for each section learners study for the MCAT: bio/biochem, chemistry and physics, psych and social, and critical analysis and reasoning (CARS).
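
One way to picture the relational data this question would require is as a node list plus an edge list. The sketch below is a hypothetical illustration only: the learner and instructor names, the lessons_attended column, and the choice of an undirected learner-to-instructor tie are all assumptions made for the example and are not actual Blueprint Prep data.

```r
library(tidygraph)

# Hypothetical node list: learners and instructors (made-up names)
nodes <- tibble::tibble(
  name = c("learner_a", "learner_b", "learner_c",
           "instructor_x", "instructor_y"),
  type = c("learner", "learner", "learner",
           "instructor", "instructor")
)

# Hypothetical edge list: one row per tie, where a learner attended
# live lessons taught by an instructor
edges <- tibble::tibble(
  from = c("learner_a", "learner_b", "learner_b", "learner_c"),
  to   = c("instructor_x", "instructor_x", "instructor_y", "instructor_y"),
  lessons_attended = c(4L, 2L, 5L, 3L)
)

# Character from/to values are matched against the "name" column of nodes
prep_network <- tbl_graph(nodes = nodes, edges = edges, directed = FALSE)
prep_network
```

Attributes such as diagnostic scores or degree discipline would sit in the node list, while the ties themselves (who attended lessons with whom) are what make the data relational.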

Part II: Data Product

Using one of the data sets provided in your data folder, your goal for this lab is to create a polished sociogram that visually represents this network. For example, you may be interested in examining how shared characteristics among school leaders might help explain tie formation, such as gender, level of trust in colleagues, or whether they work at the school or district level.

Alternatively, you may use your own data set to estimate models akin to those we estimated in the guided practice. 

I highly recommend creating a new R script in your lab-1 folder to complete this task. When your code is ready to share, use the code chunk below to share the final code for your model and answer the questions that follow.

# YOUR FINAL CODE HERE

library(readxl)
library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:stats':
## 
##     filter
library(ggraph)
## Loading required package: ggplot2
library(ggplot2)

library(readr)
dlt1_edges <- read_csv("data/dlt1-edges.csv")
## Rows: 2529 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Timestamp, Discussion Title, Discussion Category, Parent Category, ...
## dbl (3): Sender, Receiver, Comment ID
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dlt1_edges
## # A tibble: 2,529 × 10
##    Sender Receiver Timestamp  `Discussion Ti…` `Discussion Ca…` `Parent Catego…`
##     <dbl>    <dbl> <chr>      <chr>            <chr>            <chr>           
##  1    360      444 4/4/13 16… Most important … Group N          Units 1-3 Discu…
##  2    356      444 4/4/13 18… Most important … Group D-L        Units 1-3 Discu…
##  3    356      444 4/4/13 18… DLT Resources—C… Group D-L        Units 1-3 Discu…
##  4    344      444 4/4/13 18… Most important … Group O-T        Units 1-3 Discu…
##  5    392      444 4/4/13 19… Most important … Group U-Z        Units 1-3 Discu…
##  6    219      444 4/4/13 19… Most important … Group M          Units 1-3 Discu…
##  7    318      444 4/4/13 19… Most important … Group M          Units 1-3 Discu…
##  8      4      444 4/4/13 19… Most important … Group N          Units 1-3 Discu…
##  9    355      356 4/4/13 20… DLT Resources—C… Group D-L        Units 1-3 Discu…
## 10    355      444 4/4/13 20… Most important … Group D-L        Units 1-3 Discu…
## # … with 2,519 more rows, and 4 more variables: `Category Text` <chr>,
## #   `Discussion Identifier` <chr>, `Comment ID` <dbl>, `Discussion ID` <chr>
dlt1_nodes <- read_csv("data/dlt1-nodes.csv")
## Rows: 445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): role1, experience2, grades, location, region, country, group, gend...
## dbl  (3): UID, Facilitator, experience
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dlt1_nodes
## # A tibble: 445 × 13
##      UID Facilitator role1 experience experience2 grades location region country
##    <dbl>       <dbl> <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
##  1     1           0 libm…          1 6 to 10     secon… VA       South  US     
##  2     2           0 clas…          1 6 to 10     secon… FL       South  US     
##  3     3           0 dist…          2 11 to 20    gener… PA       North… US     
##  4     4           0 clas…          2 11 to 20    middle NC       South  US     
##  5     5           0 othe…          3 20+         gener… AL       South  US     
##  6     6           0 clas…          1 4 to 5      gener… AL       South  US     
##  7     7           0 inst…          2 11 to 20    gener… SD       Midwe… US     
##  8     8           0 spec…          1 6 to 10     secon… BE       Inter… BE     
##  9     9           0 clas…          1 6 to 10     middle NC       South  US     
## 10    10           0 scho…          2 11 to 20    middle NC       South  US     
## # … with 435 more rows, and 4 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>
dlt1_network <- tbl_graph(edges = dlt1_edges,
                          nodes = dlt1_nodes,
                          directed = TRUE)
dlt1_network
## # A tbl_graph: 445 nodes and 2529 edges
## #
## # A directed multigraph with 4 components
## #
## # Node Data: 445 × 13 (active)
##     UID Facilitator role1 experience experience2 grades location region country
##   <dbl>       <dbl> <chr>      <dbl> <chr>       <chr>  <chr>    <chr>  <chr>  
## 1     1           0 libm…          1 6 to 10     secon… VA       South  US     
## 2     2           0 clas…          1 6 to 10     secon… FL       South  US     
## 3     3           0 dist…          2 11 to 20    gener… PA       North… US     
## 4     4           0 clas…          2 11 to 20    middle NC       South  US     
## 5     5           0 othe…          3 20+         gener… AL       South  US     
## 6     6           0 clas…          1 4 to 5      gener… AL       South  US     
## # … with 439 more rows, and 4 more variables: group <chr>, gender <chr>,
## #   expert <chr>, connect <chr>
## #
## # Edge Data: 2,529 × 10
##    from    to Timestamp `Discussion Ti…` `Discussion Ca…` `Parent Catego…`
##   <int> <int> <chr>     <chr>            <chr>            <chr>           
## 1   360   444 4/4/13 1… Most important … Group N          Units 1-3 Discu…
## 2   356   444 4/4/13 1… Most important … Group D-L        Units 1-3 Discu…
## 3   356   444 4/4/13 1… DLT Resources—C… Group D-L        Units 1-3 Discu…
## # … with 2,526 more rows, and 4 more variables: `Category Text` <chr>,
## #   `Discussion Identifier` <chr>, `Comment ID` <dbl>, `Discussion ID` <chr>
autograph(dlt1_network)

ggraph(dlt1_network, layout = "fr") +
  geom_node_point() +
  geom_edge_link() +
  theme_graph()

ggraph(dlt1_network, layout = "fr") + 
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 start_cap = circle(3, 'mm'),
                 alpha = .1) +
  geom_node_point(aes(color = region)) +
  geom_node_text(aes(label = UID),
                 repel = TRUE) +
  theme_graph()
## Warning: ggrepel: 411 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
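
The ggrepel warning above indicates that most node labels were dropped because they overlap. One possible way to get a more readable sociogram is to label only the most-connected nodes. The sketch below is a minimal illustration on a small made-up network so that it runs on its own: toy_nodes, toy_edges, and the in_degree >= 2 cutoff are assumptions for the example, not part of the DLT 1 data.

```r
library(tidygraph)
library(ggraph)

# Toy stand-in for dlt1_network (hypothetical values)
toy_nodes <- tibble::tibble(
  UID    = 1:6,
  region = c("South", "South", "Midwest", "West", "South", "West")
)
toy_edges <- tibble::tibble(
  from = c(1L, 2L, 3L, 4L, 5L, 2L),
  to   = c(6L, 6L, 6L, 6L, 1L, 1L)
)

# Add each node's in-degree (number of incoming ties) as a node attribute
toy_network <- tbl_graph(nodes = toy_nodes, edges = toy_edges,
                         directed = TRUE) %>%
  activate(nodes) %>%
  mutate(in_degree = centrality_degree(mode = "in"))

# Size nodes by in-degree and label only nodes with at least 2 incoming ties
ggraph(toy_network, layout = "fr") +
  geom_edge_link(arrow = arrow(length = unit(1, "mm")),
                 end_cap = circle(3, "mm"),
                 start_cap = circle(3, "mm"),
                 alpha = 0.3) +
  geom_node_point(aes(color = region, size = in_degree)) +
  geom_node_text(aes(filter = in_degree >= 2, label = UID),
                 repel = TRUE) +
  theme_graph()
```

The same mutate() step and filter aesthetic could be applied to dlt1_network with a higher cutoff; the right threshold is a judgment call that depends on how many labels remain legible.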

Knit & Submit

Congratulations, you’ve completed your Intro to SNA Badge! Complete the following steps to submit your work for review:

  1. Change the name in the author: field of the YAML header at the very top of this document to your name. As noted in Reproducible Research in R, the YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.

  2. Click the yarn icon above to “knit” your data product to an HTML file that will be saved in your R Project folder.

  3. Commit your changes in GitHub Desktop and push them to your online GitHub repository.

  4. Publish your HTML page to the web using one of the following publishing methods:

    • Publish on RPubs by clicking the “Publish” button located in the Viewer Pane when you knit your document. Note that you will first need to create an RPubs account.

    • Publish on GitHub using either GitHub Pages or the HTML previewer.

  5. Post a new discussion on GitHub to our SNA Badges forum. In your post, include a link to your published web page and a short reflection highlighting one thing you learned from this lab and one thing you’d like to explore further.