SNA Module 1: Code-Along
From Learning Analytics Goes to School (Krumm, Means, and Bienkowski 2018)
Guiding Research & Network Packages
Revisiting early work in the field of sociometry, this study by Pittinsky and Carolan (2008) assesses the level of agreement between teacher perceptions and student reports of classroom friendships among middle school students.
The central question guiding this investigation was:
Do student reports agree with teacher perceptions when it comes to classroom friendship ties and with what consequences for commonly used social network measures?
One teacher, one middle school, four classrooms
Students given roster and asked to evaluate relationships with peers
Choices included best friend, friend, know-like, know, know-dislike, strongly dislike, and do not know.
Relations are valued (degrees of friendship, not just yes or no)
Data are directed (friendship nominations were not presumed to be reciprocal).
Teacher’s perceptions and students’ reports were statistically similar, but 11–29% of possible ties did not match.
Students reported significantly more reciprocated friendship ties than the teacher perceived.
Observed level of agreement varied across classes and generally increased over time.
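To make "share of possible ties that did not match" concrete, here is a minimal sketch on invented toy data. It is only an illustration of the arithmetic, not the procedure used by Pittinsky and Carolan; the names `student_view` and `teacher_view` are hypothetical.

```python
import networkx as nx

students = [1, 2, 3, 4]

# Toy directed friendship ties (invented for illustration only).
student_view = nx.DiGraph([(1, 2), (2, 1), (3, 4)])
teacher_view = nx.DiGraph([(1, 2), (3, 4), (4, 3)])
student_view.add_nodes_from(students)
teacher_view.add_nodes_from(students)

# In a directed network every ordered pair of distinct actors is a possible tie.
possible_ties = len(students) * (len(students) - 1)

# Ties present in one view but not the other.
mismatched = set(student_view.edges()) ^ set(teacher_view.edges())
share_mismatched = len(mismatched) / possible_ties

print(sorted(mismatched))
print(round(share_mismatched, 3))
```

Here two of the twelve possible ties differ between the two views.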
Let’s start by creating a new Python script and loading some essential packages introduced in LA Workflows:
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
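The loading step might look like the sketch below, assuming pandas and NetworkX are already installed. The `pd` and `nx` aliases are a common convention rather than a requirement.

```python
# Load the packages used throughout this code-along.
import pandas as pd      # tabular data: node lists and edge lists
import networkx as nx    # network data structures and measures

# Printing the versions is a quick check that both packages loaded.
print("pandas", pd.__version__)
print("networkx", nx.__version__)
```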
Intro to Network Data Structures
Consistent with typical data storage, node-lists often include:
identifiers like name or ID
demographic info (gender, age)
socio-economic info (job, income)
substantive info (grades, attendance)
| | id | name | gender | achievement | gender_num | achievement_num |
|---|---|---|---|---|---|---|
| 0 | 1 | Katherine | female | high | 1 | 1 |
| 1 | 2 | James | male | average | 0 | 2 |
| 2 | 3 | Angela | female | average | 1 | 2 |
| 3 | 4 | Joseph | male | high | 0 | 1 |
| 4 | 5 | Samantha | female | average | 1 | 2 |
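As a rough sketch, the five rows above can be rebuilt by hand as a pandas DataFrame to see how a node list behaves as ordinary tabular data. The variable name `node_list` is an assumption made for this example.

```python
import pandas as pd

# The five example rows, typed in by hand for illustration.
node_list = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Katherine", "James", "Angela", "Joseph", "Samantha"],
    "gender": ["female", "male", "female", "male", "female"],
    "achievement": ["high", "average", "average", "high", "average"],
    "gender_num": [1, 0, 1, 0, 1],
    "achievement_num": [1, 2, 2, 1, 2],
})

# One row per actor; every other column is an attribute of that actor.
print(node_list)
print(node_list["gender"].value_counts())
```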
Radically different from typical data storage, edge-lists include:
an ego and an alter
tie strength or frequency
edge attributes (time, event, text)
| | from | to | weight |
|---|---|---|---|
| 0 | 1 | 2 | 1 |
| 1 | 1 | 4 | 1 |
| 2 | 1 | 5 | 1 |
| 3 | 1 | 6 | 1 |
| 4 | 1 | 7 | 1 |
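A minimal sketch of how an edge list like the one above could become a directed, valued network in NetworkX. The names `edge_list` and `toy_network` are hypothetical, and the rows are copied by hand from the example table.

```python
import pandas as pd
import networkx as nx

# Each row is one tie: who sent it, who received it, and how strong it is.
edge_list = pd.DataFrame({
    "from": [1, 1, 1, 1, 1],
    "to": [2, 4, 5, 6, 7],
    "weight": [1, 1, 1, 1, 1],
})

# create_using=nx.DiGraph keeps the direction of each nomination;
# edge_attr="weight" stores the tie value on each edge.
toy_network = nx.from_pandas_edgelist(
    edge_list,
    source="from",
    target="to",
    edge_attr="weight",
    create_using=nx.DiGraph,
)

print(toy_network.out_degree(1))    # number of ties sent by student 1
print(toy_network.has_edge(2, 1))   # was the 1 -> 2 tie reciprocated?
```

Because the graph is directed, a tie from student 1 to student 2 does not imply a tie from 2 back to 1.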
Also radically different, an adjacency matrix includes:
column for each actor
row for each actor
a value indicating the presence/strength of a relation
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 |
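The sketch below rebuilds a small matrix like the one above from a directed graph. It assumes rows are senders and columns are receivers, which is an assumption about how the example matrix is oriented.

```python
import networkx as nx

# Five actors labelled 0-4 and four directed ties (sender, receiver).
toy = nx.DiGraph()
toy.add_nodes_from(range(5))
toy.add_edges_from([(0, 3), (1, 2), (2, 1), (4, 0)])

# One row and one column per actor; a 1 marks a tie from row to column.
adjacency = nx.to_pandas_adjacency(toy, nodelist=list(range(5)), dtype=int)
print(adjacency)
```

Actor 3 sends no ties, so its row contains only zeros, even though it receives a tie from actor 0.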
Take a look at one of the network datasets in the data folder and consider the following:
In what format is this dataset stored?
If edge data, is it directed or undirected? Valued?
If node data, does the file contain attribute data?
What are some things you notice about this dataset?
What questions do you have about this dataset?
Let’s start by importing two CSV files that contain data about the nodes and the edges in our student-reported friendship network:
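The import step could be sketched as follows. The file contents and the names `student_nodes` and `student_edges` are placeholders; in-memory text stands in for the two CSV files so the example runs anywhere.

```python
import io
import pandas as pd

# Stand-ins for the two CSV files (contents invented for illustration).
nodes_csv = io.StringIO("id,name,gender\n1,Katherine,female\n2,James,male\n")
edges_csv = io.StringIO("from,to,weight\n1,2,1\n2,1,1\n")

# pd.read_csv accepts a file path or a file-like object.
student_nodes = pd.read_csv(nodes_csv)
student_edges = pd.read_csv(edges_csv)

print(student_nodes)
print(student_edges)
```

With the real course files, the `io.StringIO` objects would be replaced by the paths to the node and edge CSV files in the data folder.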
Now let’s take a look at the data files we just imported using the print() function or another function of choice you may have learned previously:
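A few common ways to inspect a DataFrame, shown on a small invented edge list. `student_edges` here is a hypothetical stand-in for the imported data.

```python
import pandas as pd

# Small invented edge list standing in for the imported file.
student_edges = pd.DataFrame({
    "from": [1, 1, 2],
    "to": [2, 4, 1],
    "weight": [1, 1, 1],
})

print(student_edges)           # the whole table
print(student_edges.head(2))   # just the first two rows
print(student_edges.shape)     # (number of rows, number of columns)
print(student_edges.dtypes)    # the data type of each column
```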
Think about the questions below and be prepared to share your response:
What do you think the rows and columns in each file represent?
What do the values in each cell represent?
What else do you notice about the data?
What questions do you have?
Run the following code in your Python script:
The from_pandas_edgelist() function creates a NetworkX graph object (here a directed graph, or DiGraph) from our edge list. Node attributes from the node list can then be attached so that nodes and edges are combined in a single Python object.
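A minimal sketch of this step on invented data. The column names `from`, `to`, `weight`, and `id` and the variable names are assumptions for illustration, and the course files may differ.

```python
import pandas as pd
import networkx as nx

# Invented stand-ins for the imported node and edge data.
student_nodes = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Katherine", "James", "Angela"],
    "gender": ["female", "male", "female"],
})
student_edges = pd.DataFrame({
    "from": [1, 2, 3],
    "to": [2, 1, 1],
    "weight": [1, 1, 1],
})

# Build a directed network from the edge list.
student_network = nx.from_pandas_edgelist(
    student_edges,
    source="from",
    target="to",
    edge_attr="weight",
    create_using=nx.DiGraph,
)

# Attach node attributes: {node_id: {"name": ..., "gender": ...}, ...}
node_attributes = student_nodes.set_index("id").to_dict("index")
nx.set_node_attributes(student_network, node_attributes)

print(student_network)
print(student_network.nodes[1])
```

Note that `from_pandas_edgelist()` only sees nodes that appear in the edge list, so isolates listed only in the node file would need to be added separately, for example with `add_nodes_from()`.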
Using your Python script, print() the network object we just created and run the code to produce the output on the next tab:
You should see an output that looks something like this:
DiGraph with 26 nodes and 203 edges
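A short sketch of size-related summaries on a toy directed graph, followed by the density implied by the 26 nodes and 203 edges reported above. The toy graph is invented, and the density formula used is the standard one for directed networks without self-ties.

```python
import networkx as nx

# Toy directed network (invented): 4 actors, 4 ties, one reciprocated pair.
toy = nx.DiGraph([(1, 2), (2, 1), (1, 3), (3, 4)])

n = toy.number_of_nodes()
m = toy.number_of_edges()
print(n, m)
print(nx.density(toy))       # ties present / ties possible = m / (n * (n - 1))
print(nx.reciprocity(toy))   # share of ties that are returned

# The same density formula applied to the summary printed above.
reported_density = 203 / (26 * 25)
print(round(reported_density, 3))
```

For the reported network, roughly 31% of all possible directed ties are present.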
Think about the questions below:
What is the size of the student-reported friendship network?
What else do you notice about this network?
What questions do you have about this network summary?
Making Simple and Sophisticated Sociograms
The ggraph() function is the first function required to build a sociogram. Try running this function on our student_network and see what happens:
ggraph(student_network)
This function serves two critical roles:
It takes care of setting up the plot object for the network specified.
It creates the layout based on algorithm provided.
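For a NetworkX graph, the layout step can be sketched in Python as below. This is an assumed analogue rather than part of the original slides: `nx.spring_layout()` is swapped in as the layout algorithm, and the toy graph stands in for `student_network`.

```python
import networkx as nx

# Toy stand-in for the student network (invented ties).
student_network = nx.DiGraph([(1, 2), (2, 1), (1, 3), (3, 4)])

# A layout algorithm assigns an (x, y) position to every node.
# The seed makes the positions repeatable from run to run.
pos = nx.spring_layout(student_network, seed=42)

for node, (x, y) in pos.items():
    print(node, round(float(x), 2), round(float(y), 2))

# With matplotlib installed, the positions can then be passed to a drawing call:
# nx.draw_networkx(student_network, pos=pos, with_labels=True)
```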
The {ggraph} package allows for some fairly sophisticated sociograms with a fair bit of coding:
ggraph(student_network, layout = "stress") +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')),
                 end_cap = circle(3, 'mm'),
                 start_cap = circle(3, 'mm'),
                 alpha = .1) +
  geom_node_point(aes(size = local_size(),
                      color = gender)) +
  geom_node_text(aes(label = id),
                 repel = TRUE) +
  theme_graph()
SNA Case Study: Who’s Friends with Who in Middle School?
Guiding Study: Behavioral versus cognitive classroom friendship networks.
This work was supported by the National Science Foundation grants DRL-2025090 and DRL-2321128 (ECR:BCSER). Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.