Data Structures & Sociograms

SNA Module 1: Code-Along

Data-Intensive Research Workflow

From Learning Analytics Goes to School (Krumm, Means, and Bienkowski 2018)

Prepare

Guiding Research & Network Packages

Guiding Study

Revisiting early work in the field of sociometry, this study by Pittinsky and Carolan (2008) assesses the level of agreement between teacher perceptions and student reports of classroom friendships among middle school students.

Behavioral vs. Cognitive Classroom Friendships (Pittinsky and Carolan 2008)

The central question guiding this investigation was:

Do student reports agree with teacher perceptions when it comes to classroom friendship ties and with what consequences for commonly used social network measures?

  • 1 teacher, 1 middle school, four classrooms

  • Students given roster and asked to evaluate relationships with peers

  • Choices included best friend, friend, know-like, know, know-dislike, strongly dislike, and do not know.

  • Relations are valued (degrees of friendship, not just yes or no)

  • Data are directed (friendship nominations were not presumed to be reciprocal).

  • Teacher’s perceptions and students’ reports were statistically similar, although 11–29% of possible ties did not match.

  • Students reported significantly more reciprocated friendship ties than the teacher perceived.

  • Observed level of agreement varied across classes and generally increased over time.

Load Packages

Let’s start by creating a new Python script and loading some essential packages introduced in LA Workflows:

import pandas as pd

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

Use your Python script to import the {networkx} package as nx.

# YOUR CODE HERE
#
#

Wrangle

Intro to Network Data Structures

Network Data Structures

Consistent with typical data storage, node-lists often include:

  • identifiers like name or ID

  • demographic info (gender, age)

  • socio-economic info (job, income)

  • substantive info (grades, attendance)

   id       name  gender achievement  gender_num  achievement_num
0   1  Katherine  female        high           1                1
1   2      James    male     average           0                2
2   3     Angela  female     average           1                2
3   4     Joseph    male        high           0                1
4   5   Samantha  female     average           1                2
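
As a minimal sketch, a node-list like the one above could be built by hand as a pandas DataFrame. The values are copied from the five rows shown; the name example_nodes is hypothetical and is not part of the lab data.

```python
import pandas as pd

# Hypothetical node-list built from the five rows displayed above
example_nodes = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Katherine", "James", "Angela", "Joseph", "Samantha"],
    "gender": ["female", "male", "female", "male", "female"],
    "achievement": ["high", "average", "average", "high", "average"],
})

# Each row is one actor; each column is an identifier or an attribute
print(example_nodes.shape)
print(example_nodes["gender"].value_counts())
```

Each row describes exactly one actor, which is why node-lists look like ordinary tabular data.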

Radically different than typical data storage, edge-lists include:

  • an ego and an alter

  • tie strength or frequency

  • edge attributes (time, event, text)

   from  to  weight
0     1   2       1
1     1   4       1
2     1   5       1
3     1   6       1
4     1   7       1
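
To illustrate how an edge-list is read, the sketch below rebuilds the five rows above as a DataFrame and loads them into a directed graph. The names example_edges and example_graph are hypothetical, and treating "from" as the ego and "to" as the alter is an assumption based on the column names.

```python
import pandas as pd
import networkx as nx

# Hypothetical edge-list copied from the five rows displayed above
example_edges = pd.DataFrame({
    "from": [1, 1, 1, 1, 1],
    "to": [2, 4, 5, 6, 7],
    "weight": [1, 1, 1, 1, 1],
})

# Each row becomes one directed tie from the ego ("from") to the alter ("to")
example_graph = nx.from_pandas_edgelist(
    example_edges, source="from", target="to",
    edge_attr="weight", create_using=nx.DiGraph()
)

print(example_graph.number_of_edges())
# In a directed graph, a tie from 1 to 2 does not imply a tie from 2 to 1
print(example_graph.has_edge(1, 2), example_graph.has_edge(2, 1))
```

Unlike a node-list, each row here describes a relation between two actors rather than a single actor.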

Also radically different, an adjacency matrix includes:

  • column for each actor

  • row for each actor

  • a value indicating the presence/strength of a relation

   0  1  2  3  4
0  0  0  0  1  0
1  0  0  1  0  0
2  0  1  0  0  0
3  0  0  0  0  0
4  1  0  0  0  0
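
The sketch below shows one way such a matrix can be produced from a list of ties. It assumes that rows are senders and columns are receivers, which is the NetworkX convention and is not stated in the example above; the names ties, graph, and adjacency are hypothetical.

```python
import pandas as pd
import networkx as nx

# Ties read off the matrix above, assuming row = sender and column = receiver
ties = [(0, 3), (1, 2), (2, 1), (4, 0)]
actors = [0, 1, 2, 3, 4]

graph = nx.DiGraph()
graph.add_nodes_from(actors)   # include actor 3, who sends no ties
graph.add_edges_from(ties)

# Rows and columns follow the order of nodelist; a cell is 1 where a tie exists
adjacency = nx.to_pandas_adjacency(graph, nodelist=actors, dtype=int)
print(adjacency)
```

An edge-list stores only the ties that exist, while an adjacency matrix also stores a cell for every tie that does not.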

Take a look at one of the network datasets in the data folder and consider the following:

  • What format is this data set stored as?

  • If edge data, is it directed or undirected? Valued?

  • If node data, does the file contain attribute data?

  • What are some things you notice about this dataset?

  • What questions do you have about this dataset?

Import Data

Let’s start by importing two CSV files that contain data about the nodes and the edges in our student-reported friendship network:

# Node attributes: one row per student
student_nodes = pd.read_csv("lab-1/data/student-attributes.csv")

# Edge list: one row per friendship nomination
student_edges = pd.read_csv("lab-1/data/student-edgelist.csv")

Now let’s take a look at the data files we just imported using the print() function or another function of choice you may have learned previously:

print(student_edges)
print(student_nodes)

Think about the questions below and be prepared to share your response:

  1. What do you think the rows and columns in each file represent?

  2. What do the values in each cell represent?

  3. What else do you notice about the data?

  4. What questions do you have?

A Tidy Network

Run the following code in your Python script:

student_network = nx.from_pandas_edgelist(student_edges, 
                                          source='from', 
                                          target='to', 
                                          create_using=nx.DiGraph())

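The from_pandas_edgelist() function builds a NetworkX graph object from the rows of our edge-list. Because we passed create_using=nx.DiGraph(), the result is a directed graph. Only the edges are read at this step; the attributes in student_nodes are not yet part of the network object.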
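
Node attributes can be attached after the graph is built. The sketch below uses small toy frames standing in for student_edges and student_nodes; the values are hypothetical, and the column names "id", "gender", and "weight" are assumed to match the examples shown earlier.

```python
import pandas as pd
import networkx as nx

# Toy stand-ins for student_edges and student_nodes (hypothetical values)
toy_edges = pd.DataFrame({"from": [1, 1, 2], "to": [2, 3, 1], "weight": [1, 1, 1]})
toy_nodes = pd.DataFrame({"id": [1, 2, 3], "gender": ["female", "male", "female"]})

# Keep the weight column as an edge attribute while building the graph
toy_network = nx.from_pandas_edgelist(
    toy_edges, source="from", target="to",
    edge_attr="weight", create_using=nx.DiGraph()
)

# Build {node id: {attribute: value}} and attach it to the matching nodes
attributes = toy_nodes.set_index("id").to_dict(orient="index")
nx.set_node_attributes(toy_network, attributes)

print(toy_network.nodes[1])
```

Note that a student who appears in the node file but in no row of the edge-list would not be in the graph unless added explicitly, for example with add_nodes_from().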

Using your Python script, print() the network object we just created and run the code to produce the output on the next tab:

# ADD CODE BELOW
#
#

You should see an output that looks something like this:

DiGraph with 26 nodes and 203 edges

Think about the questions below:

  1. What is the size of the student-reported friendship network?

  2. What else do you notice about this network?

  3. What questions do you have about this network summary?
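
One way to explore questions like these in code is sketched below on a small hypothetical graph rather than the lab data; toy_network is an illustrative name only.

```python
import networkx as nx

# Hypothetical directed graph with three actors and three ties
toy_network = nx.DiGraph([(1, 2), (2, 1), (1, 3)])

print(toy_network.number_of_nodes())  # number of actors
print(toy_network.number_of_edges())  # number of directed ties
print(toy_network.is_directed())      # True for a DiGraph
print(nx.density(toy_network))        # ties present divided by ties possible
```

For a directed graph with n nodes there are n(n-1) possible ties, so three ties among three actors give a density of 0.5.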

Explore

Making Simple and Sophisticated Sociograms

A Simple Sociogram

Run the following code to make a simple sociogram:

nx.draw(student_network)


The draw() function is a simple function for quickly plotting graphs using the {networkx} package.

The draw() function allows a small degree of customization, but is still limited.

nx.draw(student_network, with_labels=True, font_weight='bold') 

  1. In what situations might a limited function like this be useful?
  2. When might it be inappropriate to use?
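
To make the customization options mentioned above more concrete, here is a minimal sketch that passes a fixed layout and a few styling arguments to nx.draw(). The toy graph, the seed value, and the color are all assumptions for illustration, and the Agg backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical toy network
toy_network = nx.DiGraph([(1, 2), (2, 1), (1, 3), (3, 4)])

# A fixed seed makes the layout reproducible between runs
positions = nx.spring_layout(toy_network, seed=42)

fig, ax = plt.subplots(figsize=(4, 4))
nx.draw(toy_network, pos=positions, ax=ax, with_labels=True,
        node_color="lightblue", node_size=500, arrows=True)
```

Storing the layout in positions means the same node placement can be reused across several plots of the same network.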

A Sophisticated Sociogram

The ggraph() function from the {ggraph} package for R is the first function required to build a more sophisticated sociogram. Try running this function on our student_network and see what happens:

ggraph(student_network)

This function serves two critical roles:

  1. It takes care of setting up the plot object for the network specified.

  2. It creates the layout based on algorithm provided.

Let’s “add” nodes to our sociogram using the + operator and the geom_node_point() function:

ggraph(student_network) + 
  geom_node_point()

Now let’s “add” edges to our sociogram using the geom_edge_link() function:

ggraph(student_network) + 
  geom_node_point() + 
  geom_edge_link()



The {ggraph} package allows for some fairly sophisticated sociograms with a fair bit of coding:

ggraph(student_network, layout = "stress") + 
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 start_cap = circle(3, 'mm'),
                 alpha = .1) +
  geom_node_point(aes(size = local_size(),
                      color = gender)) +
  geom_node_text(aes(label = id),
                 repel = TRUE) +
  theme_graph()
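
A roughly comparable layered sociogram can be sketched in Python with {networkx} and matplotlib. This is an approximation under stated assumptions, not a port of the code above: the toy graph and its gender values are hypothetical, spring_layout() stands in for the "stress" layout, node degree stands in for local_size(), and the color palette is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical toy network with a gender attribute on each node
toy_network = nx.DiGraph([(1, 2), (2, 1), (1, 3), (3, 4), (4, 1)])
nx.set_node_attributes(
    toy_network, {1: "female", 2: "male", 3: "female", 4: "male"}, name="gender"
)

# Compute the layout once so every layer uses the same coordinates
positions = nx.spring_layout(toy_network, seed=42)

# Size nodes by degree and color them by gender
node_sizes = [100 * toy_network.degree(node) for node in toy_network.nodes]
palette = {"female": "tab:orange", "male": "tab:blue"}
node_colors = [palette[toy_network.nodes[node]["gender"]]
               for node in toy_network.nodes]

# Draw edges, nodes, and labels as separate layers on the same axes
fig, ax = plt.subplots(figsize=(5, 5))
nx.draw_networkx_edges(toy_network, positions, ax=ax, alpha=0.3, arrowsize=8)
nx.draw_networkx_nodes(toy_network, positions, ax=ax,
                       node_size=node_sizes, node_color=node_colors)
nx.draw_networkx_labels(toy_network, positions, ax=ax, font_size=8)
ax.set_axis_off()
```

Drawing edges, nodes, and labels in separate calls mirrors the layered approach of adding geoms with the + operator.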

What’s Next?

Acknowledgements

This work was supported by the National Science Foundation grants DRL-2025090 and DRL-2321128 (ECR:BCSER). Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Krumm, Andrew, Barbara Means, and Marie Bienkowski. 2018. Learning Analytics Goes to School. Routledge. https://doi.org/10.4324/9781315650722.
Pittinsky, Matthew, and Brian V Carolan. 2008. “Behavioral Versus Cognitive Classroom Friendship Networks: Do Teacher Perceptions Agree with Student Reports?” Social Psychology of Education 11: 133–47. https://link.springer.com/content/pdf/10.1007/s11218-007-9046-7.pdf.