Prepare

Introduction

Hey folks! I decided to take one of the other “default” datasets for this analysis: Pittinsky’s Middle School Science Classroom Friendship Nominations. As someone who has taught in a classroom before (albeit at the college level), I was interested in the student vs. teacher perspective on a small social network.

My research question: are there notable group-level differences between the teacher’s perception of her students’ social network and the students’ self-reported network?

Install Packages

install.packages(c("tidyverse","ggraph","tidygraph","igraph","readxl"),repos = "http://cran.us.r-project.org")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggraph)
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(tidygraph)
## 
## Attaching package: 'tidygraph'
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:stats':
## 
##     filter
library(readxl)

Wrangle

Import & Tidy Data

I first imported the datasets from the student and teacher .xlsx files, then set the row names of each to 1 through 27 so that they could be turned into matrices. The matrices were then ready to be converted into graphable objects student_network and teacher_network.

student_data <- read_xlsx("data/Peer Groups Data Chapter 3_a.xlsx")
teacher_data <- read_xlsx("data/Peer Groups Data Chapter 3_c.xlsx")

rownames(student_data) <- 1:27
## Warning: Setting row names on a tibble is deprecated.
rownames(teacher_data) <- 1:27
## Warning: Setting row names on a tibble is deprecated.
student_matrix <- as.matrix(student_data)
teacher_matrix <- as.matrix(teacher_data) 

student_network <- as_tbl_graph(student_matrix, directed = TRUE)
teacher_network <- as_tbl_graph(teacher_matrix, directed = TRUE)
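
For readers who want to see what the matrix-to-graph step does, here is a minimal sketch on a made-up three-student nomination matrix (not the classroom data). It assumes rows nominate columns, and it uses igraph's graph_from_adjacency_matrix as a stand-in for as_tbl_graph; the object names adj and g are hypothetical.

```r
library(igraph)

# Hypothetical nomination matrix: a 1 in row i, column j means i nominated j.
adj <- matrix(
  c(0, 1, 0,
    1, 0, 1,
    0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(as.character(1:3), as.character(1:3))
)

# Build a directed graph so that one-way nominations stay one-way.
g <- graph_from_adjacency_matrix(adj, mode = "directed")

# One row per directed tie (from, to).
igraph::as_data_frame(g, what = "edges")
```

Listing the edges this way is a quick check that the direction of each nomination survived the conversion.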

Explore

Components

I wanted to take a quick look at each network to see if anything stood out.

autograph(student_network)

This was a mostly connected class with one student isolate. But how did the teacher see things?

autograph(teacher_network)

This was considerably different! There were several components from the teacher’s viewpoint. These networks came from directed data, which meant they could be parsed into weak components (nodes connected once edge direction is ignored) and strong components (nodes that can all reach one another by following edge direction). The autograph plots show the weak components, so I went further into a component analysis. Just to corroborate with autograph, I ran it for weak components first:

components(student_network, mode = "weak")
## $membership
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  1  1  1  1  1  1  1 
## 27 
##  1 
## 
## $csize
## [1] 26  1
## 
## $no
## [1] 2
components(teacher_network, mode = "weak")
## $membership
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  1  1  1  1  1  1  1  1  1  1  1  1  1  2  3  1  2  4  5  6  1  1  1  3  4  1 
## 27 
##  1 
## 
## $csize
## [1] 19  2  2  2  1  1
## 
## $no
## [1] 6

These printouts showed the same components and component members as the autograph visualizations.

The strong components might reveal something different, though.

components(student_network, mode = "strong")
## $membership
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  1  2  2  2  2  2  2  2 
## 27 
##  2 
## 
## $csize
## [1]  1 26
## 
## $no
## [1] 2

The students’ self-reported network didn’t appear to be much different, which actually surprised me. However, the same student who didn’t have a weak tie, unsurprisingly, also didn’t have a strong tie, so at least that was constant.

components(teacher_network, mode = "strong")
## $membership
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
##  9 10 10 10  9  8  9 10  7 10  7  9  9  6  5 10  6  4  3  2  1 10  7  5  4 10 
## 27 
##  9 
## 
## $csize
##  [1] 1 1 1 2 2 2 3 1 6 8
## 
## $no
## [1] 10

The teacher’s network, however, changed, going from six to ten components. The three dyads and two isolates were unchanged; the increase came from the large weak component of 19 students, which broke up into strong components of 8, 6, 3, 1, and 1 (a later ggraph visualization will highlight these differences).
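
One way to see which weak components split is to cross-tabulate the two membership vectors. The sketch below does this on a made-up four-node network (not the classroom data), using igraph directly; the names g, weak, and strong are hypothetical.

```r
library(igraph)

# Hypothetical toy network:
# 1 <-> 2 is a two-way tie, 2 -> 3 is one-way, and 4 is an isolate.
g <- make_graph(c(1, 2, 2, 1, 2, 3), n = 4, directed = TRUE)

weak   <- components(g, mode = "weak")$membership
strong <- components(g, mode = "strong")$membership

# Rows are weak components, columns are strong components.
# A row with more than one non-zero cell is a weak component that split.
table(weak = weak, strong = strong)
```

The same table call should work on the membership vectors from student_network and teacher_network.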

Cliques

I then looked at how large the completely connected subgroups, known as cliques, got in each network. Note that cliques ignore directionality, so these are simply groups where all nodes are connected to one another, but not necessarily mutually.

clique_num(student_network)
## Warning in clique_num(student_network): At cliques.c:1125 :directionality of
## edges is ignored for directed graphs
## [1] 9

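Note that clique_num returns the size of the largest clique, not a count of cliques: the largest fully connected group in the student network had nine students. I found that interesting, and it is consistent with the components output, since a nine-student clique fits easily inside student_network’s 26-student component.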

clique_num(teacher_network)
## Warning in clique_num(teacher_network): At cliques.c:1125 :directionality of
## edges is ignored for directed graphs
## [1] 6

A largest clique of six seemed more on-brand for how the teacher imagined their students’ network.

I also looked to see who was in the largest clique of each network. After some trial and error with min, I confirmed what clique_num reported: the students reported cliques no larger than nine, and the teacher identified cliques no larger than six, with exactly one clique of that size in each network:

cliques(student_network, min = 9, max = NULL)
## Warning in cliques(student_network, min = 9, max = NULL): At
## igraph_cliquer.c:57 :Edge directions are ignored for clique calculations
## [[1]]
## + 9/27 vertices, named, from 502e385:
## [1] 1  4  5  7  8  10 11 21 27
cliques(teacher_network, min = 6, max = NULL)
## Warning in cliques(teacher_network, min = 6, max = NULL): At
## igraph_cliquer.c:57 :Edge directions are ignored for clique calculations
## [[1]]
## + 6/27 vertices, named, from f7fb3be:
## [1] 2  3  8  10 16 22
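
Since clique_num reports a size rather than a count, here is a small sketch separating the two ideas on a made-up undirected network (not the classroom data). It also uses igraph's largest_cliques, which returns the members of the largest clique without any trial and error over min.

```r
library(igraph)

# Hypothetical toy network: a triangle (1, 2, 3) plus a pendant tie 3 - 4.
g <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

clique_num(g)            # size of the largest clique
length(max_cliques(g))   # number of maximal cliques
largest_cliques(g)       # members of the largest clique(s)
```

In this toy graph the largest clique is the triangle, and the pendant tie forms a second, smaller maximal clique.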

Edge Betweenness

Just to get the practice in, I also ran edge-betweenness community detection on both networks and viewed the resulting clusters by size in descending order.

student_network |>
  morph(to_undirected) |>
  activate(nodes) |>
  mutate(sub_group = group_edge_betweenness()) |>
  unmorph() |>
  activate(nodes) |>
  as_tibble() |>
  group_by(sub_group) |>
  summarise(count = n()) |>
  arrange(desc(count))
## # A tibble: 5 × 2
##   sub_group count
##       <int> <int>
## 1         1    22
## 2         2     2
## 3         3     1
## 4         4     1
## 5         5     1
teacher_network |>
  morph(to_undirected) |>
  activate(nodes) |>
  mutate(sub_group = group_edge_betweenness()) |>
  unmorph() |>
  activate(nodes) |>
  as_tibble() |>
  group_by(sub_group) |>
  summarise(count = n()) |>
  arrange(desc(count))
## # A tibble: 9 × 2
##   sub_group count
##       <int> <int>
## 1         1     7
## 2         2     6
## 3         3     4
## 4         4     2
## 5         5     2
## 6         6     2
## 7         7     2
## 8         8     1
## 9         9     1

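Corroborating the components and cliques results, the teacher conceptualized more distinct clusters of students in the classroom network: nine clusters with none larger than seven students, versus five clusters in the student network with one cluster of 22.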
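
To make the edge-betweenness step less of a black box, here is a minimal sketch on a made-up undirected network (not the classroom data). It calls igraph's cluster_edge_betweenness, which is the function tidygraph's group_edge_betweenness wraps; the names g and eb are hypothetical.

```r
library(igraph)

# Hypothetical toy network: two triangles joined by a single bridge (3 - 4).
g <- make_graph(c(1, 2, 2, 3, 1, 3, 4, 5, 5, 6, 4, 6, 3, 4), directed = FALSE)

# The bridge carries every shortest path between the triangles,
# so it is the first edge the algorithm removes.
eb <- cluster_edge_betweenness(g)

membership(eb)   # community label for each node
sizes(eb)        # number of nodes in each community
```

The two triangles come out as separate communities, which is the same kind of grouping the sub_group column reports for the classroom networks.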

Communicate

To show the difference between the strong and weak components more clearly, I reused some earlier code that flags edges with the edge_is_mutual property, and plotted each entire network again using ggraph:

student_network |>
  activate(edges) |>
  mutate(reciprocated = edge_is_mutual()) |>
ggraph() +
  geom_node_point() +
  geom_edge_link(aes(color = reciprocated)) +
  ggtitle("Student Network") +
  theme_graph()
## Using `stress` as default layout

teacher_network |>
  activate(edges) |>
  mutate(reciprocated = edge_is_mutual()) |>
ggraph() +
  geom_node_point() +
  geom_edge_link(aes(color = reciprocated)) +
  ggtitle("Teacher Network") +
  theme_graph()
## Using `stress` as default layout

Discussion

The above visualizations, which are color-coded to reflect the strong connections between students, and by association the components found in our analysis, highlighted some key distinctions between how the teacher perceived the students and how the students perceived one another as friends within the classroom. It seems that the teacher identified several dyads that were not reflected in the students’ nominations; in fact, most of the students were at least weakly connected with one another in a component of 26, with student 19 as the only one not included.

One other observation is that the teacher seemed to give out more strong ties between students. This could be because they were reluctant to assume a tie ran in only one direction unless it was obviously so: if two students were seen together a lot, then they must be mutual friends. I’d say that the burden of proof falls more heavily on the students reporting their relationships.
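
The impression that one network has more two-way ties than the other could be checked with a single number. A minimal sketch, on a made-up three-node network rather than the classroom data, using igraph's reciprocity:

```r
library(igraph)

# Hypothetical toy network: 1 <-> 2 is reciprocated, 2 -> 3 is not.
g <- make_graph(c(1, 2, 2, 1, 2, 3), directed = TRUE)

# Share of directed edges that have a matching edge in the other direction.
reciprocity(g)
```

Running the same call on student_network and teacher_network would give two comparable proportions.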

As you saw in my process, there was an enduring inconsistency between the components analysis of student_network and the other analyses. You can see it particularly clearly in the above graphs; if the sociogram reflected the original components printout, then the entire large component should be blue. Instead, we have what looks like four strong components: the original isolate that has no connections of any kind, an additional isolate that only has weak connections, a dyad, and the remaining component. This is what the above graph for student_network looks like when weak ties are filtered out:

student_network |>
  activate(edges) |>
  mutate(reciprocated = edge_is_mutual()) |>
  filter(reciprocated == TRUE) |>
ggraph() +
  geom_node_point() +
  geom_edge_link(aes(color = reciprocated)) +
  ggtitle("Student Network") +
  theme_graph()
## Using `stress` as default layout

I built this visualization with some old code that simply visualizes reciprocated connections, rather than pulling straight from components itself, but I can’t think of why this would show anything different. If the components output were correct, shouldn’t there be at least one other strong tie connecting this dyad and weak isolate to the rest of the larger strong component? I ran this by Dr. Kellogg and we were still stumped after looking through it. I even checked the original data to confirm that at least some weak ties were present, and they were. For whatever reason, components says otherwise.
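
One possible explanation, offered as a sketch rather than a confirmed diagnosis of this dataset: components with mode = "strong" groups nodes that can reach one another along directed paths, and that does not require any single tie to be reciprocated. A made-up three-node cycle (not the classroom data) shows the two ideas coming apart:

```r
library(igraph)

# Hypothetical toy network: 1 -> 2 -> 3 -> 1, with no reciprocated ties.
g <- make_graph(c(1, 2, 2, 3, 3, 1), directed = TRUE)

which_mutual(g)                     # FALSE for every edge
components(g, mode = "strong")$no   # one strong component
```

If the student nominations contain cycles like this, a student with only one-way ties could still belong to the large strong component, so a plot filtered on edge_is_mutual would not be expected to match the components printout.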

Anyways, this was a cool look at how, at least in this case, the teacher may have had an overly simplistic view of friend groups and cliques. Conversely, perhaps some students think that they are closer friends with one another than they actually are, based on how other people view their interactions.