# Not all libraries are included in this starter chunk, as their order in the code
# is significant, since the different network packages use the same functions to create + plot networks.
library(knitr)
library(tidyverse)
library(jtools)

Abstract

This post aims to understand how homophily, the tendency to flock toward similar others, functions in introductory STEM courses at Reed College. Using data collected through observations by the author (who works as a course assistant in both classes), homophily is visualized, measured, and modeled using the igraph, networkD3, statnet, and ergm packages. In particular, the author examines gender homophily, hoping to understand who students choose to collaborate with and the role gender plays in this selection. Extensive literature on the STEM field details how gender minorities (referred to in this report as those with a “marked” gender identity, in contrast to an “unmarked” man) feel less belonging in the STEM classroom, leading to fewer gender minorities majoring in STEM fields. One contributing cause is a smaller number of professors with a marked gender, as well as fewer classmates with a marked gender. Previous literature therefore supports the expectation that homophily will emerge in classrooms, with those of marked genders tending to work together to increase their comfort in an otherwise hostile environment. This hypothesis is partially supported by the data: both chemistry networks show clearly positive assortativity coefficients and significant homophily compared to randomly generated networks, though the statistics group project network does not. Such results, in conjunction with the participation data in Part I, raise many further research questions about the impact of gender in introductory STEM courses, while providing insight into the different environments of chemistry and statistics at Reed College.

Research Questions and Hypotheses

While there is no research into the specific impacts of homophily on student networks in introductory STEM courses, this section draws on a relevant portion of the literature: research showing that individuals with privilege benefit the most from their peer interactions. These students join study groups and social networks with other skilled students, allowing them to informally expand their knowledge while those without privilege and access to these networks are left behind (Sacerdote 2011; DiMaggio & Garip 2012). I argue that those with an unmarked gender presentation (i.e., those who present as cisgender men) will follow this same pattern, forming networks primarily with one another and likely increasing their benefits, though the data does not capture any such benefits. By analyzing three different networks and comparing them to randomly generated networks, we can tell whether gender homophily is present.

In short, I investigate the following:

  • Is there homophily present in any of the three classroom networks? If so, which ones? Is the homophily significant when compared against randomly generated networks? What further relationships can be gleaned about the presence or lack of homophily from network visualizations?
    • Hypothesis 1: There will be gender homophily present and significant in all three networks, though the randomness of the Math 141 network for those who did not choose their group may impact its results.
    • Hypothesis 2: Network visualizations will exemplify homophily, with most groups being made up entirely or primarily of one gender presentation.

Data and Methods

The data comes from a Chemistry 101 (Molecular Structure and Properties) lecture and a Math 141 (Introduction to Probability and Statistics) lab, for both of which the author is a course assistant/embedded tutor. While the Chemistry 101 class is in a lecture format, the instructor has on occasion let students work in small groups (selected by the students themselves) for an extended period of time (usually 30+ minutes) to complete a worksheet or that week’s problem set together, a decision that is part of the predict-observe-explain teaching style. The two chemistry networks in this report come from two of these work periods when the author was in class and able to record the marked or unmarked gender presentation of each group member while walking around the lecture hall to facilitate and answer questions. Because the author was working, recording data, and making conversation all at once, some error in data collection is possible; an outside observer would not have had to juggle all of these tasks. However, my position as a course assistant means that I have tutored and met many of the students individually, which increases the accuracy of my coding of individuals’ gender. Both of these consequences of participant observation should be acknowledged.

For the first chemistry small group discussion on September 15th, there are 54 nodes (one for each student present at lecture that day) and 87 edges. Most students were working in groups of four, though there was one group of three and three groups of five. There are no loops, directed ties, or weighted ties. All multiple ties were removed using the simplify function in igraph. The mean degree is 3.222, the mean closeness is 0.0004, and the mean betweenness is zero. There were 40 students with marked gender presentations and 14 with unmarked gender presentations.

For the second chemistry small group discussion on September 20th, there were 48 nodes (fewer students attended class that day, with those six students likely choosing to skip the designated work day). Group sizes were much more variable on this day, with many groups of three, one group of four, many pairs, and a couple of independent workers. No loops, directed ties, or weighted ties are included in the dataset. All multiple ties were removed using the simplify function in igraph. Here, the mean degree was 1.75, the mean closeness was 0.0005, and the mean betweenness was zero. Interestingly, there were 39 people of marked genders and just 9 of unmarked genders. True to the pattern the chemistry professor and I have observed time and time again throughout the semester, people with unmarked genders have skipped class and been the most disrespectful, citing the curriculum and class format as unhelpful. There is much more to explore on this topic in future research, considering the reaction of those with privilege to being put into environments where their past privilege and knowledge are no longer rewarded.

Finally, the third network comes from the semester-long group projects all Math 141 students must complete. The students had the opportunity to select their groups in the first few weeks of the semester, and those who did not form a group were randomly sorted into groups by the professor. I obtained the group project assignments from the class website. Every group had four members, with the exception of one group of five. The author did not work in the lab period for all of the students listed in the groups (there are four total lab sections for each of the two lecture sections and I only TA for two of the four), and thus did not know all of the students, so they used the campus directory to check each student’s pronouns to make the gender (marked vs. unmarked) categorization. No loops, directed ties, or weighted ties were coded. The multiple ties were removed using the simplify function in igraph. The mean degree was 3.111, the mean closeness was 0.0005, and the mean betweenness was zero. More balanced than the chemistry class, there are 24 students with a marked gender identity and 21 with an unmarked gender identity.

See Table 1 to compare these values across networks.
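
As a quick sanity check on these figures, the edge counts and mean degrees follow directly from the group sizes, since a fully connected group of k students contributes k(k-1)/2 ties. The sketch below uses only base R. The group-size vectors are a hand-made summary of the groups described above (the exact counts for September 20th are read off the edge list in this post), and the object names are illustrative rather than part of the original analysis.

```r
# Group sizes per network (hand-summarized; names are illustrative)
group_sizes <- list(
  chem_9_15 = c(rep(4, 9), 3, rep(5, 3)),         # nine groups of 4, one of 3, three of 5
  chem_9_20 = c(rep(3, 10), 4, rep(2, 6), 1, 1),  # ten triples, one group of 4, six pairs, two solo
  stats     = c(5, rep(4, 10))                    # one group of 5, ten groups of 4
)

check <- data.frame(
  network = names(group_sizes),
  n_nodes = sapply(group_sizes, sum),
  # A complete group of size k has choose(k, 2) undirected ties
  n_edges = sapply(group_sizes, function(k) sum(choose(k, 2)))
)

# Each tie adds one to the degree of both of its endpoints
check$mean_degree <- 2 * check$n_edges / check$n_nodes
print(check, row.names = FALSE)
```

If the group sizes are summarized correctly, the node counts, edge counts, and mean degrees should agree with Table 1.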

chem1_vertex_df <- data.frame(
  student_id = as.double(1:54),
  gender = c("marked", "marked", "marked", "unmarked", "marked", "marked", "marked", "marked", "unmarked",
             "unmarked", "unmarked", "marked", "marked", "marked", "marked", "marked", "marked", "marked",
             "marked", "marked", "marked", "marked", "marked", "marked", "marked","unmarked", "marked", 
             "marked", "unmarked", "unmarked", "marked", "marked", "marked", "marked", "unmarked", 
             "unmarked", "unmarked",  "unmarked", "unmarked", "unmarked", "marked", "marked", "marked", 
             "marked", "marked", "marked", "marked", "marked", "marked", "marked", "marked", "marked", 
             "marked", "unmarked"),
  stringsAsFactors = FALSE
)

chem2_vertex_df <- data.frame(
  student_id = as.double(1:48),
  gender = c("marked", "marked", "unmarked", "marked", "marked", "marked", "marked", "marked", 
             "marked", "marked", "marked", "marked", "marked", "marked", "marked", "marked", "marked",
             "marked", "marked", "marked", "unmarked", "marked", "marked", "marked", "marked", "marked",
             "marked", "unmarked", "marked", "unmarked", "marked", "marked", "unmarked", "unmarked", 
             "unmarked", "unmarked", "unmarked", "marked", "marked", "marked", "marked", "marked",
             "marked", "marked", "marked", "marked", "marked", "marked"),
  stringsAsFactors = FALSE
)

stats_vertex_df <- data.frame(
  student_id = as.double(1:45),
  gender = c("unmarked", "unmarked", "unmarked", "marked", "marked", "unmarked", "unmarked", "unmarked",
              "unmarked", "marked", "unmarked", "unmarked", "unmarked", "marked", "marked", "unmarked", "marked",
              "unmarked", "marked", "unmarked", "marked", "marked", "marked", "marked", "marked",
              "marked", "unmarked", "marked", "marked", "marked", "marked", "unmarked", "marked",
              "unmarked", "unmarked", "marked", "marked", "unmarked", "marked", "marked", "marked",
              "marked", "unmarked", "unmarked", "unmarked"),
  stringsAsFactors = FALSE
)
# Yes, this data entry part was as sad as it looks. Even more so when I realized the way I coded it included multiple ties :(
chem1_edge_df <- data.frame(
  from = as.double(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9,
           10, 10, 11, 11, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 
           17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 
           24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31,
           32, 32, 32, 33, 33, 33, 34, 34, 34, 35, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 38, 39, 39, 39,
           40, 40, 40, 41, 41, 41, 42, 42, 42, 43, 43, 43, 44, 44, 44, 45, 45, 45, 46, 46, 46, 47, 47, 47, 48,
           48, 48, 49, 49, 49, 50, 50, 50, 51, 51, 51, 52, 52, 52, 53, 53, 53, 54, 54, 54)),
  to = as.double(c(2, 3, 4, 1, 3, 4, 1, 2, 4, 1, 2, 3, 6, 7, 8, 9, 5, 7, 8, 9, 5, 6, 8, 9, 5, 6, 7, 9, 5, 6, 7, 8,
         11, 12, 10, 12, 10, 11, 14, 15, 16, 17, 13, 15, 16, 17, 13, 14, 16, 17, 13, 14, 15, 17, 13, 14, 15, 16,
         19, 20, 21, 18, 20, 21, 18, 19, 21, 18, 19, 20, 23, 24, 25, 26, 22, 24, 25, 26, 22, 23, 25, 26, 22, 23,
         24, 26, 22, 23, 24, 25, 28, 29, 30, 27, 29, 30, 27, 28, 30, 27, 28, 29, 32, 33, 34, 31, 33, 34, 31,
         32, 34, 31, 32, 33, 36, 37, 38, 35, 37, 38, 35, 36, 38, 35, 36, 37, 40, 41, 42, 39, 41, 42, 39, 40,
         42, 39, 40, 41, 44, 45, 46, 43, 45, 46, 43, 44, 46, 43, 44, 45, 48, 49, 50, 47, 49, 50, 47, 48, 50,
         47, 48, 49, 52, 53, 54, 51, 53, 54, 51, 52, 54, 51, 52, 53)),
  stringsAsFactors = FALSE
)

chem2_edge_df <- data.frame(
  from = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 13, 
           14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 
           26, 26, 27, 27, 28, 28, 29, 30, 31, 32, 33, 34, 35, 35, 36, 36, 37, 37, 38, 39, 40, 41, 42, 42,
           43, 43, 44, 44, 47, 48),
  to = c(2, 3, 1, 3, 1, 2, 5, 6, 4, 6, 4, 5, 8, 9, 10, 7, 9, 10, 7, 8, 10, 7, 8, 9, 12, 13, 11, 13, 11, 12, 
         15, 16, 14, 16, 14, 15, 18, 19, 17, 19, 17, 18, 21, 22, 20, 22, 20, 21, 24, 25, 23, 25, 23, 24, 27, 
         28, 26, 28, 26, 27, 30, 29, 32, 31, 34, 33, 36, 37, 35, 37, 35, 36, 39, 38, 41, 40, 43, 44, 42, 44,
         42, 43, 48, 47),
  stringsAsFactors = FALSE
)

stats_edge_df <- data.frame(
  from = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9,
           10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17,
           18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25,
           26, 26, 26, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31, 32, 32, 32, 33, 33, 33,
           34, 34, 34, 35, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 40, 40, 40, 41, 41, 41,
           42, 42, 42, 43, 43, 43, 44, 44, 44, 45, 45, 45),
  to = c(2, 3, 4, 5, 1, 3, 4, 5, 1, 2, 4, 5, 1, 2, 3, 5, 1, 2, 3, 4, 7, 8, 9, 6, 8, 9, 6, 7, 9, 6, 7, 8, 
         11, 12, 13, 10, 12, 13, 10, 11, 13, 10, 11, 12, 15, 16, 17, 14, 16, 17, 14, 15, 17, 14, 15, 16,
         19, 20, 21, 18, 20, 21, 18, 19, 21, 18, 19, 20, 23, 24, 25, 22, 24, 25, 22, 23, 25, 22, 23, 24,
         27, 28, 29, 26, 28, 29, 26, 27, 29, 26, 27, 28, 31, 32, 33, 30, 32, 33, 30, 31, 33, 30, 31, 32,
         35, 36, 37, 34, 36, 37, 34, 35, 37, 34, 35, 36, 39, 40, 41, 38, 40, 41, 38, 39, 41, 38, 39, 40,
         43, 44, 45, 42, 44, 45, 42, 43, 45, 42, 43, 44),
  stringsAsFactors = FALSE
)
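
Because every group here is fully connected and student IDs are numbered consecutively within groups, the hand-typed edge lists above could in principle be generated from the group sizes alone. A minimal sketch, assuming a hypothetical helper called clique_edges() (not part of the original post), which lists each undirected tie once:

```r
# Hypothetical helper: one row per undirected within-group tie,
# built from a vector of group sizes (IDs are assigned consecutively).
clique_edges <- function(sizes) {
  members_by_group <- split(seq_len(sum(sizes)), rep(seq_along(sizes), times = sizes))
  pairs <- lapply(members_by_group, function(members) {
    if (length(members) < 2) return(NULL)  # solo workers contribute no ties
    t(combn(members, 2))                   # all pairs within the group
  })
  out <- as.data.frame(do.call(rbind, pairs))
  names(out) <- c("from", "to")
  out
}

# Chemistry 9/15: thirteen groups in ID order
chem1_edges_alt <- clique_edges(c(4, 5, 3, 5, 4, 5, 4, 4, 4, 4, 4, 4, 4))
nrow(chem1_edges_alt)
head(chem1_edges_alt)
```

Since each tie appears only once in this form, there would be no duplicate ties for simplify() to remove afterward. Students who worked alone never appear in the edge list, so the vertex data frame is still needed to keep them in the network.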
library(igraph)

chem1_net <- graph_from_data_frame(d=chem1_edge_df, vertices = chem1_vertex_df, directed = F)
chem1_net <- simplify(chem1_net, remove.multiple = T, remove.loops = T)

chem2_net <- graph_from_data_frame(d=chem2_edge_df, vertices = chem2_vertex_df, directed = F)
chem2_net <- simplify(chem2_net, remove.multiple = T, remove.loops = T)

stats_net <- graph_from_data_frame(d=stats_edge_df, vertices = stats_vertex_df, directed = F)
stats_net <- simplify(stats_net, remove.multiple = T, remove.loops = T)
sum_stat <- data.frame(network = c("Chemistry 9/15", "Chemistry 9/20", "Statistics"),
                       n_nodes = c(length(V(chem1_net)), length(V(chem2_net)), length(V(stats_net))),
                       n_edges = c(length(E(chem1_net)), length(E(chem2_net)), length(E(stats_net))),
                       degree = c(mean(degree(chem1_net)), mean(degree(chem2_net)), mean(degree(stats_net))),
                       closeness = c(mean(closeness(chem1_net)), mean(closeness(chem2_net)), mean(closeness(stats_net))),
                       betweenness = c(mean(betweenness(chem1_net)), mean(betweenness(chem2_net)), mean(betweenness(stats_net))),
                       marked = c(count(chem1_vertex_df, gender)$n[1], count(chem2_vertex_df, gender)$n[1], 
                                  count(stats_vertex_df, gender)$n[1]),
                       unmarked = c(count(chem1_vertex_df, gender)$n[2], count(chem2_vertex_df, gender)$n[2],
                                    count(stats_vertex_df, gender)$n[2])
)

kable(sum_stat, format = "simple", col.names = c("Network", "Number of Nodes", "Number of Edges", "Average Degree",
                                                 "Average Closeness", "Average Betweenness", "Number of Marked Gender Students",
                                                 "Number of Unmarked Gender Students"),
      caption = "Table 1: Network Summary Statistics")
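Table 1: Network Summary Statistics

Network          Number of Nodes   Number of Edges   Average Degree   Average Closeness   Average Betweenness   Number of Marked Gender Students   Number of Unmarked Gender Students
--------------   ---------------   ---------------   --------------   -----------------   -------------------   --------------------------------   ----------------------------------
Chemistry 9/15   54                87                3.222222         0.0003716           0                     40                                 14
Chemistry 9/20   48                42                1.750000         0.0004601           0                     39                                 9
Statistics       45                70                3.111111         0.0005426           0                     24                                 21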

Network Visualization

In this section, I show different visualizations of our three networks of interest, first without coloring for each node’s gender affiliation, then adding in color. Then, I show some interactive plots of our networks. Here we can form some preliminary conclusions about homophily in these two intro STEM courses before the modeling section.

Without Gender

plot(chem1_net, edge.arrow.size = 3, vertex.label = NA, vertex.size = 7)

First up is a visualization of the chemistry class from September 15th. In line with Table 1, we observe that most groups have four or more members, explaining why the average degree is a little over three: most nodes in the network have at least three connections. It is also interesting that no individuals chose to ignore the professor’s request and work alone.

plot(chem2_net, edge.arrow.size = 3, vertex.label = NA, vertex.size = 7)

A few days later on September 20th, intro chemistry students worked in small groups again. However, with six fewer students, many of the original network’s attributes have shifted. Many people work in groups of three, though some have formed pairs to complete their work, while a couple elected to work alone entirely. While not colored by gender yet, it is easy to see how the above network could lead to more homophily, since the smaller groups and pairs are likely to consist of people of the same gender who are not branching out to work with others.

plot(stats_net, edge.arrow.size = 3, vertex.label = NA, vertex.size = 7)

And finally, we look at the semester-long statistics group projects. Because these groups are longer lasting and more formal (formed not during one class to tackle a short-term assignment, but to work together throughout the entire semester), their sizes are much more uniform. The statistics network also has the highest closeness score of the three.

Visualization with Gender

Now, we move on to igraph visualizations that color each node by their observed gender presentation. While the author acknowledges the arbitrary designation and history of the colors pink and blue with the binary system, I argue that the two colors help to universally convey gender differences faster than any other two colors. Pink nodes are not all women; they include women and non-binary individuals as well.

Chemistry 9/15

# Create a dataframe with the genders
chem1_gender_vec <- data.frame(label = c("marked",   "marked",   "marked",   "unmarked", "marked",   "marked",   "marked",   "marked" ,  "unmarked", "unmarked", "unmarked", "marked",   "marked",   "marked" ,  "marked" ,  "marked"  , "marked" ,  "marked"  ,
 "marked" ,  "marked" ,  "marked" ,  "marked"  , "marked" ,  "marked"  , "marked"  , "unmarked", "marked" , 
 "marked"  , "unmarked" ,"unmarked", "marked",   "marked" ,  "marked",  "marked",   "unmarked", "unmarked",
"unmarked", "unmarked", "unmarked", "unmarked", "marked"  , "marked" ,  "marked" ,  "marked",   "marked"  ,
"marked" ,  "marked",   "marked" , "marked" ,  "marked"  , "marked",   "marked" ,  "marked" ,  "unmarked"))

# Transform the gender dataframe into a dataframe with colors instead
chem1_gender_vec <- chem1_gender_vec %>%
  mutate(label = case_when(label == "marked" ~ "pink3",
                           label == "unmarked" ~ "aquamarine2"))

colrs <- chem1_gender_vec$label
V(chem1_net)$color <- colrs
V(chem1_net)$label <- NA
E(chem1_net)$arrow.size <- .2
E(chem1_net)$color <- "gray80"  # igraph reads the edge attribute named "color" when plotting
plot(chem1_net, vertex.size = 7)
legend(x=-1.5, y=-1.1, c("Marked","Unmarked"), pch=21,
col="#777777", pt.bg=c("pink3", "aquamarine2"), pt.cex=2, cex=.8, bty="n", ncol=1)

Now with the attribute of gender colored, we observe the presence of homophily in the September 15th network. Not only are six groups (out of 13) homogeneous in terms of gender, but the groups that do have mixed gender participants tend to have just one outlier. Only two of the 13 groups have an even number of each gender present. This suggests the presence of homophily.
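
The group counts in this paragraph can also be tallied rather than read off the plot. The following is a minimal base R sketch; the sizes and gender vectors are a compact re-encoding of the September 15th edge and vertex data above (students are numbered consecutively within groups), and the variable names are illustrative.

```r
# Group sizes in ID order, and gender as alternating runs starting with "marked"
sizes  <- c(4, 5, 3, 5, 4, 5, 4, 4, 4, 4, 4, 4, 4)
group  <- rep(seq_along(sizes), times = sizes)
gender <- rep(rep(c("marked", "unmarked"), 6),
              times = c(3, 1, 4, 3, 14, 1, 2, 2, 4, 6, 13, 1))

# Rows are groups, columns are the two gender categories
composition <- table(group, gender)

# Size of the smaller gender category within each group
minority <- apply(composition, 1, min)

n_homogeneous <- sum(minority == 0)  # groups made up of one gender only
n_one_outlier <- sum(minority == 1)  # mixed groups with a single outlier
n_even <- sum(composition[, "marked"] == composition[, "unmarked"])  # evenly split groups

c(homogeneous = n_homogeneous, one_outlier = n_one_outlier, even = n_even)
```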

Chemistry 9/20

# Generate colors for gender:
chem2_gender_vec <- data.frame(label = c("marked" ,  "marked",   "unmarked", "marked",   "marked" ,  "marked" ,  "marked",   "marked",   "marked" , 
"marked" ,  "marked" ,  "marked",   "marked" ,  "marked",   "marked" ,  "marked"   ,"marked",   "marked" , 
"marked" ,  "marked" ,  "unmarked", "marked" ,  "marked",  "marked" ,  "marked" ,  "marked",   "marked" , 
"unmarked", "marked",   "unmarked", "marked",   "marked",   "unmarked", "unmarked", "unmarked", "unmarked",
"unmarked", "marked" ,  "marked"  , "marked",   "marked"  , "marked" ,  "marked" ,  "marked" ,  "marked",  
"marked" ,  "marked"  , "marked" ))

chem2_gender_vec <- chem2_gender_vec %>%
  mutate(label = case_when(label == "marked" ~ "pink3",
                           label == "unmarked" ~ "aquamarine2"))

colrs <- chem2_gender_vec$label
V(chem2_net)$color <- colrs
V(chem2_net)$label <- NA
E(chem2_net)$arrow.size <- .2
E(chem2_net)$color <- "gray80"  # igraph reads the edge attribute named "color" when plotting
plot(chem2_net, vertex.size = 7)
legend(x=-1.5, y=-1.1, c("Marked","Unmarked"), pch=21,
col="#777777", pt.bg=c("pink3", "aquamarine2"), pt.cex=2, cex=.8, bty="n", ncol=1)

As we observed in the uncolored network section, the increased fragmentation of groups in this network created an opportunity for increased homophily. This is observed as the few unmarked gender participants present during the lecture tended to work only among themselves (with the exception of four individuals), as did the majority of marked gender individuals. Compared to the previous network, there is even less integration in this network.

Statistics

# Generate colors for gender:
stats_gender_vec <- data.frame(label = c("unmarked", "unmarked", "unmarked", "marked",  "marked"  , "unmarked", "unmarked", "unmarked", "unmarked",
"marked" ,  "unmarked" ,"unmarked", "unmarked", "marked" ,  "marked" ,  "unmarked", "marked" ,  "unmarked",
"marked" ,  "unmarked", "marked" ,  "marked" ,  "marked",   "marked"  , "marked" ,  "marked"  , "unmarked",
"marked",   "marked"   ,"marked",   "marked" ,  "unmarked" ,"marked" ,  "unmarked", "unmarked", "marked",  
"marked" ,  "unmarked", "marked" ,  "marked" ,  "marked"  , "marked" ,  "unmarked", "unmarked", "unmarked"))

stats_gender_vec <- stats_gender_vec %>%
  mutate(label = case_when(label == "marked" ~ "pink3",
                           label == "unmarked" ~ "aquamarine2"))

colrs <- stats_gender_vec$label
V(stats_net)$color <- colrs
V(stats_net)$label <- NA
E(stats_net)$arrow.size <- .2
E(stats_net)$color <- "gray80"  # igraph reads the edge attribute named "color" when plotting
plot(stats_net, vertex.size = 7)
legend(x=-1.5, y=-1.1, c("Marked","Unmarked"), pch=21,
col="#777777", pt.bg=c("pink3", "aquamarine2"), pt.cex=2, cex=.8, bty="n", ncol=1)

The statistics network is much less conclusive in terms of homophily than the previous two. Only two of the eleven groups are completely homogeneous by gender. The other groups have a relatively equal balance of unmarked and marked gender participants, though six have just one outlier member. There is certainly less homophily in this network than the previous two, though the question is whether this homophily is significant compared to homophily generated in random networks.
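
One simple way to get a feel for "compared to random" before the modeling section is a label permutation test: keep the groups fixed, shuffle the gender labels many times, and see how often a shuffled network has at least as many same-gender ties as the observed one. To be clear, this is a swapped-in illustration and not the method this post relies on (that is the ERGM below). The gender string is a compact re-encoding of stats_vertex_df ("m" for marked, "u" for unmarked), and the seed and number of shuffles are arbitrary choices.

```r
set.seed(141)  # arbitrary seed, for reproducibility only

# Statistics project groups in ID order: one group of 5, then ten groups of 4
sizes  <- c(5, rep(4, 10))
group  <- rep(seq_along(sizes), times = sizes)
gender <- strsplit(paste0("uuumm", "uuuum", "uuumm", "umumu", "mmmmm",
                          "mummm", "mumuu", "mmumm", "mmuuu"), "")[[1]]

# All within-group pairs of students, i.e. the ties of the network
pairs <- do.call(rbind, lapply(split(seq_along(group), group),
                               function(ids) t(combn(ids, 2))))

# Number of ties that connect two students with the same gender label
same_ties <- function(g) sum(g[pairs[, 1]] == g[pairs[, 2]])

observed  <- same_ties(gender)
null_dist <- replicate(5000, same_ties(sample(gender)))  # shuffle labels, keep groups
p_value   <- mean(null_dist >= observed)                 # one-sided permutation p-value

c(observed = observed, mean_under_shuffling = mean(null_dist), p_value = p_value)
```

The resulting p-value is only a rough companion to the ERGM estimate reported later; the two approaches make different assumptions and need not agree exactly.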

Interactive Networks with Gender

Finally, I graph the same networks as shown above but in an interactive form! While they show the same findings, I find interactive networks much more engaging and fun to look at. All participants’ names have been excluded, and are instead numbered with an ID from 0 upward. For each network, feel free to drag around or zoom within the window to change the view, hover over nodes to see their ID, and drag them around to recalibrate the network.

Chemistry 9/15

library(htmltools)
library(networkD3)

chem1_edges <- as.data.frame(get.edgelist(chem1_net))

chem1_nodes_d3 <- mutate(chem1_vertex_df, student_id = (student_id) - 1)
chem1_edges_d3 <- mutate(chem1_edges, from = parse_number(V1) - 1, to = parse_number(V2) - 1)

forceNetwork(Links = chem1_edges_d3, Nodes = chem1_nodes_d3, Source = "from", Target = "to", 
             NodeID = "student_id", Group = "gender", 
             colourScale = JS('d3.scaleOrdinal().domain(["marked", "unmarked"]).range(["#e75480", "#289D8C"])'),
             opacity = 1, fontSize = 16, zoom = TRUE)

Chemistry 9/20

chem2_edges <- as.data.frame(get.edgelist(chem2_net))

chem2_nodes_d3 <- mutate(chem2_vertex_df, student_id = (student_id) - 1)
chem2_edges_d3 <- mutate(chem2_edges, from = parse_number(V1) - 1, to = parse_number(V2) - 1)

forceNetwork(Links = chem2_edges_d3, Nodes = chem2_nodes_d3, Source = "from", Target = "to", 
             NodeID = "student_id", Group = "gender", 
             colourScale = JS('d3.scaleOrdinal().domain(["marked", "unmarked"]).range(["#e75480", "#289D8C"])'),
             opacity = 1, fontSize = 16, zoom = TRUE)

Statistics

stats_edges <- as.data.frame(get.edgelist(stats_net))

stats_nodes_d3 <- mutate(stats_vertex_df, student_id = (student_id) - 1)
stats_edges_d3 <- mutate(stats_edges, from = parse_number(V1) - 1, to = parse_number(V2) - 1)

forceNetwork(Links = stats_edges_d3, Nodes = stats_nodes_d3, Source = "from", Target = "to", 
             NodeID = "student_id", Group = "gender", 
             colourScale = JS('d3.scaleOrdinal().domain(["marked", "unmarked"]).range(["#e75480", "#289D8C"])'),
             opacity = 1, fontSize = 16, zoom = TRUE)

Assortativity Coefficients

Before moving on to the modeling section, I use the assortativity coefficient from the igraph package to measure homophily, used to “quantify the extent to which connected nodes share similar properties”. Consider this measure as similar to a Pearson correlation: it ranges from -1 to 1, with positive values indicating that ties tend to connect nodes of the same gender, values near 0 indicating no such tendency, and negative values indicating that ties tend to connect nodes of different genders.

assortativity_calc <- data.frame(chem1 = assortativity_nominal(chem1_net, 
                                                               as.factor(V(chem1_net)$gender), 
                                                               directed = FALSE),
                                 chem2 = assortativity_nominal(chem2_net, 
                                                               as.factor(V(chem2_net)$gender), 
                                                               directed = FALSE),
                                 stats = assortativity_nominal(stats_net, 
                                                               as.factor(V(stats_net)$gender), 
                                                               directed = FALSE))
kable(assortativity_calc, format = "simple", col.names = c("Chemistry 9/15", "Chemistry 9/20",
                                                         "Statistics"),
      caption = "Table 2: Assortativity Coefficients for the Three Classroom Networks")
Table 2: Assortativity Coefficients for the Three Classroom Networks

Chemistry 9/15   Chemistry 9/20   Statistics
--------------   --------------   ----------
0.2467532        0.4318841        0.0827191

In line with the results shown in our above networks, both chemistry networks have observable amounts of homophily, with more in the September 20th network than the September 15th network. However, the statistics network has much less, and arguably insignificant amounts of homophily.
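
To make the coefficient less of a black box, it can be rebuilt from three tie counts per network: ties between two marked students, ties between two unmarked students, and mixed ties. The sketch below implements the standard nominal assortativity formula, r = (sum of e_ii - sum of a_i^2) / (1 - sum of a_i^2), for a two-category undirected network in base R. The function name is hypothetical, and the tie counts were tallied by hand from the edge and vertex data in this post, so treat them as an assumption to double-check.

```r
# Nominal assortativity for an undirected network with two categories.
# n_mm, n_uu: ties within each category; n_mu: ties between the categories.
assortativity_from_counts <- function(n_mm, n_uu, n_mu) {
  tie_ends <- 2 * (n_mm + n_uu + n_mu)         # every tie has two ends
  e_same   <- 2 * (n_mm + n_uu) / tie_ends     # share of tie ends in same-category ties
  a <- c(2 * n_mm + n_mu, 2 * n_uu + n_mu) / tie_ends  # share of tie ends per category
  (e_same - sum(a^2)) / (1 - sum(a^2))
}

# Hand-tallied tie counts: marked-marked, unmarked-unmarked, mixed
r_chem1 <- assortativity_from_counts(54, 9, 24)
r_chem2 <- assortativity_from_counts(31, 4, 7)
r_stats <- assortativity_from_counts(21, 17, 32)

round(c(chem1 = r_chem1, chem2 = r_chem2, stats = r_stats), 7)
```

The subtraction of the a_i^2 terms is what corrects for the unequal gender balance: in a class that is mostly marked, many marked-marked ties are expected by chance alone, and only the excess over that baseline counts toward the coefficient.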

ERGM Modeling

To test the significance of the results shown in the above network visualizations and assortativity coefficient calculation for gender, I use the ergm (Exponential Random Graph Model) package. ERGMs are more useful here than other regression packages, as they have the capacity to simulate similar networks that are used to perform “regression-like” analyses. They also allow us to more intuitively write about interactions on the dyadic level, and make predictions about what ties a node with specific attributes is more likely to form. This is of extreme relevance to this exploration of homophily from both the dyadic and the global network perspective. Additionally, these randomly generated networks help us control for the dependence that is innate to human social networks (which violates a basic assumption of regression), as well as consider individuals within their social context. To read more about ERGMs and the theory behind their generation, this webpage and this blog are excellent resources, and guided me through my analysis.

Each of the modeling sections to follow uses nodematch and nodefactor. The two work in tandem, with nodefactor controlling for overrepresentation of possible ties between nodes that share an attribute by considering the overall distribution of attributes in a network. Considering the unequal balance of gender in the chemistry networks, this is crucial for our modeling procedure. Its value or significance level is not relevant to the measure of homophily; nodefactor’s role is solely to control for overrepresentation. The important measure for us is nodematch, which helps us quantify the tendency for students to form groups (AKA ties) with other students who share their gender presentation. In this way the ergm model is like a logistic regression, as it predicts the probability that a pair of nodes in a network will form a tie between them. I form three separate models here, one for each network. Because the networks are different sizes and do not interact with one another, they cannot be combined into a single ergm model to look at effects across classes. Thus, we analyze each network and its homophily separately with its randomly generated networks, then compare their significance and coefficient values with one another.
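
The logistic regression comparison can be checked directly for this particular specification. With only edges, nodematch, and nodefactor terms, no term depends on any other tie in the network, so an ordinary logistic regression over all pairs of students should recover approximately the same coefficients. The sketch below does this in base R for the September 15th network. The sizes and gender vectors are a compact re-encoding of chem1_edge_df and chem1_vertex_df, the column names are made up for illustration, and this is not the code used to produce the models in this post.

```r
# Chemistry 9/15: group sizes in ID order, gender as alternating runs starting with "marked"
sizes  <- c(4, 5, 3, 5, 4, 5, 4, 4, 4, 4, 4, 4, 4)
group  <- rep(seq_along(sizes), times = sizes)
gender <- rep(rep(c("marked", "unmarked"), 6),
              times = c(3, 1, 4, 3, 14, 1, 2, 2, 4, 6, 13, 1))

# One row per possible pair of students (54 choose 2 dyads)
dyads <- as.data.frame(t(combn(54, 2)))
names(dyads) <- c("i", "j")

# Outcome: did the two students work in the same group?
dyads$tie <- as.integer(group[dyads$i] == group[dyads$j])

# Analogue of nodematch: both students share a gender category
dyads$same_gender <- as.integer(gender[dyads$i] == gender[dyads$j])

# Analogue of nodefactor: number of unmarked students in the pair (0, 1, or 2)
dyads$n_unmarked <- (gender[dyads$i] == "unmarked") + (gender[dyads$j] == "unmarked")

fit <- glm(tie ~ same_gender + n_unmarked, family = binomial, data = dyads)
round(coef(fit), 2)
```

The intercept plays the role of the edges term. This shortcut works only because none of the three terms depends on other ties; models with dependence terms need the simulation-based estimation that ergm provides.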

library(statnet)
library(ergm)
chem1_net <- as.network(chem1_edge_df,
  directed = FALSE, vertices = chem1_vertex_df, multiple = TRUE
)

chem2_net <- as.network(chem2_edge_df,
  directed = FALSE, vertices = chem2_vertex_df, multiple = TRUE
)

stats_net <- as.network(stats_edge_df,
  directed = FALSE, vertices = stats_vertex_df, multiple = TRUE
)

Results

chem1_homophily <- ergm(chem1_net~edges+nodematch('gender') + nodefactor('gender'))
chem2_homophily <- ergm(chem2_net~edges+nodematch('gender') + nodefactor('gender'))
stats_homophily <- ergm(stats_net~edges+nodematch('gender') + nodefactor('gender'))
export_summs(chem1_homophily, chem2_homophily, stats_homophily, scale = FALSE, 
             model.names = c("Model 1: Chemistry 9/15 Network",
                             "Model 2: Chemistry 9/20 Network",
                             "Model 3: Statistics Group Project Network"),
             coefs = c("Edges" = "edges",
                       "Nodematch: Gender (Homophily Measure)" = "nodematch.gender",
                       "Nodefactor: Control for Gender Distribution" = "nodefactor.gender.unmarked"))

Table 3: ERGM Regression on the Three Classroom Group Work Networks

                                              Model 1: Chemistry 9/15 Network   Model 2: Chemistry 9/20 Network   Model 3: Statistics Group Project Network
-------------------------------------------   -------------------------------   -------------------------------   -----------------------------------------
Edges                                         -3.30 ***                         -4.42 ***                         -2.72 ***
                                              (0.28)                            (0.47)                            (0.25)
Nodematch: Gender (Homophily Measure)         0.70 *                            1.29 **                           0.23
                                              (0.28)                            (0.47)                            (0.25)
Nodefactor: Control for Gender Distribution   0.19                              0.53                              0.03
                                              (0.19)                            (0.28)                            (0.17)
nobs                                          1431.00                           1128.00                           990.00
independence                                  1.00                              1.00                              1.00
iterations                                    6.00                              6.00                              6.00
logLik                                        -324.72                           -175.63                           -252.48
AIC                                           655.44                            357.26                            510.97
BIC                                           671.24                            372.34                            525.66

*** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.

As the nodematch term for gender shows in Model 1, homophily by gender is significant in the Chemistry 9/15 in-class groups. Essentially, this means that when two nodes are of the same gender, the log-odds of a tie forming between them increase by 0.70, with a standard error of 0.28 (significant at the p < 0.05 level). Therefore, we see individuals of the same gender preferring to work with one another over people of the other gender presentation, a finding that is significant when comparing across the network and other random networks.

As the nodematch term for gender in Model 2 shows, homophily by gender is significant in the Chemistry 9/20 in-class groups. Here, the log-odds of a tie forming increase by 1.29 when the two nodes share a gender, with a standard error of 0.47 (significant at the p < 0.01 level). This value is larger than in the previous chemistry network, supporting our observations in the past couple of sections that the second round of in-person class collaborations had a larger presence of homophily than before.

In contrast to the above two chemistry networks, the homophily measure is not significant for the statistics groups, as shown in Model 3. While greater than zero, the coefficient (0.23) is smaller than its standard error (0.25), so it is not significantly different from zero. Therefore, the slight homophily we observed in the statistics group project network in the above visualizations is not great enough to rule out the possibility of being caused by random chance. While it is unclear if this is because of the groups formed by the professor for students who did not form their own groups, it is important to note that the Math 141 class also had less disproportionate participation. One class had slightly more marked gender participation (though barely significant), while the other had more unmarked gender participation (significant). This suggests that something about the Intro Statistics course may make it more accessible to people with marked gender identities and presentations than the Intro Chemistry course. More discussion on this to follow.
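
Log-odds are hard to read at a glance, so it can help to convert the Table 3 estimates into odds ratios and implied tie probabilities. The sketch below does this in base R using the rounded coefficients from the table. The column names are illustrative, and the conversion assumes the usual reading of these terms: a same-gender pair adds the nodematch coefficient, and each unmarked student in a pair adds the nodefactor coefficient once.

```r
# Rounded coefficients copied from Table 3
coefs <- data.frame(
  model = c("Chemistry 9/15", "Chemistry 9/20", "Statistics"),
  edges = c(-3.30, -4.42, -2.72),
  nodematch = c(0.70, 1.29, 0.23),
  nodefactor_unmarked = c(0.19, 0.53, 0.03)
)

# Odds ratio for a same-gender pair relative to a mixed pair, all else equal
coefs$odds_ratio_match <- exp(coefs$nodematch)

# Implied tie probabilities for each kind of pair (plogis is the inverse logit)
coefs$p_marked_marked <- plogis(coefs$edges + coefs$nodematch)
coefs$p_mixed <- plogis(coefs$edges + coefs$nodefactor_unmarked)
coefs$p_unmarked_unmarked <- plogis(coefs$edges + coefs$nodematch +
                                      2 * coefs$nodefactor_unmarked)

print(coefs[, c("model", "odds_ratio_match", "p_marked_marked",
                "p_mixed", "p_unmarked_unmarked")], digits = 3)
```

Read this way, sharing a gender roughly doubles the odds of a tie on September 15th and more than triples them on September 20th, while the statistics estimate corresponds to a much smaller increase that is not distinguishable from zero.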

Conclusion

In short, this report has found evidence of gender homophily in impromptu group work in an Intro Chemistry course. The results for the long-term, semester-long project groups are less conclusive: they are not significant, though it is unclear whether this is due to increased accessibility in statistics courses or to the professor selecting group assignments for some people. Without access to information on whom the professor personally chose to put together and who chose to work with one another, I am unable to make conclusive remarks about the lack of significant homophily. It is important to point out, however, that the participation results for the introductory statistics course also did not consistently favor unmarked individuals. This suggests that the Intro Statistics classes at Reed may be more accessible to those of marked gender, in terms of class participation and collaborating with those of differing gender marked status. However, with the present data we are not able to investigate this further. The results across both classes are reaffirmed by the network visualizations, which show whether and how homophily takes form in the group work networks. Additionally, just a few days later in the semester, fewer individuals coded as unmarked for their gender came to chemistry lecture for the second designated work day. This, combined with the smaller groups formed, led to higher levels of homophily on September 20th than on the 15th.

Future research should delve further into what led to the difference between September 15th and 20th: do people of unmarked genders reject “work days” more, believing themselves successful enough without them? Does homophily systematically increase or decrease over time, as people become more comfortable with one another? Furthermore, future research should investigate the impacts of homophily, testing whether its presence leads to lower scores overall in groups with more marked gender individuals, or to other resource deficits. Additionally, seeking to answer what other factors cause homophily (socioeconomic status, race, ethnicity, etc.) would also be a fruitful line of research.