# Libraries
library(readr)
library(tidyverse)
library(lubridate)
library(lemon)
library(knitr)
library(cowplot)
library(broom)
library(rsample)
library(yardstick)
library(jtools)
library(officer)
library(flextable)

# ggplot theme function
robins_ggplot_theme <- function(font = "Times") {
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5, family = font, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, face = "italic", family = font),
        axis.title = element_text(family = font),
        axis.text = element_text(family = font),
        legend.title = element_text(family = font),
        legend.text = element_text(family = font),
        plot.caption = element_text(family = font, face = "italic", hjust = 0))
}

robins_facet_theme <- function(font = "Times") {
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5, family = font, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, face = "italic", family = font),
        axis.title = element_text(family = font),
        axis.text = element_text(family = font),
        legend.title = element_text(family = font),
        legend.text = element_text(family = font),
        plot.caption = element_text(family = font, face = "italic", hjust = 0))
}

Abstract

This post reports preliminary results from data collection and analysis regarding the accessibility of introductory STEM courses at Reed College. Utilizing data measuring a participating student’s perceived marked or unmarked gender, the author mobilizes previous sociological research on gender minorities’ participation in STEM (considering major selection, self-evaluation, and feelings of belonging) to begin understanding the role of gender in participation for intro chemistry and statistics students. A chi-square test indicates that, in the chemistry section, those with marked gender identities (non-men, whose gender is read as “abnormal” or “irregular”) participate less than expected given their proportion of the class, while the tests for the two statistics sections do not reach significance; participation also differs by class, as shown by a significant ANOVA (Analysis of Variance) test. Lastly, I run both a logistic and an OLS multiple linear regression model to determine the effects of time in the semester, course section, and gender on classroom participation. However, due to increased variance in the full data set, high error in the models, and violated model assumptions, those results are inconclusive. When the data are filtered to only the chemistry observations (\(n = 30\) observations, compared with \(n = 16\) for each statistics section, and a standard deviation of 0.20 compared to 0.55 and 0.56), the model assumptions are satisfied and we find that students with unmarked gender presentations participate more per capita.

Research Questions

Informed by previous sociological literature about gender differences in the STEM field, I hope to interrogate the salience of gender in introductory STEM courses, understanding what drives students to be more or less likely to participate in class by either answering or asking questions. Because gender is relevant at the individual, interactional, and cultural levels, the classroom, especially in male-dominated fields, can cause students with marked gender identities to feel uncomfortable and alienated from group activities and knowledge (Legewie & DiPrete 2014b; Riegle-Crumb et al. 2006; Carrell et al. 2010). This leads me to believe that students of marked gender identities will be less likely to participate across all three classroom environments.

Specifically, I ask:

  • In the introductory STEM courses I observed at Reed, were individuals with a marked gender presentation significantly less likely to participate than their unmarked counterparts? Put differently, who feels comfortable taking up space?
  • Does participation in STEM classes at the college differ between courses? If so, by how much, and in which ones? What does the answer reveal about the potential mechanisms behind why gender minorities feel more comfortable participating in one classroom than another?
  • Did these relationships change over time, suggesting that gender minorities grow more comfortable participating as they become more familiar with a space, a comfort that those with unmarked genders start out with? If so, what implications does this have for the impact of gender in the classroom?

Data and Methods

Data for this project come from participant observations collected by the author in three introductory STEM course sections: Intro to Chemistry (one section, a 50 minute lecture component meeting three days a week) and Intro to Probability and Statistics (two sections, an 80 minute lab component meeting one day a week). The author gained access to these three classes through their role as a course assistant in all of them, tallying data for this project while also responding to questions, helping facilitate small group discussions, and occasionally assisting in lecturing material. Because of this juggling of tasks during each class, the author acknowledges that some participations or questions asked are likely missing from the data. However, this number is small and the omissions are effectively random, so they are unlikely to affect the results. Future studies should involve measures of inter-coder reliability, with more people collecting data to minimize bias.

Furthermore, there are some important upsides to this method of data collection and to having a course assistant/TA record students’ participation and perceived gender. First, as the course assistant, I hold office hours and have tutored a number of individuals in all of the courses, meeting nearly all students at least once. This means I have had the opportunity to learn people’s names and pronouns, increasing the accuracy of the data, especially the recording of someone’s perceived gender as marked or unmarked. Additionally, as a queer and trans non-binary individual myself, this study of classroom participation by gender is more inclusive of non-normative identities. While certainly not perfect at identifying an individual’s gender from sight alone, I know what specific markers and cues non-binary individuals use to convey a marked gender identity (pride pins, hair, style of dress, etc.). Consider further that I have been a marked gender participant in multiple STEM courses at Reed as well, and am thus knowledgeable firsthand about the discomfort and difficulty of participating. With this extra layer of knowledge of and identification with the case, I argue that this case is unique and has the potential to be mobilized through a frame of understanding by an individual directly impacted by this issue: a gender minority currently fighting for a place and recognition in STEM.

I now turn to the specific variables included in this dataset. A participation was recorded whenever an individual either asked or answered a question in class; there are 557 participations in total (the split by gender and class is elaborated on in the Exploratory Data Analysis section). These observations were tallied for every class meeting and assigned to one of two columns, unmarked or marked gender. The .csv file is organized around this measurement: each row represents the total number of participations for either the marked or unmarked gender category during one individual day of class. The other variables recorded are the Date the course took place, the Class observed (chemistry, section one of statistics, or section two of statistics), and Num_Participation_Weighed, the participations per capita for the marked or unmarked gender category. This value was calculated by dividing the raw count of participations for each gender category by the total number of students of that gender category in the course (see Table 1 below for a breakdown of the demographics per class). These per capita values are subject to some variation, since attendance, and thus the size of each gender category, may fluctuate from class to class. However, the ratio of marked to unmarked gender students has stayed relatively consistent in each course over time, so this should not confound the results.

demographics <- data.frame(Class = c("Chemistry 101 ", "Math 141 Section I", "Math 141 Section II"),
                           Class_Size = c(54, 11, 15),
                           Num_Marked = c(40, 6, 6),
                           Num_Unmarked = c(14, 5, 9))
kable(demographics, format = "simple", col.names = c("Course", "Class Size", "Students with a Marked Gender", "Students with an Unmarked Gender"), caption = "Table 1: Demographics and Size of Each Course")
Table 1: Demographics and Size of Each Course

Course                 Class Size   Students with a Marked Gender   Students with an Unmarked Gender
--------------------  -----------  ------------------------------  ---------------------------------
Chemistry 101                  54                              40                                 14
Math 141 Section I             11                               6                                  5
Math 141 Section II            15                               6                                  9

Finally, the methodology I use to investigate the relationships and patterns in the data proceeds in three stages: (1) exploratory data analysis and summaries of the data; (2) univariate and bivariate statistical tests, specifically a \(\chi^2\) test of significance (comparing the expected number of participations from students with marked gender identities to the actual observed values) and an ANOVA (analysis of variance) test of marked gender participation across the three classes; and (3) a multivariate approach, utilizing both logistic and OLS multiple linear regression to understand relationships between class, gender, date in the semester, and participation (per capita).

obs <- read_csv("Downloads/Gender & Work/Classroom Observation Data.csv")

obs <- obs %>%
  mutate(Date = mdy(Date)) %>%
  pivot_longer(cols = c(Num_Marked_Participations, Num_Unmarked_Participation),
               names_to = "Gender",
               values_to = "Num_Participation") %>%
  mutate(Gender = case_when(Gender == "Num_Marked_Participations" ~ "Marked",
                            Gender == "Num_Unmarked_Participation" ~ "Unmarked"),
         Num_Participation_Weighed = NA) %>%
  arrange(Class)

gender_weight <- c(rep(c(40, 14), length.out = 30), rep(c(6, 5), length.out = 16), 
                       rep(c(6, 9), length.out = 16))

for(i in 1:length(obs$Num_Participation)) {
  obs$Num_Participation_Weighed[i] <- obs$Num_Participation[i] / gender_weight[i]
}
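
As an aside, the gender_weight vector above depends on the exact row ordering produced by arrange(Class) and pivot_longer(). A more order-independent way to produce the same per capita column is to join the group sizes from Table 1 onto the data and divide. A minimal sketch (the class labels and group sizes are taken from the code and Table 1 above; class_sizes and obs_alt are illustrative names, not part of the original analysis):

# Sketch: recompute per capita participation via a join instead of positional weights
class_sizes <- tibble::tribble(
  ~Class,   ~Gender,     ~Group_Size,
  "chem",   "Marked",    40,
  "chem",   "Unmarked",  14,
  "stats1", "Marked",     6,
  "stats1", "Unmarked",   5,
  "stats2", "Marked",     6,
  "stats2", "Unmarked",   9
)

obs_alt <- obs %>%
  left_join(class_sizes, by = c("Class", "Gender")) %>%
  mutate(Num_Participation_Weighed = Num_Participation / Group_Size)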

Exploratory Data Analysis

This first section explores the relationship between the four main variables: Class, Date, Gender, and Num_Participation (both raw counts and per capita measurements).

Participation by Class and Date

obs %>%
  group_by(Class, Date) %>%
  summarize(Num_Participation = sum(Num_Participation)) %>%
  ggplot(mapping = aes(x = Num_Participation)) +
  geom_histogram(bins = 17, color = "white", fill = "aquamarine3") +
  robins_ggplot_theme() +
  labs(x = "Number of Participations",
       y = "Frequency (in raw counts)",
       title = "Number of Participations per Individual Class Session",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

Overall, the distribution of the number of participations per class session has a slight right skew, though it could be argued to be roughly bell shaped and symmetric (more observations from classes later in the term will be crucial to make this clearer). Since approximate normality is a general requirement for many modeling and statistical procedures, it is something to pay close attention to throughout the report. Across all of the intro courses, the number of participations usually falls between 10 and 25 per class period, though a few sessions had fewer than 10 or more than 30 participations.
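
For a rough numeric complement to the histogram, a Shapiro-Wilk test on the per-session totals can be run (a sketch using base R's shapiro.test(); session_totals is an illustrative intermediate object, not part of the original analysis):

# Sketch: quick normality check on total participations per class session
session_totals <- obs %>%
  group_by(Class, Date) %>%
  summarize(Num_Participation = sum(Num_Participation), .groups = "drop")

shapiro.test(session_totals$Num_Participation)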

part_sum_stat <- obs %>%
  group_by(Class, Date) %>%
  mutate(Class = case_when(Class == "chem" ~ "Chem 101",
                           Class == "stats1" ~ "Math 141, Section 1",
                           Class == "stats2" ~ "Math 141, Section 2")) %>%
  summarize(Num_Participation = sum(Num_Participation)) %>%
  summarize(Total_Num = sum(Num_Participation),
            Participation_Mean = mean(Num_Participation),
            Participation_Median = median(Num_Participation),
            Participation_Min = min(Num_Participation),
            Participation_Max = max(Num_Participation))

knitr::kable(part_sum_stat, format = "simple", col.names = 
        c("Class", "Total Number of Participations", "Mean", "Median", "Min", "Max"),
        caption = "Table 2: Summary Statistics For Number of Participations")
Table 2: Summary Statistics For Number of Participations

Class                  Total Number of Participations       Mean   Median   Min   Max
--------------------  -------------------------------  ---------  -------  ----  ----
Chem 101                                          353   23.53333       23    12    37
Math 141, Section 1                                88   11.00000       10     4    19
Math 141, Section 2                               116   14.50000       13     7    31

Table 2 reports the summary statistics for our main dependent variable, the number of participations per class session (and per gender category within each session). The total number of participations in chemistry was 353, with an average of 23.53 per class session. Participations never fell below 12 and reached a maximum of 37 in one session. In the first section of Math 141 there were 88 total participations, with an average of 11 per class period, a minimum of four, and a maximum of 19. In the second section of Math 141 there were 116 participations, with an average of 14.5 per class, a low of 7, and a high of 31. Why do the Math 141 sections have differing levels of participation? It is worth further investigation, but I believe the lower participation in the first section stems from it starting very early (8:50 AM) and having more underclassmen, leading overall to less participation.

obs %>%
  group_by(Class, Date) %>%
  summarize(Num_Participation = sum(Num_Participation)) %>%
  mutate(Class = case_when(Class == "chem" ~ "Chem 101",
                           Class == "stats1" ~ "Math 141, Section 1",
                           Class == "stats2" ~ "Math 141, Section 2")) %>%
  ggplot(mapping = aes(x = Num_Participation, fill = Class)) +
  geom_histogram(bins = 25, color = "white") +
  robins_ggplot_theme() +
  scale_fill_brewer(type = "qual", palette = 4, direction = -1) +
  labs(x = "Number of Participations",
       y = "Frequency (in raw counts)",
       title = "Number of Participations per Individual Class Session",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

The above histogram reinforces the results of Table 2 visually. The Chem 101 course has more participation on average, and the second section of Math 141 slightly outpaces the first section, though neither rivals the participation seen in Chem 101.

obs %>%
  group_by(Class, Date) %>%
  summarize(Num_Participation = sum(Num_Participation)) %>%
  mutate(Class = case_when(Class == "chem" ~ "Chem 101",
                           Class == "stats1" ~ "Math 141, Section 1",
                           Class == "stats2" ~ "Math 141, Section 2")) %>%
  ggplot(mapping = aes(x = Date, y = Num_Participation, color = Class)) +
  geom_point() +
  robins_ggplot_theme() +
  geom_smooth(method = "lm", se = F) +
  scale_color_brewer(type = "qual", palette = 4, direction = -1) +
  labs(x = "Class",
       y = "Total Number of Participations",
       title = "Total Number of Participations Over Time, by Class",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

Lastly, we turn to the role that time in the semester plays in class participation. For the intro probability and statistics sections, the amount of participation has decreased over time, while it has stayed more or less the same for Chem 101. As mentioned in the scholarly review, the chemistry professor is committed to diversity and educational inclusion, implementing the predict-observe-explain teaching style to facilitate work in small groups and reinforce knowledge and participation. This model is likely more engaging and interesting to students, which may explain why participation rates have stayed approximately constant all semester, while participation in the more traditional lab/lecture style statistics course has dropped off.

obs_date_part <- obs %>%
  group_by(Class, Date) %>%
  summarize(Num_Participation = sum(Num_Participation))

class_time <- lm(Num_Participation ~ Class + Date, data = obs_date_part)

class_time_int <- lm(Num_Participation ~ Class * Date, data = obs_date_part)

export_summs(class_time, class_time_int, scale = FALSE, 
             model.names = c("Model 1: Parallel Slopes",
                             "Model 2: Different Slopes (Interaction)"),
             coefs = c("(Intercept)" = "(Intercept)",
                       "Math 141: Section 1" = "Classstats1",
                       "Math 141: Section 2" = "Classstats2",
                       "Date" = "Date",
                       "Math 141: Section 1 * Date" = "Classstats1:Date",
                       "Math 141: Section 2 * Date" = "Classstats2:Date"))

                              Model 1: Parallel Slopes   Model 2: Different Slopes (Interaction)
---------------------------  -------------------------  -----------------------------------------
(Intercept)                   1897.14 (1143.72)          -378.46 (1570.98)
Math 141: Section 1           -12.36 *** (2.74)          4038.73 (2686.39)
Math 141: Section 2           -8.86 ** (2.74)            4869.40 (2686.39)
Date                          -0.10 (0.06)               0.02 (0.08)
Math 141: Section 1 * Date                               -0.21 (0.14)
Math 141: Section 2 * Date                               -0.26 (0.14)
N                             31                         31
R²                            0.50                       0.57

*** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.
Table 3: Number of Participations in Intro STEM Courses by Class Department and Date

To test the relationships suggested by the above graph, I run two OLS multiple linear regression models, with the number of participations per class session as the dependent variable. Model 1, which uses a parallel slopes specification, indicates that both statistics sections have significantly fewer participations than chemistry, though date is not a significant predictor. Given the differing slopes of the lines by class in the graph above, I expected that introducing an interaction term comparing the time trend by course would yield a significant relationship. However, this was not the case, and it warrants further exploration in other research (see the nested-model comparison sketched below for a formal check of the interaction). Trends certainly appear in the graph, but more data are needed for them to become statistically apparent.
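
One way to check the interaction formally is a nested-model F test comparing the parallel slopes and interaction fits (a sketch using base R's anova() on the models defined above; not part of the original analysis):

# Sketch: does adding the Class * Date interaction significantly improve the fit?
anova(class_time, class_time_int)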

Participation Considering Gender

Here, we begin to explore how participation changes when considering the gender (marked vs. unmarked) of students in the class. To control for the different number of students of each gender per class, I introduce a new variable that measures participation per capita for each gender category. It is computed by dividing the number of participations from marked or unmarked gender students by the actual number of marked or unmarked gender students in that class.
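
Concretely, for a given class session and gender category \(g\):

\[
\text{Num\_Participation\_Weighed}_g = \frac{\text{Num\_Participation}_g}{\text{number of students of gender category } g \text{ enrolled in the class}}
\]

For example, 20 participations from marked gender students in Chem 101 (which has 40 marked gender students) would correspond to \(20 / 40 = 0.5\) participations per capita; the value of 20 here is purely illustrative.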

obs %>%
  ggplot(mapping = aes(x = Num_Participation, fill = Gender)) +
  geom_histogram(bins = 25, color = "white") +
  robins_ggplot_theme() +
  scale_fill_brewer(type = "qual", palette = 1, direction = -1) +
  labs(x = "Number of Participations",
       y = "Frequency (in raw counts)",
       title = "Number of Participations per Individual Class Session,
       Considering Marked or Unmarked Gender",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

Here we consider the number of participations by gender. The spread seems fairly even across both categories, though individuals of marked genders appear to participate slightly more on average, making up more of the sessions with greater than 15 participations.

obs %>%
  ggplot(mapping = aes(x = Num_Participation_Weighed, fill = Gender)) +
  geom_histogram(bins = 25, color = "white") +
  robins_ggplot_theme() +
  scale_fill_brewer(type = "qual", palette = 1, direction = -1) +
  labs(x = "Number of Participations",
       y = "Frequency (in raw counts)",
       title = "Number of Participations per Session,
       Considering Marked or Unmarked Gender per Capita",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

However, to test how the above relationship holds up when considering the proportion of each gender in the classroom, I plot the histogram of per capita participations for each gender category. Here the relationship we saw earlier flips: those with unmarked genders tend to participate more per capita than those with a marked gender.

obs %>%
  ggplot(mapping = aes(x = Gender, y = Num_Participation, fill = Gender)) +
  geom_col() +
  coord_flip() +
  robins_ggplot_theme() +
  scale_fill_brewer(type = "qual", palette = 4, direction = -1) +
  labs(x = "Gender",
       y = "Total Number of Participations",
       title = "Total Participations for the Semester, by Gender",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

Consider this graph an alternative representation of the one two plots above; the highest total number of participations across classes goes to those with marked genders, by a sizeable margin.

obs %>%
  ggplot(mapping = aes(x = Gender, y = Num_Participation_Weighed, fill = Gender)) +
  geom_col() +
  coord_flip() +
  robins_ggplot_theme() +
  scale_fill_brewer(type = "qual", palette = 4, direction = -1) +
  labs(x = "Gender",
       y = "Total Number of Participations per Capita of their Gender Category",
       title = "Total Participations for the Semester, by Gender per Capita",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

This plot also shows a flip in the relationship, with slightly more participations for members of the unmarked gender category. However, is this a large enough difference to be statistically significant? We’ll answer this question in the later statistical sections.

obs %>%
  mutate(Class = case_when(Class == "chem" ~ "Chem 101",
                           Class == "stats1" ~ "Math 141, Section 1",
                           Class == "stats2" ~ "Math 141, Section 2")) %>%
  ggplot(mapping = aes(x = Num_Participation, fill = Gender)) +
  geom_histogram(bins = 20, color = "white") +
  robins_facet_theme() +
  facet_wrap(~ Class, ncol = 2) +
  scale_fill_brewer(type = "qual", palette = 1, direction = -1) +
  labs(x = "Number of Participations",
       y = "Frequency (in raw counts)",
       title = "Number of Participations per Individual Class Session,
       Considering Gender and Class",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

When considering the number of participations by gender and the specific STEM class they come from, it is easy to identify chemistry as having more marked gender participations, with the statistics courses lagging behind. But is this meaningful when comparing the 40 individuals with a marked gender in that class to the 14 unmarked individuals? We turn to this in the next plot.

obs %>%
  mutate(Class = case_when(Class == "chem" ~ "Chem 101",
                           Class == "stats1" ~ "Math 141, Section 1",
                           Class == "stats2" ~ "Math 141, Section 2")) %>%
  ggplot(mapping = aes(x = Num_Participation_Weighed, fill = Gender)) +
  geom_histogram(bins = 20, color = "white") +
  robins_facet_theme() +
  facet_wrap(~ Class, ncol = 2) +
  scale_fill_brewer(type = "qual", palette = 1, direction = -1) +
  labs(x = "Number of Participations (Weighed)",
       y = "Frequency (in raw counts)",
       title = "Number of Participations per Individual Class Session,
       Considering Gender and Class per Capita",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

Here we see that per capita participation by gender category tends to be larger for those with an unmarked gender, though this is harder to read after rescaling the x-axis to a per capita measure. The trend is especially visible for Chem 101, which has roughly twice as many observations as either statistics section and generally higher participation rates overall. In each of the Math 141 sections the relationship is much less stark and should be investigated further in the chi-square and regression sections.

obs %>%
  mutate(Class = case_when(Class == "chem" ~ "Chem 101",
                           Class == "stats1" ~ "Math 141, Section 1",
                           Class == "stats2" ~ "Math 141, Section 2")) %>%
  ggplot(mapping = aes(x = Date, y = Num_Participation_Weighed, color = Class)) +
  geom_point() +
  facet_wrap(~Gender) +
  robins_facet_theme() +
  geom_smooth(method = "lm", se = F) +
  scale_color_brewer(type = "qual", palette = 4, direction = -1) +
  labs(x = "Class",
       y = "Total Number of Participations per Capita of the Gender Category",
       title = "Total Participations per Capita Over Time, by Class and Gender",
       subtitle = "Observational Data Collected by Robin Hardwick",
       caption = "
       © Robin Hardwick | Reed College")

Lastly, we show a plot of the differences between the three courses in per capita participation by gender category over time. This relationship will be explored more deliberately through regressions in the modeling section, though a few standout trends are worth noting here. Participation per capita in chemistry stays constant for both gender categories over time, though it is slightly lower overall for those of a marked gender. In the statistics sections, participation per capita has decreased over time for both gender categories, though it has decreased more steeply (more negative slopes) for individuals with unmarked genders. This has important implications for the results in the modeling section.

Statistical Tests

In this section I implement common tests of statistical significance to isolate the impact of gender on participation in the classroom, as well as to compare levels of participation across course sections.

Chi Square Test

First, we run a chi-square goodness-of-fit test, which compares the number of participations we would expect to see if both gender categories participated in proportion to their share of the class against the actual, observed participation counts. It is important to note that our data meet the assumptions of a chi-square test:

  • The data are counts of participations, not values transformed into percentages or another form.
  • Each participation is assigned to only one gender category.
    • Thus, each participation is counted in just one cell of the data.
  • The data are not paired, and are thus independent.
  • The variable being measured is categorical.
  • As the tables below show, the expected and observed counts in each cell are greater than five, and certainly greater than one.

Chemistry

# Observed data (num participation for each gender)
count_chemg <- obs %>%
  filter(Class == "chem") %>%
  group_by(Gender) %>%
  summarize(num_part = sum(Num_Participation))
gender_chem <- count_chemg$num_part

p_chem <- c(40/54, 14/54)

# Table of observed vs. expected
tab <- count_chemg %>%
  mutate(expected = sum(num_part) * p_chem)

kable(tab, format = "simple", col.names = c("Gender", "Observed", "Expected"),
      caption = "Table 4. Observed vs. Expected Values for Participation in Chem 101")
Table 4. Observed vs. Expected Values for Participation in Chem 101

Gender      Observed    Expected
---------  ---------  ----------
Marked           228   261.48148
Unmarked         125    91.51852
# Expected 
res <- chisq.test(gender_chem, p = p_chem)

res <- tidy(res) %>%
  select(-method)

kable(res, format = "simple", col.names = c("Chi Square Statistic",
                                            "P Value",
                                            "Degrees of Freedom"),
      caption = c("Table 5. Chi Square Results for Chem 101"))
Table 5. Chi Square Results for Chem 101

Chi Square Statistic     P Value   Degrees of Freedom
---------------------  ---------  -------------------
             16.53614    4.77e-05                    1

As Tables 4 and 5 indicate above, the difference between the observed and expected participation values for each gender category is indeed significant, with a p-value of approximately zero. Because this is below the threshold of \(\alpha = 0.05\) (chosen because it is the conventional standard and to limit the chance of error in our conclusions), we can reject the null hypothesis of no relationship between gender and participation and accept the alternative: gender has an impact on a student’s participation in this intro chemistry classroom (and potentially in Chem 101 courses at other colleges).
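
As a follow-up not included in the original analysis, the stored chisq.test object can also show which cell drives the result and how large the effect is (a sketch; chem_test is an illustrative name, and Cohen's w is one common effect size for goodness-of-fit tests):

# Sketch: standardized residuals and effect size for the Chem 101 test
chem_test <- chisq.test(gender_chem, p = p_chem)
chem_test$stdres                               # which gender category deviates most from expectation
sqrt(chem_test$statistic / sum(gender_chem))   # Cohen's w effect size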

Statistics, Section 1

# Observed data (num participation for each gender)
count_stat1g <- obs %>%
  filter(Class == "stats1") %>%
  group_by(Gender) %>%
  summarize(num_part = sum(Num_Participation))
gender_stats1 <- count_stat1g$num_part

p_stats1 <- c(6/11, 5/11)

# Table of observed vs. expected
tab <- count_stat1g %>%
  mutate(expected = sum(num_part) * p_stats1)

kable(tab, format = "simple", col.names = c("Gender", "Observed", "Expected"),
      caption = "Table 6. Observed vs. Expected Values for Participation in Math 141, Section 1")
Table 6. Observed vs. Expected Values for Participation in Math 141, Section 1

Gender      Observed    Expected
---------  ---------  ----------
Marked            56          48
Unmarked          32          40
# Expected 
res <- chisq.test(gender_stats1, p = p_stats1)

res <- tidy(res) %>%
  select(-method)

kable(res, format = "simple", col.names = c("Chi Square Statistic",
                                            "P Value",
                                            "Degrees of Freedom"),
      caption = c("Table 7. Chi Square Results for Math 141, Section 1"))
Table 7. Chi Square Results for Math 141, Section 1

Chi Square Statistic     P Value   Degrees of Freedom
---------------------  ---------  -------------------
             2.933333   0.0867682                    1

Table 6 shows a different relationship than in the chemistry course: here, students with a marked gender participated slightly more than expected. However, as Table 7 shows, the difference between the observed and expected participation values for each gender category is not significant, with a p-value of 0.09. Because 0.09 > \(\alpha = 0.05\), we fail to reject the null hypothesis of no relationship between gender and participation, concluding that gender is not a significant predictor of the number of participations in the 8:50 AM intro statistics section.

Statistics, Section 2

# Observed data (num participation for each gender)
count_stat2g <- obs %>%
  filter(Class == "stats2") %>%
  group_by(Gender) %>%
  summarize(num_part = sum(Num_Participation))
gender_stats2 <- count_stat2g$num_part

p_stats2 <- c(6/15, 9/15)

# Table of observed vs. expected
tab <- count_stat2g %>%
  mutate(expected = sum(num_part) * p_stats2)

kable(tab, format = "simple", col.names = c("Gender", "Observed", "Expected"),
      caption = "Table 8. Observed vs. Expected Values for Participation in Math 141, Section 2")
Table 8. Observed vs. Expected Values for Participation in Math 141, Section 2

Gender      Observed    Expected
---------  ---------  ----------
Marked            38        46.4
Unmarked          78        69.6
# Expected 
res <- chisq.test(gender_stats2, p = p_stats2)

res <- tidy(res) %>%
  dplyr::select(-method)

kable(res, format = "simple", col.names = c("Chi Square Statistic",
                                            "P Value",
                                            "Degrees of Freedom"),
      caption = "Table 9. Chi Square Results for Math 141, Section 1")
Table 9. Chi Square Results for Math 141, Section 2

Chi Square Statistic     P Value   Degrees of Freedom
---------------------  ---------  -------------------
             2.534483    0.111383                    1

Tables 8 and 9 show the same direction of difference as in the chemistry course: students with a marked gender participated less than expected (38 observed versus roughly 46 expected). However, the difference is not statistically significant, with a p-value of about 0.11. Because 0.11 > \(\alpha = 0.05\), we fail to reject the null hypothesis of no relationship between gender and participation for the second section of intro statistics, though the gap is suggestive and worth revisiting once more observations are collected.

ANOVA

# Mean participation values (weighed and unweighed) by class, for marked gender individuals
# Unweighed

obs_anova <- obs %>%
  filter(Gender == "Marked")

anova_by_classu <- aov(Num_Participation ~ Class, data = obs_anova)
summary(anova_by_classu)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Class        2  693.5   346.7   22.69 1.39e-06 ***
## Residuals   28  427.9    15.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Weighed
anova_by_classw <- aov(Num_Participation_Weighed ~ Class, data = obs_anova)
summary(anova_by_classw)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## Class        2  3.342  1.6709   13.97 6.2e-05 ***
## Residuals   28  3.349  0.1196                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finally, we run an ANOVA test to determine whether the variation in participation by marked gender students across courses is significant, considering both the variation within groups (their standard deviations, for instance) and the variation between groups. Note that if even just one of the courses differs substantially from the others, the ANOVA test will come back significant. This holds for both the raw participation counts and the participation counts per capita, each with a p-value of approximately zero. Thus, we can reject the null hypothesis that the three classes have the same mean amount of participation and conclude that there is variation between these groups; a post-hoc comparison such as the one sketched below can show which pairs of classes differ.
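
Because the ANOVA alone does not say which classes differ from which, a post-hoc pairwise comparison is a natural follow-up (a sketch using base R's TukeyHSD() on the per capita model fit above; not part of the original analysis):

# Sketch: pairwise differences in marked gender per capita participation between classes
TukeyHSD(anova_by_classw)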

Modeling

The last section of this report investigates multivariate relationships using two distinct tools: a logistic regression model to predict the likelihood that a participation in the classroom comes from a member of a marked gender presentation (marked gender = 1, unmarked = 0), and a multiple linear regression to better understand the effect of the variables (Gender, Date, and Class) on class participation per capita.

set.seed(598)
obs_dummy <- obs %>%
  mutate(Gender = case_when(Gender == "Marked" ~ 1,
                            Gender == "Unmarked" ~ 0))
part_initial <- initial_split(obs_dummy, prop = 0.77, strata = Gender)
part_train <- training(part_initial)
part_val <- testing(part_initial)

Diagnostics

Before diving into the results of the models, it is crucial to walk through diagnostics of these models and how well they fit the data at hand, especially given the smaller sample size.

Logistic Model (all classes)

This model is the first of the two logistic options (modeling the log-odds that a given participation comes from a marked gender individual), considering all three classes in the data: chemistry, statistics section one, and statistics section two.

library(MASS)
part_log_mod_train <- glm(Gender ~ Class + Date + Num_Participation_Weighed, 
                          data = part_train, family = "binomial")

probs <- predict(part_log_mod_train, part_val, type = "response")
preds <- as.factor(ifelse(probs >=.5, 1, 0))
obse <- as.factor(part_val$Gender)
results <- data.frame(obse, preds, probs)

conf_mat(results, truth = obse, estimate = preds)
##           Truth
## Prediction 0 1
##          0 2 3
##          1 6 5
a <- accuracy(results, truth = obse, estimate = preds)

test_error <- 1 - a$.estimate
paste("The test error of this model is", test_error)
## [1] "The test error of this model is 0.5625"

As shown in the above confusion matrix and test error, this logistic regression does not do a good job predicting the gender category behind a participation, getting it wrong 56.25% of the time.
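
For context, a useful benchmark is the no-information rate, i.e., the accuracy obtained by always guessing the more common gender category in the validation set (a quick sketch, not part of the original analysis):

# Sketch: no-information baseline accuracy on the validation set
max(table(part_val$Gender)) / nrow(part_val)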

r <- roc_curve(data = results, truth = obse, probs, event_level = "second")
autoplot(r) +
        robins_ggplot_theme() +
        labs(x = "1 - Specificity",
             y = "1 - Sensitivity",
             title = "ROC Curve for the Full Logistic Model")

Furthermore, the ROC curve we obtain from the model (a curve that should ideally bend into a right angle at the top left corner) is also highly concerning. Because the curve sits so far from that ideal, we must proceed with caution with our coefficient values and with any conclusions drawn from this model.

Consider here the last graph in the exploratory section and the large standard deviations in the statistics courses’ participation counts by gender category. This, alongside a small \(n\) (more observations forthcoming), clearly has a detrimental impact on the full logistic model.
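
For reference, those spreads can be quantified directly from the reshaped data (a sketch, not part of the original analysis):

# Sketch: mean and standard deviation of per capita participation by class and gender category
obs %>%
  group_by(Class, Gender) %>%
  summarize(Mean_Per_Capita = mean(Num_Participation_Weighed),
            SD_Per_Capita = sd(Num_Participation_Weighed),
            .groups = "drop")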

MLR Model (all classes)

Here, we take a closer look at participation per capita and the impact of the suite of available variables: Gender, Date, and Class. The OLS multiple linear regression model minimizes the sum of squared residuals to find the hyperplane that best predicts our response variable, Num_Participation_Weighed, while controlling for all of the predictors.

reg_all_classes_mod <- lm(Num_Participation_Weighed ~ Gender + Date + Class,
                          data = part_train)

calculate_mse <- function(mod, val) {
  preds <- predict(mod, val)
  mse <- mean((val$Num_Participation_Weighed - preds)^2)
  mse
}

mse_full_mod <- calculate_mse(reg_all_classes_mod, part_val)
paste("The MSE for this model is:", mse_full_mod)
## [1] "The MSE for this model is: 0.0752004704315927"
layout(matrix(c(1,2,3,4),2,2))
plot(reg_all_classes_mod)

As the diagnostic plots for the full MLR model show above, there are a few things to be concerned about, though the model generally meets its assumptions. The Scale-Location plot shows no clear pattern, the residuals are approximately normally distributed as shown in the Normal Q-Q plot, and no points fall outside Cook’s distance, so no single observation strongly influences our coefficients. Additionally, the mean squared error is 0.075 on the per capita scale, which is quite low. It is slightly worrying that the Residuals vs. Fitted plot follows a somewhat nonlinear, convex trend. Overall, the full MLR model should help us draw some conclusions about the data, though we have reason to remain somewhat skeptical of the results.
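
One way to probe the slight curvature in the Residuals vs. Fitted plot is to allow a quadratic term for Date and test whether it improves the fit (a sketch; reg_quad is an illustrative name and this check is not part of the original analysis):

# Sketch: does allowing curvature in the Date trend improve on the linear specification?
reg_quad <- lm(Num_Participation_Weighed ~ Gender + poly(as.numeric(Date), 2) + Class,
               data = part_train)
anova(reg_all_classes_mod, reg_quad)   # F test of the nested models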

Logistic Regression (only chemistry)

Next, we look at a logistic regression for only the chemistry participant observations. This model (again modeling the log-odds that a participation comes from a marked gender individual) considers just one class in the data: chemistry.

obs_chem <- obs %>%
  filter(Class == "chem")

obs_chem_dum <- obs_chem %>%
  mutate(Gender = case_when(Gender == "Marked" ~ 1,
                            Gender == "Unmarked" ~ 0))

part_initial1 <- initial_split(obs_chem_dum, prop = 0.77, strata = Gender)
part_train1 <- training(part_initial1)
part_val1 <- testing(part_initial1)

chem_log_mod <- glm(Gender ~ Date + Num_Participation_Weighed, 
                    data = part_train1, family = "binomial")

probs <- predict(chem_log_mod, part_val1, type = "response")
preds <- as.factor(ifelse(probs >=.5, 1, 0))
obse <- as.factor(part_val1$Gender)
results <- data.frame(obse, preds, probs)

conf_mat(results, truth = obse, estimate = preds)
##           Truth
## Prediction 0 1
##          0 4 2
##          1 0 2
a <- accuracy(results, truth = obse, estimate = preds)
test_error <- 1 - a$.estimate
paste("The test error of this model is", test_error)
## [1] "The test error of this model is 0.25"

The test error for this model is much better than that of the previous logistic regression model, and the confusion matrix reflects the improvement. I chose to limit the model to just the chemistry participations in order to look at the case with the most data and the lowest standard deviation, isolating the impacts of the variables in this specific setting.

r <- roc_curve(data = results, truth = obse, probs, event_level = "second")
autoplot(r) +
        robins_ggplot_theme() +
        labs(x = "1 - Specificity",
             y = "1 - Sensitivity",
             title = "ROC Curve for the Logistic Model for Participant's Gender")

Our ROC curve is much better this time around as well. Though not a perfect model (no such thing exists), it does a reasonable job, correctly classifying six of the eight validation observations (75%).
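
Given the small sample, a single 77/23 split can be noisy, so leave-one-out cross-validation is a cheap robustness check here (a sketch with base R on the chemistry-only dummy data; loo_preds is an illustrative name and this check is not part of the original analysis):

# Sketch: leave-one-out cross-validation accuracy for the chemistry-only logistic model
n <- nrow(obs_chem_dum)
loo_preds <- numeric(n)
for (i in 1:n) {
  fit_i <- glm(Gender ~ Date + Num_Participation_Weighed,
               data = obs_chem_dum[-i, ], family = "binomial")
  loo_preds[i] <- predict(fit_i, obs_chem_dum[i, ], type = "response")
}
mean((loo_preds >= 0.5) == obs_chem_dum$Gender)   # proportion of correct leave-one-out predictions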

MLR Model (only chemistry)

Lastly, I will run diagnostics on the MLR regression models for only chemistry students.

chem_reg <- lm(Num_Participation_Weighed ~ Gender + Date, data = part_train1)

mse_chem_mod <- calculate_mse(chem_reg, part_val1)
paste("The MSE for this model is:", mse_chem_mod)
## [1] "The MSE for this model is: 0.109101633059686"
layout(matrix(c(1,2,3,4),2,2))
plot(chem_reg)

As the diagnostic plots for the chemistry-only MLR model show above, there are a few issues, though the model generally meets its assumptions. The Scale-Location plot shows no strong pattern in the fitted line (though the points cluster at opposite ends rather than in the middle, as they do in the Residuals vs. Fitted plot), the residuals are approximately normally distributed as shown in the Normal Q-Q plot, and no points fall outside Cook’s distance, so no single observation strongly influences our coefficients. Additionally, the mean squared error remains modest on the per capita scale. The slight nonlinear, convex trend in the Residuals vs. Fitted plot is again a small note of worry. Overall, this MLR model should help us draw some conclusions about the data, though with a degree of caution.

Model Results and Discussion

In this last section, I refit all of the models on the full dataset in order to obtain the most accurate coefficient values possible (in the previous section we fit each model on just 77% of the data and used the remaining 23% as a test set to gauge accuracy), then compile them into a single table, shown below as Table 10.

all_log <- glm(Gender ~ Class + Date + Num_Participation_Weighed, 
               data = obs_dummy, family = "binomial")

chem_log <- glm(Gender ~ Num_Participation_Weighed + Date, 
                    data = obs_chem_dum, family = "binomial")

reg_all_classes_mod <- lm(Num_Participation_Weighed ~ Gender + Date + Class,
                          data = obs)
reg_all_classes_mod_gender <- lm(Num_Participation_Weighed ~ Gender,
                          data = obs)

chem_reg <- lm(Num_Participation_Weighed ~ Gender + Date, data = obs_chem)

export_summs(all_log, reg_all_classes_mod_gender, reg_all_classes_mod, 
             chem_log, chem_reg, scale = FALSE, 
             model.names = c("Model 1: Logistic Regression on all Classes",
                             "Model 2: MLR Regression on all Classes (Gender Only)",
                             "Model 3: MLR Regression on all Classes",
                             "Model 4: Logistic Regression (Chemistry Only)",
                             "Model 5: MLR Regression on (Chemistry Only)"),
             coefs = c("(Intercept)" = "(Intercept)",
                       "Math 141: Section 1" = "Classstats1",
                       "Math 141: Section 2" = "Classstats2",
                       "Date" = "Date",
                       "Number of Participations per Capita" = "Num_Participation_Weighed",
                       "Unmarked Gender, baseline = Marked Gender" = "GenderUnmarked"))

Model 1: Logistic Regression on all Classes; Model 2: MLR Regression on all Classes (Gender Only); Model 3: MLR Regression on all Classes; Model 4: Logistic Regression (Chemistry Only); Model 5: MLR Regression (Chemistry Only).

                                            Model 1            Model 2           Model 3            Model 4            Model 5
------------------------------------------  -----------------  ----------------  -----------------  -----------------  -----------------
(Intercept)                                  85.95 (278.50)     0.69 *** (0.09)   148.04 ** (51.48)  -157.32 (440.47)   -3.11 (31.29)
Math 141: Section 1                          0.30 (0.71)                          0.51 *** (0.12)
Math 141: Section 2                          0.27 (0.70)                          0.46 *** (0.12)
Date                                         -0.00 (0.01)                         -0.01 ** (0.00)    0.01 (0.02)        0.00 (0.00)
Number of Participations per Capita          -0.59 (0.68)                                            -8.92 * (3.54)
Unmarked Gender, baseline = Marked Gender                       0.08 (0.12)       0.08 (0.10)                           0.22 ** (0.06)
N                                            62                 62                62                 30                 30
AIC                                          95.19              88.81             68.46              36.22              -16.57
BIC                                          105.82             95.19             81.22              40.42              -10.97
Pseudo R²                                    0.02                                                    0.42
R²                                                              0.01              0.35                                  0.31

*** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.
Table 10: The above table is a compilation of all the regressions from this section. Logistic regressions mobilize gender as their response variable, the OLS MLR models utilize classroom participation per capita.

I will now talk through the results of the models, from Model 1 to Model 5 (in the same order as the diagnostics above).

  • Model 1: For the full logistic regression model, we predict the probability that a given participation comes from a marked gender individual, considering all three courses. The model indicates, for instance, that larger numbers of participations per capita are associated with a lower probability of the participant having a marked gender presentation (though this decrease is not statistically significant). This is likely a result of the issues with the model identified in the diagnostics section, as well as the lower sample size and higher standard deviation of the statistics courses. The results in the later models are more compelling due to their more reliable diagnostics.
  • Model 2: Looking at just gender and participation per capita, the relationship is not significant. Though the chi-square test for chemistry was significant for gender, I believe the large standard deviations and small sample sizes in the statistics classes (whose higher per capita values carry more weight in the pooled data) may have contributed to this null result.
  • Model 3: The MLR model with all classes included. While both statistics sections have a positive, significant effect on per capita participation (being in Math 141 Section 1 gives a 0.51 boost in per capita participation over Chem 101, while Math 141 Section 2 gives a 0.46 boost), what is surprising is the continued lack of significance for gender. Date is also significant, indicating a 0.01 decrease in per capita participation per day as the semester goes on. Returning to gender, with the other controls introduced, its effect remains nonsignificant.
  • Model 4: To try to isolate the impact of the smaller sample sizes and larger standard deviations of the two statistics sections, I run another logistic regression model using only the 30 cases from the Chem 101 lecture. Here we again see that participations per capita have a negative association with the response, meaning that those of marked genders are less likely to be the ones participating. This time, the negative effect is statistically significant, giving some credence to the theory that the data from the statistics classes may be confounding the pooled results (see the odds-ratio sketch after this list for a more interpretable scale). It is also possible that the result does not hold up when controlling for class; perhaps other factors such as class size, class format, and teaching style are more salient and confound the impact of gender. Further research and data are needed to settle this point.
  • Model 5: Lastly, we look at an MLR model to investigate the relationship between our variables and participation per capita within chemistry. Here unmarked gender has a positive and significant effect on participation, providing support for our hypothesis that the more normative, “unmarked” gender category feels more comfortable sharing and asking questions in a classroom environment. People of an unmarked gender show an increase of 0.22 on the participation per capita scale over those of marked genders, with p < 0.01.
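
For a more interpretable scale on the logistic coefficients above, exponentiating converts log-odds to odds ratios (a sketch applied to the chemistry-only logistic model; not part of the original analysis):

# Sketch: odds ratios and profile confidence intervals for the chemistry-only logistic model
exp(cbind(Odds_Ratio = coef(chem_log), confint(chem_log)))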

In short, the effect of gender is significant in the chemistry section (seen through the chi-square test), but the impact of gender in this data set (intro courses at Reed College) is less salient than that of class (with important confounding factors such as size and instructor). Looking at Table 10, the standard error for unmarked gender is about the same size as the standard errors for the course coefficients; because the coefficient for gender is smaller than those for the courses despite similar standard errors, it does not reach significance. It is difficult here to make conclusions across all of the data, and we need more evidence to further interrogate the research questions and claims.

Conclusions

In short, this study showed that within classroom dynamics, participation “per capita” for each broad gender category (marked or unmarked gender presentations) was greater for unmarked individuals in two out of three classrooms (the exception being the first section of Math 141). These patterns were examined through chi-square tests and multiple graphs, with the difference reaching statistical significance in the chemistry section. I also found that participation differs significantly by subject, with chemistry having much more frequent engagement than the statistics sections. Finally, I discussed how the regressions used in the last section are somewhat limited and difficult to draw conclusions from due to violated assumptions (especially for the first logistic regression model), the small sample size, and large standard errors. They were still able to show that including the statistics courses substantially changes the results, indicating a potential path of future exploration to discover why this is the case. When looking only at the chemistry course with the other variables controlled, the effect of gender on participation was positive and significant for those with an unmarked gender identity.

Some extensions of this study that could begin to answer some of the critical questions this methodology raised would be to (1) convene a focus group of STEM students in intro classes and ask them about their experiences, using these as tools for future data collection (a very Gerber & van Landingham (2021) perspective), and (2) take into account the factors that explain the huge impact of class, such as the intro statistics courses at Reed having smaller lab class sizes and widely varied teaching styles, which affect per capita gender participation and should be investigated thoroughly as well. Consider here that the chemistry professor herself chooses whom to call on after students raise their hands in her lecture, while in the statistics course everyone’s questions are answered by myself or the professor.

Other studies that could extend this framework might examine how men are more likely to answer direct, challenging questions about math in class (performing “masculinity”), or record everyone in a class who raises their hand at any point, whether to answer or to ask a question. This would reduce the instructor’s bias in selecting students of different genders and would instead record who feels comfortable enough to volunteer to participate in an intro STEM class, which would be very telling for the sociology of belonging.