Gender and Attractiveness Bias in Student Evaluations of Professors at University of Texas

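# Packages: ggplot2 for plots, dplyr for data manipulation, mosaic for shuffle()
library(ggplot2)
library(dplyr)
library(mosaic)

# Read the course evaluation data; text columns such as gender become factors
evals <- read.csv("evals.csv", header = TRUE, stringsAsFactors = TRUE)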

Introduction:

Every semester, students at most major universities are required to fill out evaluations of their professors and classes. These evaluations often have major implications for a professor’s career path and credibility as an academic professional, so it is very important that they are fair and consistent across all professors. Unfortunately, people (students included) are often affected by bias, both conscious and subconscious, including assumptions about ability based on gender, race, age, and attractiveness, to name a few. Even without thinking about it explicitly, we make judgments based on these biases every day, and those biases matter when they show up in formal evaluations. As a result, two professors with the same level of education, the same knowledge of their subject, and the same teaching competence can receive different evaluation scores because of their students’ subconscious biases about their gender or attractiveness. We analyze data from the University of Texas to see how much of a difference these factors make in students’ evaluations.

Our formal research questions are threefold:

First, do female professors have a different evaluation score than male professors, on average, at the University of Texas?
Next, do attractive professors have a different evaluation score than non-attractive professors, on average, at the University of Texas?
Finally, is a professor’s perceived attractiveness or gender a better indicator of higher evaluation scores by students at the University of Texas?

Before our evaluation of the data, we hypothesize that:

Female professors will have statistically significantly lower evaluation scores, on average, than male professors.
Professors with above-average attractiveness will have statistically significantly higher evaluation scores, on average, than professors whose attractiveness is rated below average.
The gender of a professor will have a statistically significantly larger effect on evaluation score than their level of attractiveness.
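One way to state these three hypotheses formally is shown below. Here we assume that μ_F and μ_M are the mean evaluation scores of female and male professors, and μ_A and μ_N are the mean scores of professors whose average beauty rating (bty_avg) is above 5 and not above 5, matching the cutoff used in the code. Writing hypothesis 3 as a comparison of the absolute differences in means is our own formalization, not the only possible one.

```latex
\begin{aligned}
&\text{(1)}\quad H_0:\ \mu_F = \mu_M \qquad H_A:\ \mu_F < \mu_M \\
&\text{(2)}\quad H_0:\ \mu_A = \mu_N \qquad H_A:\ \mu_A > \mu_N \\
&\text{(3)}\quad H_0:\ |\mu_M - \mu_F| \le |\mu_A - \mu_N| \qquad H_A:\ |\mu_M - \mu_F| > |\mu_A - \mu_N|
\end{aligned}
```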

Descriptive Statistics:

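# Split professors into two beauty groups by their average beauty rating (bty_avg):
# "above" if bty_avg is greater than 5, otherwise "not above"
evals <- evals %>%
  mutate(bty_avg2 = ifelse(bty_avg > 5, yes = "above", no = "not above"))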

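# Graph #1: counts by gender, with side-by-side bars for each beauty group
ggplot(data = evals, mapping = aes(x = gender, fill = bty_avg2)) +
  geom_bar(position = "dodge")

# Graph #2: evaluation score by gender, split by beauty group
ggplot(data = evals, mapping = aes(x = gender, y = score)) +
  geom_boxplot(aes(fill = bty_avg2))

# Graph #3: age distribution within each beauty group
ggplot(data = evals, mapping = aes(x = bty_avg2, y = age)) +
  geom_boxplot(aes(fill = bty_avg2))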

Inference: Graph #1 shows the number of professors evaluated, broken down by gender and beauty group. There are more male professors than female professors, and within each gender considerably more professors have a below-average beauty score (bty_avg of 5 or less) than an above-average one (bty_avg above 5). There are about 270 male professors, only about 70 of whom scored above average (roughly 26%), whereas there are about 200 female professors, around 60 of whom scored above average (roughly 30%).
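The counts read off Graph #1, and the shares above the cutoff within each gender, can be computed directly rather than estimated from the bars. The sketch below uses base R's table() and prop.table() on a small hypothetical data frame (toy) so that it runs without evals.csv; beauty_counts and beauty_shares are hypothetical helper names, and on the real data you would pass evals after creating bty_avg2.

```r
# Hypothetical toy data standing in for evals; replace `toy` with `evals`
toy <- data.frame(
  gender   = c("male", "male", "male", "male", "female", "female"),
  bty_avg2 = c("above", "not above", "not above", "not above", "above", "not above")
)

# Number of rows in each gender x beauty-group cell (what geom_bar draws)
beauty_counts <- function(df) {
  table(df$gender, df$bty_avg2)
}

# Share of each gender's rows in each beauty group;
# margin = 1 makes each gender's row sum to 1
beauty_shares <- function(df) {
  prop.table(beauty_counts(df), margin = 1)
}

counts <- beauty_counts(toy)
shares <- beauty_shares(toy)
print(counts)
print(round(shares, 2))
```

With dplyr loaded, evals %>% count(gender, bty_avg2) gives the same counts in long format.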

Graph #2 shows evaluation scores by gender and beauty group. Male professors appear to score higher overall, and within both genders, professors with an above-average beauty score tend to score higher than those with a below-average one. These differences are visual only; whether they are statistically significant has to be checked with a formal test.
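To put numbers behind the boxplots, the mean and median score in each gender and beauty-group cell can be tabulated; the medians correspond to the middle lines of the boxes. This is a base R sketch using tapply() on a small hypothetical data frame (toy, with made-up scores); cell_summary is a hypothetical helper, and on the real data you would pass evals instead.

```r
# Hypothetical toy data with made-up scores; replace `toy` with `evals`
toy <- data.frame(
  gender   = c("male", "male", "male", "female", "female", "female"),
  bty_avg2 = c("above", "not above", "not above", "above", "not above", "not above"),
  score    = c(4.6, 4.1, 4.3, 4.4, 3.9, 4.1)
)

# Apply `fun` (e.g. mean or median) to score within every
# gender x beauty-group cell; returns a matrix with named rows and columns
cell_summary <- function(df, fun) {
  tapply(df$score, list(df$gender, df$bty_avg2), fun)
}

means   <- cell_summary(toy, mean)
medians <- cell_summary(toy, median)
print(round(means, 2))
print(medians)
```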

Graph #3 shows the age distribution of professors in each beauty group. The ages of professors with an above-average beauty score are concentrated roughly between 35 and 50, while the ages of professors with a below-average score are concentrated roughly between 45 and 60. Younger professors therefore tend to receive higher beauty ratings, so a professor’s age appears to be related to their beauty ranking.
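The age ranges read off Graph #3 correspond to the box edges, that is, the 25th and 75th percentiles of age in each group. A minimal base R sketch, assuming a small hypothetical data frame (toy) in place of evals and a hypothetical helper age_quartiles; its results should line up with the box edges geom_boxplot draws, though that correspondence is worth confirming against the plot.

```r
# Hypothetical toy data; replace `toy` with `evals`
toy <- data.frame(
  bty_avg2 = rep(c("above", "not above"), each = 5),
  age      = c(34, 38, 42, 47, 52, 44, 48, 53, 57, 62)
)

# 25th and 75th percentiles of age within each beauty group,
# returned as a named list (one element per group).
# quantile() uses R's default type 7 interpolation.
age_quartiles <- function(df) {
  lapply(split(df$age, df$bty_avg2), quantile, probs = c(0.25, 0.75))
}

q <- age_quartiles(toy)
print(q)
```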

When testing our hypotheses, we asked whether the confidence interval for the difference in mean score between the two bty_avg2 groups lies “further from 0” than the corresponding interval for gender. Using a comparison of two independent means (score for each bty_avg2 group), our results led us to reject the null hypothesis and indicate that bty_avg2 is a weaker predictor of score than gender.
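One way to make the “further from 0” comparison concrete is to compute a confidence interval for the difference in mean score under each grouping and measure how far each interval sits from 0. This sketch swaps in Welch two-sample t intervals from base R’s t.test(), which may differ from the method used in our analysis; the helpers diff_ci and dist_from_zero and the toy data are hypothetical.

```r
# Hypothetical toy data with made-up scores; replace `toy` with `evals`
toy <- data.frame(
  gender   = rep(c("female", "male"), each = 6),
  bty_avg2 = rep(c("above", "not above"), times = 6),
  score    = c(3.9, 4.1, 4.0, 3.8, 4.2, 3.9,
               4.4, 4.2, 4.5, 4.3, 4.4, 4.1)
)

# Welch t interval for (mean of first level) - (mean of second level),
# with levels taken in sorted order
diff_ci <- function(df, group, level = 0.95) {
  lv <- sort(unique(df[[group]]))
  stopifnot(length(lv) == 2)
  x <- df$score[df[[group]] == lv[1]]
  y <- df$score[df[[group]] == lv[2]]
  as.numeric(t.test(x, y, conf.level = level)$conf.int)
}

# Distance of an interval from 0: 0 if it contains 0,
# otherwise the distance to its nearer endpoint
dist_from_zero <- function(ci) {
  if (ci[1] <= 0 && ci[2] >= 0) 0 else min(abs(ci))
}

gender_ci <- diff_ci(toy, "gender")
beauty_ci <- diff_ci(toy, "bty_avg2")
print(c(gender = dist_from_zero(gender_ci), beauty = dist_from_zero(beauty_ci)))
```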

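# One permutation: shuffle the beauty labels so they are unrelated to score,
# then compute the mean score within each shuffled group
shuffled_bty_avg2 <- evals %>%
  mutate(bty_avg2 = shuffle(bty_avg2)) %>%
  group_by(bty_avg2) %>%
  summarize(mean_shuf_bty_avg2 = mean(score))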
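The code above performs a single shuffle. A permutation test repeats the shuffle many times to build a null distribution for the difference in mean score, then asks how often a shuffled difference is at least as extreme as the observed one. This is a minimal base R sketch using sample() in place of mosaic’s shuffle(); perm_test, the toy scores, and the default of 2000 repetitions are hypothetical choices.

```r
# Two-sided permutation test for a difference in mean score between
# the two levels of `group` (levels taken in sorted order)
perm_test <- function(score, group, reps = 2000) {
  lv <- sort(unique(group))
  stopifnot(length(lv) == 2)
  diff_means <- function(g) mean(score[g == lv[1]]) - mean(score[g == lv[2]])
  observed <- diff_means(group)
  # Shuffle the labels `reps` times and recompute the difference each time
  shuffled <- replicate(reps, diff_means(sample(group)))
  # Adding 1 to numerator and denominator counts the observed labeling
  p_value <- (sum(abs(shuffled) >= abs(observed)) + 1) / (reps + 1)
  list(observed = observed, p_value = p_value)
}

set.seed(1)
toy_score <- c(4.8, 4.9, 5.0, 4.7, 4.9, 4.8, 3.1, 3.0, 3.2, 3.3, 3.1, 3.0)
toy_group <- rep(c("above", "not above"), each = 6)
res <- perm_test(toy_score, toy_group)
print(res)
```

On the real data this would be called as perm_test(evals$score, evals$bty_avg2), and likewise with evals$gender.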

Conclusion:


Hunter Peterson, Cassie Dandy, Emily Trumpus, Elizabeth Oshiro