Simpson’s_Paradox_UC

Introduction

In 1973, UC Berkeley faced a potential admissions scandal over seemingly disparate admission rates between male and female applicants. However, a careful analysis of that year’s admissions data set shows that the disparity arises from how the data are sliced: the same figures tell a different story depending on whether they are viewed in aggregate or by department. In this report, I will explain the source of the disparity and how it illustrates the well-known phenomenon called Simpson’s Paradox.

Data Analysis

I’ll begin with the overall admission rates of male and female applicants. As illustrated in the graph below, the overall admission rate of female applicants was markedly lower than that of male applicants.

To be exact, 44.5% of all male applicants were admitted as opposed to 30.4% of all female applicants.
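The pooled rates can be reproduced with a short sketch. It assumes the report draws on the counts in R's built-in UCBAdmissions data set (six departments, 4,526 applicants); the counts are copied in by hand, and the name `overall_rate` is chosen here purely for illustration.

```python
# Assumed source: R's built-in UCBAdmissions data set.
# Each entry is (admitted, rejected) for one department and gender.
COUNTS = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}


def overall_rate(gender):
    """Admission rate for one gender, pooled across all departments."""
    admitted = sum(dept[gender][0] for dept in COUNTS.values())
    applied = sum(sum(dept[gender]) for dept in COUNTS.values())
    return admitted / applied


for gender in ("Male", "Female"):
    print(f"{gender}: {overall_rate(gender):.1%}")
# Male: 44.5%
# Female: 30.4%
```

Under that assumption, the printed values match the 44.5% and 30.4% quoted above.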

Looking at these figures alone, one might conclude that there was a bias against female applicants. However, once the admission rates are split by department, the pattern reverses in most departments. The graph below shows this.

The data set covers six departments, labeled A through F. In departments A, B, D, and F, the percentage of admitted female applicants was higher than the percentage of admitted male applicants.
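The department-level comparison can be checked the same way. This is a minimal sketch that again assumes the counts from R's built-in UCBAdmissions data set; the helper names `rate` and `departments_favoring_women` are illustrative and not taken from the original analysis.

```python
# Assumed source: R's built-in UCBAdmissions data set.
# Each entry is (admitted, rejected) for one department and gender.
COUNTS = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}


def rate(admitted, rejected):
    """Admission rate from a pair of admitted and rejected counts."""
    return admitted / (admitted + rejected)


def departments_favoring_women():
    """Departments where the female admission rate exceeds the male rate."""
    return [
        dept
        for dept, by_gender in COUNTS.items()
        if rate(*by_gender["Female"]) > rate(*by_gender["Male"])
    ]


for dept, by_gender in COUNTS.items():
    male = rate(*by_gender["Male"])
    female = rate(*by_gender["Female"])
    print(f"Department {dept}: male {male:.1%}, female {female:.1%}")

print(departments_favoring_women())  # ['A', 'B', 'D', 'F']
```

The gap is widest in department A and only a point or two wide in departments D and F, so the reversal holds in most departments but not in all of them.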

So, how is this possible? It seems logical that if the overall admission rate of male applicants is higher than that of female applicants, male applicants would also have the higher admission rate when the data are broken down by department. However, this is not the case. The reversal that appears in the UC Berkeley admissions data is a classic example of Simpson’s Paradox.

Simpson’s Paradox

Simpson’s Paradox is “a trend or result that is present when data is put into groups that reverses or disappears when the data is combined” (Grigg). In the UC Berkeley admissions data analyzed in this report, the aggregate data show a higher admission rate for male applicants. However, when the data are grouped by department, female applicants have the higher admission rate in 4 out of 6 departments. This reversal is caused by what is known as a lurking variable: a hidden variable that is neither the explanatory nor the response variable but can influence the trends in a data set.

UC Berkeley’s Lurking Variable

As mentioned earlier, female applicants had a higher rate of admission in 4 out of 6 departments, suggesting that, within most departments, any bias in admissions actually favored female applicants.

However, their overall admission rate was low because of the departments that most female applicants applied to. In the UC Berkeley admissions data, the lurking variable was the department applied to: female applicants tended to apply to departments with lower admission rates. I’ll illustrate this in the graphs below. Let’s start by looking at the admission rates for the different departments.

According to the graph above, departments C, D, E, and F had the lowest admission rates. We’ll now look at the gender proportion of applicants in every department.
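A short sketch of the per-department admission rates, pooling both genders, is given here as a cross-check. It assumes the counts from R's built-in UCBAdmissions data set, and `department_rates` is an illustrative name.

```python
# Assumed source: R's built-in UCBAdmissions data set.
# Each entry is (admitted, rejected) for one department and gender.
COUNTS = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}


def department_rates():
    """Admission rate per department, pooling male and female applicants."""
    rates = {}
    for dept, by_gender in COUNTS.items():
        admitted = sum(adm for adm, _ in by_gender.values())
        applied = sum(adm + rej for adm, rej in by_gender.values())
        rates[dept] = admitted / applied
    return rates


# List departments from most to least selective.
for dept, r in sorted(department_rates().items(), key=lambda kv: kv[1]):
    print(f"Department {dept}: {r:.1%}")
```

With these counts, departments A and B admit well over half of their applicants, while C, D, E, and F each admit fewer than four in ten.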

As seen above, departments C, D, E, and F had the highest proportion of female applicants. Therefore, the lower overall admission rate for female applicants was not the result of a bias against them, but of the fact that most female applicants applied to the more selective departments.
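The mechanism can be made explicit: each gender's overall rate is a weighted average of its department rates, weighted by where that gender applied. The sketch below assumes the counts from R's built-in UCBAdmissions data set; the names `SELECTIVE`, `female_share`, `share_applying_to`, and `weighted_rate` are introduced here for illustration only.

```python
# Assumed source: R's built-in UCBAdmissions data set.
# Each entry is (admitted, rejected) for one department and gender.
COUNTS = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}
SELECTIVE = ("C", "D", "E", "F")  # the four departments with the lowest admission rates


def applicants(dept, gender):
    """Number of applicants of one gender to one department."""
    return sum(COUNTS[dept][gender])


def female_share(dept):
    """Fraction of a department's applicants who were female."""
    female = applicants(dept, "Female")
    return female / (female + applicants(dept, "Male"))


def share_applying_to(gender, depts):
    """Fraction of one gender's applications that went to the given departments."""
    total = sum(applicants(d, gender) for d in COUNTS)
    return sum(applicants(d, gender) for d in depts) / total


def weighted_rate(gender):
    """Overall rate rebuilt as a weighted average of department rates."""
    total = sum(applicants(d, gender) for d in COUNTS)
    return sum(
        (applicants(d, gender) / total)                    # weight: share of applications
        * (COUNTS[d][gender][0] / applicants(d, gender))   # department admission rate
        for d in COUNTS
    )


for dept in COUNTS:
    print(f"Department {dept}: {female_share(dept):.1%} female applicants")

for gender in ("Male", "Female"):
    print(f"{gender}: {share_applying_to(gender, SELECTIVE):.1%} applied to C-F, "
          f"overall rate {weighted_rate(gender):.1%}")
```

Under these assumed counts, about 93% of female applications went to departments C through F, compared with about 49% of male applications, so the low admission rates in those departments weigh far more heavily on the female average.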

Works Cited

Bailiss, C. (n.d.). “07. Styling.” pivottabler. http://www.pivottabler.org.uk/articles/v07-styling.html. Accessed September 14, 2023.

Bailiss, C. (n.d.). “Create Pivot Tables in R.” pivottabler. http://www.pivottabler.org.uk/. Accessed September 14, 2023.

Grigg, T. (2019, January 8). “Simpson’s Paradox and Interpreting Data.” Medium. https://towardsdatascience.com/simpsons-paradox-and-interpreting-data-6a0443516765. Accessed September 14, 2023.

Simpson’s_Paradox_UC_Berkeley

Koech

2023-09-14
