Simpson’s Paradox

In a nutshell, Simpson’s Paradox (also widely referred to as the Yule–Simpson effect, amalgamation paradox, or reversal paradox) is a statistical phenomenon in which a trend that appears in aggregated data reverses when the data are separated into groups.

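The problem arises when data from several groups of different sizes and base rates are pooled, so that the grouping variable distorts the comparison made on the combined data.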
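A tiny numeric illustration may help. The counts below are hypothetical, invented only to show the mechanism (they are not real data): treatment A has a higher success rate than treatment B inside each of two groups, X and Y, yet B comes out ahead once the groups are pooled. The sketch uses only base R; the column names are illustrative.

```r
# Hypothetical counts, invented purely for illustration (not real data)
d <- data.frame(
  group     = c("X", "X", "Y", "Y"),
  treatment = c("A", "B", "A", "B"),
  success   = c(9, 80, 30, 2),
  trials    = c(10, 100, 100, 10)
)

# Success rate within each group: A beats B in group X (0.9 vs 0.8)
# and in group Y (0.3 vs 0.2)
d$rate <- d$success / d$trials
d

# Pool the groups: sum successes and trials per treatment
pooled <- aggregate(cbind(success, trials) ~ treatment, data = d, FUN = sum)
pooled$rate <- pooled$success / pooled$trials
pooled   # A: 39/110 (about 0.35), B: 82/110 (about 0.75), so B now looks better
```

The reversal happens because most of A's trials fall in the low-success group Y, while most of B's fall in the high-success group X.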

Example: Graduate School Admissions at the University of California, Berkeley (Fall 1973)

One of the best-known examples of Simpson’s paradox is a study of gender bias in graduate school admissions at the University of California, Berkeley.

The aggregate admission figures showed that men applying that fall were more likely to be admitted than women: 1198 of 2691 male applicants (about 44.5%) were admitted, compared with 557 of 1835 female applicants (about 30.4%).
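The aggregate rates can be computed directly from the `UCBAdmissions` table that ships with base R (a 2 x 2 x 6 array indexed by Admit, Gender and Dept). A minimal sketch using only base R functions:

```r
# UCBAdmissions (package 'datasets', loaded by default) is a 2 x 2 x 6 table
# with dimensions Admit, Gender and Dept
overall <- margin.table(UCBAdmissions, c(1, 2))  # sum over departments
rates   <- prop.table(overall, 2)                # proportions within each gender (column)
round(rates["Admitted", ], 3)                    # Male 0.445, Female 0.304
```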

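library(ggplot2)
library(vcd)        # mosaic(), doubledecker(), cotabplot(), cotab_coindep()
library(grid)       # gpar()
library(gridExtra)

# UCBAdmissions is a 3-way table (Admit x Gender x Dept); as.data.frame()
# gives one row per cell, with the counts in a Freq column
ucba <- as.data.frame(UCBAdmissions)

# Bar chart of admitted vs rejected applicants, stacked by gender
ggplot(ucba, aes(Admit, fill = Gender)) + 
  geom_bar(aes(weight = Freq)) +
  labs(title = "Graduate School Admissions", y = "Frequency") 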

# Collapse over departments to get the Admit x Gender counts
(ucb <- margin.table(UCBAdmissions, 1:2))
##           Gender
## Admit      Male Female
##   Admitted 1198    557
##   Rejected 1493   1278
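# Fill colors laid out like the 2 x 2 table: Admitted-Male in dark cyan,
# Rejected-Female in dark magenta, the other two cells gray
(fill_colors <- matrix(c("dark cyan", "gray", "gray", "dark magenta"), ncol = 2))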
##      [,1]        [,2]          
## [1,] "dark cyan" "gray"        
## [2,] "gray"      "dark magenta"
# Mosaic plot of the 2 x 2 table, filled with the colors above
mosaic(ucb, gp = gpar(fill = fill_colors, col = 0))

# Doubledecker plot: admission proportions within each gender
doubledecker(Admit ~ Gender, data = UCBAdmissions, gp = gpar(fill = c("palegreen3", "gray")))

Nevertheless, when the admissions were examined department by department, the apparent bias against women disappeared. As a matter of fact, two of the six departments (A and B) admitted women at a higher rate than men (by a wide margin in department A), whereas in the remaining departments the differences between men and women were small.
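The department-level comparison can be sketched in base R on the same `UCBAdmissions` table. Dividing each Admit count by its Gender-by-Dept total gives the admission rate of each gender within each department (the variable name `dept_rates` is illustrative):

```r
# Admission rate for each Gender x Dept combination: prop.table() over
# margins 2 and 3 divides each count by its Gender-Dept total
dept_rates <- prop.table(UCBAdmissions, c(2, 3))["Admitted", , ]
round(dept_rates, 3)

# TRUE where women were admitted at a higher rate than men
dept_rates["Female", ] > dept_rates["Male", ]
```

With this data, women's admission rate exceeds men's in departments A, B, D and F, and is slightly lower in C and E.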

In other words, even though at first glance there appeared to be a gender bias in graduate admissions at the University of California, Berkeley, the department-level data did not support it. One likely explanation for the overall “gender bias” is that, in the fall of 1973, women tended to apply to more selective departments (those with lower admission rates) than men did.
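The selectivity explanation can be checked numerically. The sketch below (base R only; variable names are illustrative) computes what fraction of each gender's applications went to each department and the overall admission rate of each department, then rebuilds each gender's aggregate rate as an application-weighted average of its within-department rates. The last step gives the average departmental admission rate faced by each gender's applicants.

```r
# Applications per Gender x Dept, and each gender's share of applications by dept
apps  <- margin.table(UCBAdmissions, c(2, 3))
share <- prop.table(apps, 1)                     # each row sums to 1

# Overall admission rate of each department (both genders): its selectivity
dept_admit <- prop.table(margin.table(UCBAdmissions, c(1, 3)), 2)["Admitted", ]

# Within-department admission rates for each gender
rate_in_dept <- prop.table(UCBAdmissions, c(2, 3))["Admitted", , ]

# Aggregate rate = application-weighted average of the within-department rates
rowSums(share * rate_in_dept)

# Average departmental admission rate faced by each gender's applicants
# (unclass() turns the table into a plain matrix before %*%)
faced <- drop(unclass(share) %*% dept_admit)
round(faced, 3)
```

The weighted averages reproduce the aggregate rates exactly, and the departments women applied to had an average admission rate of roughly 0.30, against roughly 0.45 for the departments men applied to.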

# The same plot, split by department and then by gender
doubledecker(Admit ~ Dept + Gender, data = UCBAdmissions,
             gp = gpar(fill = c("palegreen3", "gray")))

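# Panel function: shades each department's association plot using a permutation
# test of Admit vs Gender independence within that department (n = 5000)
ucb_panel <- cotab_coindep(UCBAdmissions, condvars = "Dept", type = "assoc",
                           n = 5000, margins = c(3, 1, 1, 3))

cotabplot(~ Admit + Gender | Dept, data = UCBAdmissions, panel = ucb_panel)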
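As a numerical complement to the plots, the sketch below computes the sample odds ratio of admission (men relative to women) in each department with base R's `apply()`, then runs base R's Cochran-Mantel-Haenszel test (`mantelhaen.test()`), which tests Admit against Gender while conditioning on Dept. This is a swapped-in technique, not the permutation test behind `cotab_coindep()`. It also assumes a common odds ratio across departments, an assumption department A makes questionable. An odds ratio below 1 means men had lower odds of admission than women in that department.

```r
# Sample odds ratio of admission, men relative to women, in each department:
# (Male admitted * Female rejected) / (Male rejected * Female admitted)
odds_ratio <- apply(UCBAdmissions, 3, function(t) {
  (t["Admitted", "Male"] * t["Rejected", "Female"]) /
    (t["Rejected", "Male"] * t["Admitted", "Female"])
})
round(odds_ratio, 2)   # values below 1 favor women

# Cochran-Mantel-Haenszel test of Admit vs Gender, conditional on Dept
# (assumes a common odds ratio across departments)
mantelhaen.test(UCBAdmissions)
```

Only department A has an odds ratio far from 1 (about 0.35); the other departments lie between about 0.8 and 1.2.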