Setup

Packages

library(pacman)
p_load(DT, psych, ggplot2, dplyr)

Data

D <- read.csv("NFLWonderlic.csv")

datatable(D, extensions = c("Buttons"), options = list(dom = 'Bfrtip', buttons = c('copy', 'csv', 'print'), scrollX = T))

describeBy(D, group = "Race")

## 
##  Descriptive statistics by group 
## group: 
##           vars  n  mean   sd median trimmed  mad min max range  skew kurtosis
## Player*      1 15  8.00 4.47      8    8.00 5.93   1  15    14  0.00    -1.44
## Score        2 15 23.27 6.85     22   23.54 2.97   8  35    27 -0.15    -0.20
## Position*    3 15  1.00 0.00      1    1.00 0.00   1   1     0   NaN      NaN
## Race*        4 15  1.00 0.00      1    1.00 0.00   1   1     0   NaN      NaN
## Note*        5 15  1.47 0.83      1    1.38 0.00   1   3     2  1.13    -0.63
##             se
## Player*   1.15
## Score     1.77
## Position* 0.00
## Race*     0.00
## Note*     0.22
## ------------------------------------------------------------ 
## group: A
##           vars n  mean   sd median trimmed  mad min max range  skew kurtosis
## Player*      1 3  2.00 1.00      2    2.00 1.48   1   3     2  0.00    -2.33
## Score        2 3 23.00 3.61     24   23.00 2.97  19  26     7 -0.26    -2.33
## Position*    3 3  1.00 0.00      1    1.00 0.00   1   1     0   NaN      NaN
## Race*        4 3  1.00 0.00      1    1.00 0.00   1   1     0   NaN      NaN
## Note*        5 3  1.67 0.58      2    1.67 0.00   1   2     1 -0.38    -2.33
##             se
## Player*   0.58
## Score     2.08
## Position* 0.00
## Race*     0.00
## Note*     0.33
## ------------------------------------------------------------ 
## group: AW
##           vars n mean sd median trimmed mad min max range skew kurtosis se
## Player*      1 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Score        2 1   33 NA     33      33   0  33  33     0   NA       NA NA
## Position*    3 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Race*        4 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Note*        5 1    1 NA      1       1   0   1   1     0   NA       NA NA
## ------------------------------------------------------------ 
## group: B
##           vars   n   mean    sd median trimmed    mad min max range  skew
## Player*      1 290 143.97 83.35  143.5  143.96 106.75   1 287   286  0.00
## Score        2 290  19.73  7.47   20.0   19.40   7.41   4  48    44  0.46
## Position*    3 290   8.59  3.23    9.0    8.89   1.48   1  13    12 -0.76
## Race*        4 290   1.00  0.00    1.0    1.00   0.00   1   1     0   NaN
## Note*        5 290   1.20  1.15    1.0    1.00   0.00   1  11    10  6.39
##           kurtosis   se
## Player*      -1.21 4.89
## Score         0.21 0.44
## Position*     0.05 0.19
## Race*          NaN 0.00
## Note*        42.61 0.07
## ------------------------------------------------------------ 
## group: B 
##           vars n mean sd median trimmed mad min max range skew kurtosis se
## Player*      1 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Score        2 1   16 NA     16      16   0  16  16     0   NA       NA NA
## Position*    3 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Race*        4 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Note*        5 1    1 NA      1       1   0   1   1     0   NA       NA NA
## ------------------------------------------------------------ 
## group: BW
##           vars  n mean   sd median trimmed  mad min max range  skew kurtosis
## Player*      1 10  5.5 3.03    5.5    5.50 3.71   1  10     9  0.00    -1.56
## Score        2 10 22.7 7.97   21.5   21.88 8.90  14  38    24  0.47    -1.22
## Position*    3 10  3.8 1.48    4.0    3.88 1.48   1   6     5 -0.44    -0.89
## Race*        4 10  1.0 0.00    1.0    1.00 0.00   1   1     0   NaN      NaN
## Note*        5 10  2.5 1.90    1.5    2.25 0.74   1   6     5  0.66    -1.32
##             se
## Player*   0.96
## Score     2.52
## Position* 0.47
## Race*     0.00
## Note*     0.60
## ------------------------------------------------------------ 
## group: H
##           vars n mean sd median trimmed mad min max range skew kurtosis se
## Player*      1 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Score        2 1   28 NA     28      28   0  28  28     0   NA       NA NA
## Position*    3 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Race*        4 1    1 NA      1       1   0   1   1     0   NA       NA NA
## Note*        5 1    1 NA      1       1   0   1   1     0   NA       NA NA
## ------------------------------------------------------------ 
## group: W
##           vars   n   mean    sd median trimmed    mad min max range  skew
## Player*      1 310 155.50 89.63  155.5  155.50 114.90   1 310   309  0.00
## Score        2 310  28.41  6.97   28.0   28.27   5.93   9  50    41  0.21
## Position*    3 310  10.54  1.95   11.0   10.90   0.00   1  15    14 -2.39
## Race*        4 310   1.00  0.00    1.0    1.00   0.00   1   1     0   NaN
## Note*        5 310   1.46  2.15    1.0    1.00   0.00   1  14    13  4.88
##           kurtosis   se
## Player*      -1.21 5.09
## Score         0.39 0.40
## Position*     8.85 0.11
## Race*          NaN 0.00
## Note*        23.23 0.12

D <- subset(D, Race == "B" | Race == "W")

Data retrieved from https://iqtestprep.com/nfl-wonderlic-scores/, race obtained by training for facial color and variable reviewed prior to use. There were some misclassifications (e.g., Colin Kaepernick, Jordan Love, etc.) so I removed ones which I could not confirm via stuff like looking up their parents or finding information about their parents' backgrounds in their Wikipedia article.

Analysis

Wonderlic scores need to be converted from US norms (\(\mu\) = 21.75, \(\sigma\) = 7.6) to the typical IQ metric. Some of these scores are from different eras of players. They may not be totally comparable, but the scores have retained the same scale over time, so it's doubtful this does much to the relative means unless one or another era is overrepresented by group.

D$IQ <- ((D$Score - 21.75)/7.6)*15 + 100

st.err <- function(x){
  sd(x)/sqrt(length(x))}
TD <- aggregate(IQ ~ Race, D, mean); ND <- aggregate(IQ ~ Race, D, length); SD <- aggregate(IQ ~ Race, D, sd); XD <- aggregate(IQ ~ Race, D, st.err)
TD$N <- ND$IQ; TD$SD <- SD$IQ; TD$SE <- XD$IQ; TD$CI <- TD$SE * 1.96
TD

Cohen's d = 1.20037, Hedge's g = 1.201752 (pro forma significant digits, there's not really this much precision).

ggplot(D, aes(x = IQ, fill = Race)) + 
  geom_density(alpha = 0.5) + theme_bw() + scale_fill_manual(values = c("#CCCCCC","#FFFFFF"), labels = c("Black", "White")) + theme(legend.position = c(0.95, 0.9), legend.background = element_blank(), axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())

ggplot(D, aes(x = IQ, y = Race, fill = Race)) +
  geom_violin(alpha = 0.5) + theme_bw() + scale_fill_manual(values = c("#CCCCCC","#FFFFFF"), labels = c("Black", "White")) + theme(legend.position = "none", axis.title.y = element_blank()) + scale_y_discrete(labels = c("Black", "White"))

ggplot(TD, aes(x = Race, y = IQ, fill = Race)) + 
  geom_bar(position = position_dodge(0.9), colour = "black", stat = "identity") +
  geom_errorbar(position = position_dodge(0.9), width = 0.25, aes(ymin = IQ - CI, ymax = IQ + CI)) + 
  coord_cartesian(ylim = c(90, 115)) +
  scale_fill_manual(values = c("#CCCCCC","#FFFFFF")) +
  theme_bw() + theme(legend.position = "none", axis.title.x = element_blank()) + scale_x_discrete(labels = c("Black", "White"))

TD <- aggregate(IQ ~ Position, D, mean); ND <- aggregate(IQ ~ Position, D, length); SD <- aggregate(IQ ~ Position, D, sd); XD <- aggregate(IQ ~ Position, D, st.err)
TD$N <- ND$IQ; TD$SD <- SD$IQ; TD$SE <- XD$IQ; TD$CI <- TD$SE * 1.96
TD <- subset(TD, N >= 15) #excludes kickers and punters, other less common positions
TD

ggplot(TD, aes(x = Position, y = IQ, fill = Position)) + 
  geom_bar(position = position_dodge(0.9), colour = "black", stat = "identity") +
  geom_errorbar(position = position_dodge(0.9), width = 0.25, aes(ymin = IQ - CI, ymax = IQ + CI)) + 
  coord_cartesian(ylim = c(85, 115)) +
  theme_bw() + theme(legend.position = "none")

What about just in our only category large enough for a singular test? This helps us to understand if selection by position is explanatory.

QD <- subset(D, Position == "QB")
TD <- aggregate(IQ ~ Race, QD, mean); ND <- aggregate(IQ ~ Race, QD, length); SD <- aggregate(IQ ~ Race, QD, sd); XD <- aggregate(IQ ~ Race, QD, st.err)
TD$N <- ND$IQ; TD$SD <- SD$IQ; TD$SE <- XD$IQ; TD$CI <- TD$SE * 1.96
TD

library(ggthemes)
ggplot(TD, aes(x = Race, y = IQ, fill = Race)) + 
  geom_bar(position = position_dodge(0.9), colour = "black", stat = "identity", alpha = 0.8) +
  geom_errorbar(position = position_dodge(0.9), width = 0.25, aes(ymin = IQ - CI, ymax = IQ + CI)) + 
  coord_cartesian(ylim = c(90, 115)) +
  scale_fill_manual(values = c("#01516c","#800020")) +
  theme_economist() + theme(legend.position = "none", axis.title.x = element_blank(), plot.title = element_text(hjust = 0.5)) + scale_x_discrete(labels = c("Black", "White")) + ggtitle("Quarterback-only Subset")

That's a Cohen's d of 1.035443 and a Hedge's g of 1.066732. So somewhat, but not very much, smaller when selection may have occurred. The means did not significantly differ between the full sample and this subset.

ggplot(QD, aes(x = IQ, fill = Race)) + 
  geom_density(alpha = 0.5) + theme_economist() + scale_fill_manual(values = c("#01516c","#800020"), labels = c("Black", "White")) + theme(legend.position = c(0.94, 0.84), legend.background = element_blank(), axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), plot.title = element_text(hjust = 0.5)) + ggtitle("Quarterback-only Subset")

ggplot(QD, aes(x = IQ, y = Race, fill = Race)) +
  geom_violin(alpha = 0.5) + theme_economist() + scale_fill_manual(values = c("#01516c","#800020"), labels = c("Black", "White")) + theme(legend.position = "none", axis.title.y = element_blank(), plot.title = element_text(hjust = 0.5)) + scale_y_discrete(labels = c("Black", "White")) + ggtitle("Quarterback-only Subset")

We could try to see if the racial difference is attributable to broader selection for position by just residualizing for it.

library(umx)

## Loading required package: OpenMx

## 
## Attaching package: 'OpenMx'

## The following object is masked from 'package:psych':
## 
##     tr

## For an overview type '?umx'

## 
## Attaching package: 'umx'

## The following object is masked from 'package:stats':
## 
##     loadings

DR <- umx_residualize(IQ ~ Position, data = D)

## [1] "(Intercept) B = 93.76 [86.424, 101.096], t = 25.102, p < 0.001"
## [1] "PositionDB B = 20.55 [-10.575, 51.674], t = 1.297, p = 0.195"
## [1] "PositionDE B = 6.624 [-3.606, 16.854], t = 1.272, p = 0.204"
## [1] "PositionDL B = 36.339 [5.215, 67.463], t = 2.293, p = 0.022"
## [1] "PositionDT B = 6.405 [-5, 17.809], t = 1.103, p = 0.270"
## [1] "PositionFB B = 12.655 [-18.469, 43.779], t = 0.799, p = 0.425"
## [1] "PositionFS B = -13.003 [-44.127, 18.121], t = -0.821, p = 0.412"
## [1] "PositionK B = -18.924 [-50.048, 12.2], t = -1.194, p = 0.233"
## [1] "PositionLB B = 10.276 [1.486, 19.067], t = 2.296, p = 0.022"
## [1] "PositionOG B = 30.418 [7.807, 53.029], t = 2.642, p = 0.008"
## [1] "PositionOL B = 14.977 [5.992, 23.962], t = 3.274, p = 0.001"
## [1] "PositionOT B = 24.497 [-6.627, 55.621], t = 1.546, p = 0.123"
## [1] "PositionP B = 45.221 [22.609, 67.832], t = 3.928, p < 0.001"
## [1] "PositionQB B = 14.788 [7.274, 22.302], t = 3.865, p < 0.001"
## [1] "PositionRB B = -2.214 [-10.825, 6.397], t = -0.505, p = 0.614"
## [1] "PositionS B = 1.045 [-9.33, 11.42], t = 0.198, p = 0.843"
## [1] "PositionT B = 22.523 [-8.601, 53.647], t = 1.421, p = 0.156"
## [1] "PositionTE B = 16.064 [4.36, 27.768], t = 2.696, p = 0.007"
## [1] "PositionWR B = 2.155 [-6.337, 10.647], t = 0.498, p = 0.618"

DR$IQ <- DR$IQ + 100 #have to add back the metric mean
RD <- aggregate(IQ ~ Race, DR, mean); ND <- aggregate(IQ ~ Race, DR, length); SD <- aggregate(IQ ~ Race, DR, sd); XD <- aggregate(IQ ~ Race, DR, st.err)
RD$N <- ND$IQ; RD$SD <- SD$IQ; RD$SE <- XD$IQ; RD$CI <- RD$SE * 1.96
RD

Cohen's d of 0.732248 and Hedge's g of 0.732822. It seems accounting for position reduces the gap by 39%. This is similar to the typically 30-40% reduction in the gap after accounting for socioeconomic status in population-representative samples. The same caveats should come with this. Perhaps - as circumstantially evidenced by the quarterback example - there will be a growing gap with higher means for different positions like there is for education.

ggplot(RD, aes(x = Race, y = IQ, fill = Race)) + 
  geom_bar(position = position_dodge(0.9), colour = "black", stat = "identity", alpha = 0.8) +
  geom_errorbar(position = position_dodge(0.9), width = 0.25, aes(ymin = IQ - CI, ymax = IQ + CI)) + 
  coord_cartesian(ylim = c(90, 115)) +
  scale_fill_wsj("colors6") +
  theme_wsj() + theme(legend.position = "none", axis.title.x = element_blank(), plot.title = element_text(hjust = 0.5)) + scale_x_discrete(labels = c("Black", "White")) + ggtitle("Position-residualized Result")

ggplot(DR, aes(x = IQ, fill = Race)) + 
  geom_density(alpha = 0.5) + theme_wsj() + scale_fill_wsj("colors6", labels = c("Black", "White")) + theme(legend.position = c(0.82, 0.93), legend.background = element_blank(), axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), plot.title = element_text(hjust = 0.5)) + ggtitle("Position-residualized Result")

ggplot(DR, aes(x = IQ, y = Race, fill = Race)) +
  geom_violin(alpha = 0.5) + theme_wsj() + scale_fill_wsj("colors6", labels = c("Black", "White")) + theme(legend.position = "none", axis.title.y = element_blank(), plot.title = element_text(hjust = 0.5)) + scale_y_discrete(labels = c("Black", "White")) + ggtitle("Position-residualized Result")

Discussion

The NFL's controversy surrounding discussions should probably be conducted in the light of the lack of selection for cognitive ability in the NFL. For some positions this may be less true. Cognitive impairments may impact play regardless of the existence of an IQ-performance correlation for obvious reasons. Notably, studies generally only find performance relationships in the NFL for QBs. There is evidence against equalization of scores within the one position for which there was a sufficient sample to check (QBs) but residualization for position reduced the mean difference by just over a third; the reliability of this procedure is dubious with the small samples some positions had, but the result could be interesting, especially if it holds up with all data.

NFL Wonderlic Scores

Jay Lasker

August 28, 2020

Setup

Packages

Data

Analysis

Discussion