Cognitive data of immigrants are sparse, especially for subgroups. Some useful data comes from the PISA surveys, but these are scholatic tests, not standard IQ measures. Another source of information is the military draft test. Here we reanalyze some Danish data previously analyzed by myself earlier (Kirkegaard, 2013).
The data concern some 22k persons who took the military draft test in 2003-2004. The persons are grouped into 2 groups: Western and non-Western. The Westerners include Danes, Swiss, North Americans and EEA as it was in ~2005 (the date of the report). As such, apparently non-Westerners include Australians and New Zealanders, which is very odd. The report refers to DST’s definition, which does in fact include the two British offshoots in the Western group.
Load packages and data.
library(pacman)
p_load(kirkegaard, dplyr)
d_count = read.csv("data/BPP_data.csv")
The data are given in count form, but it’s often easier to use them in normal form. So we transform them back.
#stupid plyr-based version because dplyr sucks for this kind of task
d = d_count %>%
plyr::adply(.margins = 1, function(x) {
#to vecto
x = unlist(x)
#repeat data in two dfs and bind
rbind(
data_frame(BPP_score = rep(x[1], x[2]),
group = rep("Western", x[2])
),
data_frame(BPP_score = rep(x[1], x[3]),
group = rep("Non-Western", x[3])
)
)
}, .expand = F, .id = NULL)
#basic
psych::describeBy(d$BPP_score, group = d$group)
## $`Non-Western`
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1479 33.5 11.23 34 33.69 11.86 0 66 66 -0.13 -0.48
## se
## X1 0.29
##
## $Western
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 21159 42.8 9.83 44 43.24 8.9 2 74 72 -0.46 0.29
## se
## X1 0.07
##
## attr(,"call")
## by.default(data = x, INDICES = group, FUN = describe, type = type)
#standard units
d$z_score = (d$BPP_score - mean(d$BPP_score[d$group == "Western"])) / sd(d$BPP_score[d$group == "Western"])
psych::describeBy(d$z_score, group = d$group)
## $`Non-Western`
## vars n mean sd median trimmed mad min max range skew
## X1 1 1479 -0.95 1.14 -0.89 -0.93 1.21 -4.35 2.36 6.71 -0.13
## kurtosis se
## X1 -0.48 0.03
##
## $Western
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 21159 0 1 0.12 0.05 0.9 -4.15 3.17 7.32 -0.46 0.29
## se
## X1 0.01
##
## attr(,"call")
## by.default(data = x, INDICES = group, FUN = describe, type = type)
#IQ units
d$IQ = d$z_score*15 + 100
#plot
GG_denhist(d, "IQ", group = "group") +
scale_x_continuous(breaks = seq(0, 200, by = 5)) +
theme(axis.text.x = element_text(angle = -30))
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
silence(ggsave("figures/IQ_dists.png"))
GG_denhist(d, "BPP_score", group = "group") +
scale_x_continuous(breaks = seq(0, 200, by = 5))
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
silence(ggsave("figures/BPP_dists.png"))
The official report mentions the fractions of each group declared unsuitable and less suitable for service. These are 4.7% and 6.8% for the Westerners, and 19.3% and 28.1% for the non-Westerners. We can work out what the cutoffs are by calculating the cumulative sums and matching up the numbers. One can also simply get the specific quantiles from the sample distributions.
#cumsums
d_count$Westerners_cumsum = cumsum(d_count$Westerners)
d_count$Non_Westerners_cumsum = cumsum(d_count$Non_Westerners)
#fractions
d_count$Westerners_cumfrac = d_count$Westerners_cumsum / sum(d_count$Westerners)
d_count$Non_Westerners_cumfrac = d_count$Non_Westerners_cumsum/ sum(d_count$Non_Westerners)
#deduce cutoffs from given numbers
d %>%
filter(group == "Western") %$%
quantile(BPP_score, probs = c(.047, .068)) %T>%
print ->
cutoffs
## 4.7% 6.8%
## 25 27
d %>%
filter(group == "Non-Western") %$%
quantile(BPP_score, probs = c(.193, .281)) %T>%
print ->
cutoffs2
## 19.3% 28.1%
## 23 27
#average cutoffs for best guess
suitability_cutoffs = rbind(cutoffs, cutoffs2) %>% colMeans
#plot
d_count[c("BPP_score", "Westerners_cumfrac", "Non_Westerners_cumfrac")] %>%
gather(key = group, value = fraction, Westerners_cumfrac, Non_Westerners_cumfrac) %>%
ggplot(aes(BPP_score, fraction, color = group)) +
scale_y_continuous("Cumulative %", labels = scales::percent) +
scale_color_discrete(labels = c("Non-Western", "Western")) +
scale_x_continuous(breaks = seq(0, 100, by = 5)) +
geom_line() +
#cut off lines
geom_vline(xintercept = suitability_cutoffs[1], linetype = "dashed", color = "red") +
geom_vline(xintercept = suitability_cutoffs[2], linetype = "dashed", color = "orange") +
theme_bw()
silence(ggsave("figures/cum_fractions.png"))
Interestingly, we find that the given fractions declared suitable are inconsistent with the data, and in one case even produce different cutoffs between the groups! Perhaps the army utilized multiple cutoffs based on subtest scores (e.g. >23 on all subtests), a practice that’s not possible for us to recover with the present data. However, the difference was small in the one case of discrepancy, so we can average them to get it about right.
Plotting the data’s cumultative percentages by group visualizaes the explanation for why there’s relatively more non-Westerners declared unsuitable and less suitable for military service.