This is now on: https://rpubs.com/friendly/propensity
In his 1831 Research on the Propensity for Crime at Different Ages, Quetelet presents the data below as Table 13 (p. 57), giving the numbers of men and women accused of crime according to age groups, followed by measures of “degrees of the propensity for crime”.
library(tidyverse)  # readr, dplyr, tidyr and ggplot2 are all used below

table13 <- read_table("https://www.dropbox.com/s/shi1g3hx1z34i8o/Quetelet-table13.dat?raw=1",
                      show_col_types = FALSE)
table13
## # A tibble: 14 x 8
## age_gp m_acc w_acc w_m prop_all prop_m prop_w calculated
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 <16 438 82 187 0.02 0.02 0.02 0.02
## 2 16-21 3901 726 186 0.76 0.79 0.64 0.66
## 3 21-25 3762 845 225 1 1 0.98 1
## 4 25-30 4260 1017 239 0.97 0.96 1 0.92
## 5 30-35 3254 782 240 0.81 0.8 0.83 0.81
## 6 35-40 2105 621 295 0.59 0.56 0.75 0.71
## 7 40-45 1831 468 256 0.55 0.54 0.6 0.6
## 8 45-50 1357 363 267 0.46 0.44 0.51 0.51
## 9 50-55 896 203 227 0.33 0.33 0.33 0.42
## 10 55-60 555 113 204 0.24 0.24 0.22 0.34
## 11 60-65 445 97 218 0.24 0.24 0.23 0.27
## 12 65-70 230 45 196 0.16 0.17 0.14 0.21
## 13 70-80 163 38 233 0.12 0.12 0.12 0.12
## 14 >80 18 1 56 0.05 0.06 0.01 0.04
He presents the plot below of the calculated column against age (Fig. 4, Pl. III). The first question is: how did he calculate this?
The variables in this table are:
| Variable | Description |
|---|---|
| age_gp | age group |
| m_acc | number of men accused, 1826-1827 |
| w_acc | number of women accused |
| w_m | number of women for 1000 men |
| prop_all | degrees of propensity for crime: in general |
| prop_m | … men |
| prop_w | … women |
| calculated | … calculated by the formula |
He says (p. 57), “The last column offers results calculated by this very simple empirical formula”:
\[ Y = (1 - \sin{X})\frac{1}{1 + m} \quad\quad\mbox{ supposing }\quad\quad m = \frac{1}{2^X - 18}\]
First, make a numeric variable, age, using approximate midpoints of the age intervals, which are mostly of width 5:
table13$age_gp
## [1] "<16" "16-21" "21-25" "25-30" "30-35" "35-40" "40-45" "45-50" "50-55"
## [10] "55-60" "60-65" "65-70" "70-80" ">80"
# make a quantitative age variable
age <- c(14, 18, 23, 27, 32, 37, 42, 47, 52, 57, 62, 67, 75, 82)
table13 <- cbind(age, table13)
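As a quick check on the hand-entered ages, here is a minimal base R sketch that computes the exact midpoints from the interval labels. The helper `interval_midpoint()` is a hypothetical name introduced only for this illustration. The open-ended groups ("<16" and ">80") have no midpoint, so the values 14 and 82 used above remain a judgment call.

```r
# Age-group labels as printed by table13$age_gp
age_gp <- c("<16", "16-21", "21-25", "25-30", "30-35", "35-40", "40-45",
            "45-50", "50-55", "55-60", "60-65", "65-70", "70-80", ">80")

# Hypothetical helper: midpoint of a closed "lo-hi" label, NA for open-ended groups
interval_midpoint <- function(label) {
  bounds <- strsplit(label, "-", fixed = TRUE)
  vapply(bounds, function(b) {
    if (length(b) == 2) mean(as.numeric(b)) else NA_real_
  }, numeric(1))
}

mid <- interval_midpoint(age_gp)

# The values entered by hand above, for comparison
age_hand <- c(14, 18, 23, 27, 32, 37, 42, 47, 52, 57, 62, 67, 75, 82)
data.frame(age_gp, midpoint = mid, age_hand)
```

For the closed intervals, the hand-entered ages differ from the exact midpoints by at most half a year.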
I’ll plot both the observed propensity variable, prop_all, and the calculated one, so I first transform the data to long format:
#' transform to long
table13 |>
select(-c(age_gp, m_acc, w_acc, w_m)) |>
gather(prop_all:calculated,
key = "measure",
value = "propensity") -> table13_long
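In current versions of tidyr, `gather()` still works but is superseded by `pivot_longer()`. The sketch below shows the equivalent reshaping on a two-row excerpt of Table 13; the small `demo` data frame is made up of the first two rows of the table purely for illustration, and it assumes the tidyr package is installed.

```r
library(tidyr)

# Two rows of Table 13, just to illustrate the reshaping
demo <- data.frame(
  age        = c(14, 18),
  prop_all   = c(0.02, 0.76),
  prop_m     = c(0.02, 0.79),
  prop_w     = c(0.02, 0.64),
  calculated = c(0.02, 0.66)
)

# One row per (age, measure) pair, with the value in `propensity`
demo_long <- pivot_longer(demo,
                          cols = prop_all:calculated,
                          names_to = "measure",
                          values_to = "propensity")
demo_long
```

The only visible difference from `gather()` is the row order: `pivot_longer()` keeps the rows for each age together, which does not matter for plotting.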
Plot the overall propensity and the calculated one, reproducing Quetelet’s graph. Quetelet notes that the propensity for crime reaches a peak around 25 years of age.
I’m plotting the data points as well as loess-smoothed curves because Quetelet obviously did some smoothing. He says that the calculated curve matches the data, at least approximately. We can see that it matches quite well up to age 30, but not in the higher age groups.
#' Reproduce Quetelet's graph
table13_long |>
filter(measure %in% c("prop_all", "calculated")) |>
ggplot(
aes(x=age, y=propensity, color=measure)) +
geom_smooth(method="loess", formula = y~x, se=FALSE, linewidth=1.25) +  # `size` for lines is deprecated in ggplot2 >= 3.4.0
geom_point(size = 1.5) +
labs(x = "Age",
y = "Propensity to Crime",
color = "Curve") +
scale_color_manual(labels = c("calculated", "data"),
values = c("blue", "red")) +
theme_bw(base_size = 16) +
theme(legend.position = c(0.75, 0.75))
I don’t understand how his calculated variable comes from the formula. He says:
It is necessary to take, as one sees, for the axis of the abscissas, the quarter of the circumference rectified and divided according to decimal division.
I don’t see how this is reflected in the formula. A plot of this formula against age gives the expected sinusoid, with a period that expands with age:
m <- 1/ (2^age - 18)
formula <- 1 - (sin(age) / (1+m))
plot(age, formula,
type = "b")
However, a plot of the formula against his calculated values is perplexing:
par(xpd = TRUE)
plot(table13$calculated, formula,
type = "b",
xlab = "Quetelet calculated")
text(x = table13$calculated,
y = formula,
labels = age, pos = 3)
A kind reader on StackOverflow pointed out that (a) the parentheses in my formula were wrong, and (b) Quetelet’s X should probably be treated as an angle and converted to radians, which is what R’s sin() expects. With this, I get:
m <- 1/ (2^age - 18)
formula <- (1 - sin(age * pi / 180)) / (1 + m) # age taken as degrees, converted to radians for sin()
plot(age, formula,
type = "b")
The graph now looks reasonable, except for the two lowest points:
par(xpd = TRUE)
plot(table13$calculated, formula,
type = "b",
xlab = "Quetelet calculated")
text(x = table13$calculated,
y = formula,
labels = age, pos = 3)
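Quetelet’s remark about the quarter of the circumference “divided according to decimal division” suggests one more thing to try. One possible reading, which is my assumption and is not confirmed by anything above, is that a quarter circle is divided into 100 units rather than 90 degrees. The sketch below wraps the formula as transcribed above in a hypothetical helper, `quetelet_y()`, with the number of units per quarter circle as a parameter, so the two readings can be compared side by side with his calculated column.

```r
age <- c(14, 18, 23, 27, 32, 37, 42, 47, 52, 57, 62, 67, 75, 82)

# Quetelet's calculated column, copied from Table 13
calculated <- c(0.02, 0.66, 1, 0.92, 0.81, 0.71, 0.60,
                0.51, 0.42, 0.34, 0.27, 0.21, 0.12, 0.04)

# Hypothetical helper: Y = (1 - sin X) / (1 + m), with m = 1 / (2^X - 18),
# where X is an angle with `units_per_quarter` units in a quarter circle
quetelet_y <- function(x, units_per_quarter = 90) {
  m <- 1 / (2^x - 18)
  angle <- x * (pi / 2) / units_per_quarter   # convert to radians for sin()
  (1 - sin(angle)) / (1 + m)
}

y_deg  <- quetelet_y(age, units_per_quarter = 90)   # age read as degrees
y_grad <- quetelet_y(age, units_per_quarter = 100)  # assumed "decimal division"
round(data.frame(age, calculated, y_deg, y_grad), 3)
```

Under the 100-unit reading the curve reaches zero at age 100 instead of age 90. Whether either reading reproduces his column is exactly what the comparison table is meant to show; I am not claiming that it does.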
A second puzzle is the question of how Quetelet calculated the “degrees of propensity” in Table 13. The data is provided in his Table 12:
table12 <- read_csv("https://www.dropbox.com/s/pca1k5aqhx48iuv/Quetelet-table12.csv?raw=1",
show_col_types = FALSE)
head(table12)
## # A tibble: 6 x 6
## age_gp pers_crime prop_crime pct_prop population propensity
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 <16 80 440 85 3304 161
## 2 16-21 904 3723 80 887 5217
## 3 21-25 1278 3329 72 673 6846
## 4 25-30 1575 3702 70 791 6671
## 5 30-35 1153 2883 71 732 5514
## 6 35-40 650 2076 76 672 4057
This records the number of crimes against persons (pers_crime) and of crimes against property (prop_crime) committed in France by both sexes in the years 1826-1829. The fourth column (pct_prop) is the percent of crimes against property out of 100 crimes of both types.
The column population, headed “Population according to ages”, indicates “how a population of 10,000 souls is divided up in France according to ages.” Of the last column (propensity), he says this:
indicates the relationship of the total number of crimes to the corresponding number in the preceding column
The question is how this relates to the column prop_all in Table 13.
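A first check can be made with just the six rows of Table 12 printed above. The sketch below tries two guesses, both of which are my assumptions rather than anything Quetelet states: that propensity is 1000 times the total number of crimes divided by population, and that prop_all is propensity divided by its maximum. It uses only the rows shown, and it assumes the 21-25 group holds the overall maximum, which is where prop_all equals 1 in Table 13.

```r
# First six rows of Table 12, as printed above
t12 <- data.frame(
  age_gp     = c("<16", "16-21", "21-25", "25-30", "30-35", "35-40"),
  pers_crime = c(80, 904, 1278, 1575, 1153, 650),
  prop_crime = c(440, 3723, 3329, 3702, 2883, 2076),
  population = c(3304, 887, 673, 791, 732, 672),
  propensity = c(161, 5217, 6846, 6671, 5514, 4057)
)

# Matching rows of prop_all from Table 13
prop_all <- c(0.02, 0.76, 1, 0.97, 0.81, 0.59)

# Guess 1: propensity = 1000 * total crimes / population
recomputed <- 1000 * (t12$pers_crime + t12$prop_crime) / t12$population

# Guess 2: prop_all = propensity scaled so that its maximum is 1
scaled <- t12$propensity / max(t12$propensity)

data.frame(age_gp     = t12$age_gp,
           propensity = t12$propensity,
           recomputed = round(recomputed),
           scaled     = round(scaled, 2),
           prop_all)
```

For these six rows the recomputed ratio agrees with Quetelet’s propensity column to within a few units, and the scaled values round to the prop_all entries. The remaining rows of Table 12 would need the same check before drawing a firm conclusion.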