Crépon et al. (2013, QJE): seemingly effective job training programs may do no social good if they only help participants get jobs at the expense of non-participants.
Generate 100 agents with locations drawn uniformly from [0, 100] and search radii drawn uniformly from [1, 10]. A job appears at a uniformly random point on [0, 100], and every agent whose search radius covers the job's location applies. One applicant is hired at random, and that job searcher leaves the pool. 50 jobs are distributed in this way. (With a mean radius of 5.5, each job attracts roughly 2 × 5.5 = 11 applicants on average, ignoring edge effects.)
Agents and jobs are distributed in the same way as in Simulation A, but now x out of 100 agents are participants in the job training program, and as a result their search radii double. For small x, the job training program seems very effective, because it boosts the probability that a participant will be hired (at the expense of the probability that a non-participant will be hired). But as x increases, the program appears less and less effective.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tictoc)
# Simulation A:
agents <- tibble(
  agent_id = 1:100,
  location = runif(n = 100, min = 0, max = 100),
  search_radius = runif(n = 100, min = 1, max = 10),
  min = location - search_radius,
  max = location + search_radius,
  employed = rep(0, 100)
)
job <- runif(n = 1, min = 0, max = 100)
hire <- function(agents, job) {
  # Applicant pool: unemployed agents whose search interval covers the job
  pool <- agents %>%
    filter(min <= job, max >= job, employed == 0)
  if (nrow(pool) == 0) {
    # If no one is in the pool of applicants, the job is awarded randomly out
    # of the unemployed
    agents %>%
      filter(employed == 0) %>%
      slice_sample(n = 1) %>%
      mutate(employed = 1) %>%
      # Add the new employment data into the agent data frame (the newly
      # hired agent ends up as the first row)
      bind_rows(agents) %>%
      distinct(agent_id, .keep_all = TRUE)
  } else {
    pool %>%
      slice_sample(n = 1) %>%
      mutate(employed = 1) %>%
      bind_rows(agents) %>%
      distinct(agent_id, .keep_all = TRUE)
  }
}
hire(agents, job)
## # A tibble: 100 × 6
##    agent_id location search_radius   min   max employed
##       <int>    <dbl>         <dbl> <dbl> <dbl>    <dbl>
##  1       10    59.8           8.15 51.7  68.0         1
##  2        1    44.9           9.37 35.5  54.2         0
##  3        2     9.49          4.96  4.53 14.4         0
##  4        3    19.1           4.77 14.4  23.9         0
##  5        4    29.0           5.40 23.6  34.4         0
##  6        5    83.6           2.28 81.3  85.9         0
##  7        6    49.4           3.73 45.7  53.1         0
##  8        7    81.4           6.94 74.4  88.3         0
##  9        8    27.1           6.25 20.9  33.4         0
## 10        9    35.7           9.25 26.5  45.0         0
## # … with 90 more rows
reduce(
  # 50 jobs
  .x = runif(n = 50, min = 0, max = 100),
  .f = hire,
  .init = agents
) %>%
  count(employed)
## # A tibble: 2 × 2
##   employed     n
##      <dbl> <int>
## 1        0    50
## 2        1    50
Since there are 100 people and exactly 50 jobs were given out, each agent had an unconditional 50% chance of getting a job (agents with larger search radii apply to more jobs, so their conditional chances are higher).
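We can check this with a quick sketch (my addition, not part of the original simulation; `n_reps` and the quartile binning are arbitrary choices): rerun Simulation A many times and look at hiring rates overall and by search radius.
n_reps <- 100
reps <- map_dfr(1:n_reps, function(rep) {
  agents <- tibble(
    agent_id = 1:100,
    location = runif(100, 0, 100),
    search_radius = runif(100, 1, 10),
    min = location - search_radius,
    max = location + search_radius,
    employed = rep(0, 100)
  )
  reduce(runif(50, 0, 100), hire, .init = agents)
})
mean(reps$employed) # exactly 0.5: 50 of 100 agents are hired every run
reps %>%
  mutate(radius_quartile = ntile(search_radius, 4)) %>%
  group_by(radius_quartile) %>%
  summarise(prob_hired = mean(employed)) # should rise with the quartile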
# Simulation B:
simB_iteration <- function(n_participants) {
  agents <- tibble(
    agent_id = 1:100,
    participant = c(rep(1, n_participants), rep(0, 100 - n_participants)),
    location = runif(n = 100, min = 0, max = 100),
    # Job training program participants get their search radii doubled:
    search_radius = runif(n = 100, min = 1, max = 10) * (participant + 1),
    min = location - search_radius,
    max = location + search_radius,
    employed = rep(0, 100)
  )
  reduce(
    # 50 jobs
    .x = runif(n = 50, min = 0, max = 100),
    .f = hire,
    .init = agents
  ) %>%
    count(participant, employed) %>%
    filter(employed == 1) %>%
    mutate(
      total = if_else(participant == 1, n_participants, 100 - n_participants),
      prob_hired = n / total
    ) %>%
    select(participant, total, prob_hired)
}
simB_iteration(25)
## # A tibble: 2 × 3
##   participant total prob_hired
##         <dbl> <dbl>      <dbl>
## 1           0    75      0.453
## 2           1    25      0.64
n_participants <- seq(from = 5, to = 100, by = 5) %>%
  rep(each = 10)
tic()
results <- map_dfr(n_participants, simB_iteration)
toc()
## 168.395 sec elapsed
This simulation takes quite a while, and it only distributes 50 jobs 200 times (20 program sizes × 10 repetitions).
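Since each call to simB_iteration() is independent, one low-effort speedup (my suggestion, assuming the furrr package is available; I have not benchmarked it here) is to run the iterations across parallel workers:
library(furrr)
library(future)
plan(multisession)
# .options = furrr_options(seed = TRUE) keeps the parallel RNG reproducible
results <- future_map_dfr(
  n_participants, simB_iteration,
  .options = furrr_options(seed = TRUE)
)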
results %>%
  filter(participant == 1) %>%
  mutate(participant = as.factor(participant)) %>%
  ggplot(aes(x = total, y = prob_hired)) +
  geom_jitter() +
  geom_smooth() +
  ggtitle("Probability of Being Hired for Participants") +
  xlab("Program Size") +
  ylab("Probability of Being Hired")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
If you're a participant, you want the job training program to be small: that way, your probability of being hired gets the biggest boost. As the program grows, its benefit to participants diminishes, because participants increasingly compete with one another.
results %>%
  filter(participant == 0) %>%
  mutate(participant = as.factor(participant), total = 100 - total) %>%
  ggplot(aes(x = total, y = prob_hired)) +
  geom_jitter() +
  geom_smooth() +
  ggtitle("Probability of Being Hired for Non-Participants") +
  xlab("Program Size") +
  ylab("Probability of Being Hired")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
If you're not a participant, you also want the job training program to be as small as possible: as the program grows, more people are effectively put in line ahead of you for each job.
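Either way, the program never creates jobs; it only reallocates the same 50. One way to see that in the simulated results (my sketch; it converts the hiring probabilities back into counts and assumes both groups appear in each iteration): total hires should come out at roughly 50 for every program size.
results %>%
  mutate(
    hires = prob_hired * total, # recover the raw hire count n
    program_size = if_else(participant == 1, total, 100 - total)
  ) %>%
  group_by(program_size) %>%
  # 10 repetitions per program size, exactly 50 hires per repetition
  summarise(mean_total_hires = sum(hires) / 10) # ~50 everywhere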
A graph-network solution might look like this pseudocode: much more elegant, and hopefully I could find a way to make it quicker.
# n_participants := 25;
# Create 100 agents with properties:
#   location := rand(0, 100),
#   participant := c(rep(1, n_participants), rep(0, 100 - n_participants)),
#   radius := rand(1, 10) * (participant + 1),
#   applies := (location - radius):(location + radius),
#   employed := 0;
# Create 50 jobs with properties:
#   location := rand(0, 100),
#   applicants := all agents where job.location in agent.applies and agent.employed is 0,
#   hire := sample 1 applicant;
# mutate job.hire.employed := 1; # traverses through the job "hire" edge to
#                                # the applicant who was hired and changes
#                                # their employment status
# Count agents.employed by agents.participant
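One concrete way I can imagine realizing that sketch in R (my sketch, unbenchmarked; graph_iteration is a hypothetical name): store the bipartite agent-job "applies" edges as a logical adjacency matrix built with outer(), so all the edges are computed in one shot and only the sequential hiring remains a loop.
graph_iteration <- function(n_participants, n_agents = 100, n_jobs = 50) {
  participant <- c(rep(1, n_participants), rep(0, n_agents - n_participants))
  location <- runif(n_agents, 0, 100)
  radius <- runif(n_agents, 1, 10) * (participant + 1)
  jobs <- runif(n_jobs, 0, 100)
  # applies[i, j] is TRUE when an "applies" edge links agent i and job j,
  # i.e. job j falls inside agent i's search interval
  applies <- abs(outer(location, jobs, `-`)) <= radius
  employed <- rep(FALSE, n_agents)
  for (j in seq_len(n_jobs)) {
    pool <- which(applies[, j] & !employed)
    # Fall back to the unemployed when nobody applies, as in hire()
    if (length(pool) == 0) pool <- which(!employed)
    # Guard against sample()'s length-1 behavior: sample(5, 1) draws from 1:5
    employed[if (length(pool) == 1) pool else sample(pool, 1)] <- TRUE
  }
  tibble(participant, employed = as.numeric(employed)) %>%
    count(participant, employed)
}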