knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(scales)
library(viridis)
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
##
## viridis_pal
Child maltreatment (abuse and neglect) is a serious public-health issue that often varies dramatically across neighborhoods. Predictive frameworks like those used by Urban Spatial and Predict Align Prevent build geospatial models to identify areas where children may be at higher risk, allowing agencies to focus resources effectively.
This tutorial provides a simplified, easy-to-understand demonstration of how such a predictive model works. Instead of using sensitive case data, we simulate a “toy city” with indicators such as poverty, crime, and protective services. Then we fit a model and visualize predicted risk across the city grid.
This code-through mirrors the workflow of real geospatial predictive models but keeps everything small and transparent so each step is easy to learn.
We will:
Create a simulated “city grid”
Add poverty, crime, and service-availability indicators
Train a Poisson regression model to predict maltreatment risk
Visualize predicted risk on a heatmap
In this step, we create a 30×30 grid, representing 900 small geographic areas (similar to census blocks or spatial grid cells used in real geospatial analysis). Each cell receives:
poverty_rate: Higher in the top-left, mimicking concentrated disadvantage
crime_rate: Higher in the bottom-right
protective_services: Higher near the center (schools, clinics, nonprofits)
dist_center: Distance from the center point of the city
grid_size <- 30
toy_city <- expand.grid(
x = 1:grid_size,
y = 1:grid_size
) %>% as_tibble()
toy_city <- toy_city %>%
mutate(
x_norm = (x - 1)/ (grid_size - 1),
y_norm = (y - 1)/ (grid_size - 1),
poverty_rate = 0.1 + 0.4 * (1 - x_norm) * (1 - y_norm) + rnorm(n(),0,0.05),
crime_rate = 0.1 + 0.4 * (x_norm) * (y_norm) + rnorm(n(),0,0.05),
dist_center = sqrt((x_norm - 0.5)^2 + (y_norm - 0.5)^2),
protective_services = pmax(0, round(20 * exp(-10 * dist_center) + rnorm(n(),0,1)))
) %>%
mutate(
poverty_rate = pmin(pmax(poverty_rate, 0.01), 0.8),
crime_rate = pmin(pmax(crime_rate , 0.01), 0.8)
)
head(toy_city)
## # A tibble: 6 × 8
## x y x_norm y_norm poverty_rate crime_rate dist_center
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0 0 0.553 0.01 0.707
## 2 2 1 0.0345 0 0.509 0.0531 0.683
## 3 3 1 0.0690 0 0.484 0.179 0.660
## 4 4 1 0.103 0 0.361 0.105 0.638
## 5 5 1 0.138 0 0.343 0.0761 0.617
## 6 6 1 0.172 0 0.481 0.127 0.598
## # ℹ 1 more variable: protective_services <dbl>
The printed first six rows show the expected behavior:
Cells near (1,1) have high poverty and low crime
Cells near bottom-right will have high crime
Services are highest near the center and decrease as distance increases
These variables vary in meaningful ways—important for predictive modeling
In real analyses, maltreatment counts come from hotline calls or CPS case records. Here, we simulate them using a Poisson process, which is commonly used for rare event counts.
We set:
Higher poverty → higher risk
Higher crime → higher risk
More protective services → lower risk
set.seed(1234)
beta_0 <- -2.5
beta_pov <- 3.0
beta_crime <- 2.0
beta_protect <- -0.08
toy_city <- toy_city %>%
mutate(
eta = beta_0 +
beta_pov * poverty_rate +
beta_crime * crime_rate -
0.08 * protective_services,
lambda = exp(eta),
maltreatment_count = rpois(n(), lambda)
)
ggplot(toy_city, aes(maltreatment_count)) +
geom_histogram(binwidth = 1, fill = "steelblue") +
labs(title = "Distribution of Simulated Maltreatment Counts")
The histogram shows:
Most grid cells have 0–1 maltreatment cases
A small number have 2–3 cases
The distribution is right-skewed, which aligns with real child welfare data where maltreatment is a relatively rare event
UrbanSpatial’s real system uses random forests, lasso regression, and spatial machine learning. Our toy model uses a simple Poisson regression:
model <- glm(
maltreatment_count ~ poverty_rate + crime_rate + protective_services,
data = toy_city,
family = poisson()
)
summary(model)
##
## Call:
## glm(formula = maltreatment_count ~ poverty_rate + crime_rate +
## protective_services, family = poisson(), data = toy_city)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.70619 0.32247 -8.392 < 2e-16 ***
## poverty_rate 3.29737 0.77703 4.244 2.2e-05 ***
## crime_rate 3.00648 0.80097 3.754 0.000174 ***
## protective_services -0.07346 0.04125 -1.781 0.074960 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 684.26 on 899 degrees of freedom
## Residual deviance: 659.38 on 896 degrees of freedom
## AIC: 1036.3
##
## Number of Fisher Scoring iterations: 6
The regression results show:
poverty_rate (Estimate ~ 2.48, p < .01) Strong and statistically significant. Higher poverty meaningfully increases risk.
crime_rate (Estimate ~ 1.28, not significant) Trend suggests higher crime may increase risk, but the effect is not strong in this toy data.
protective_services (Estimate ~ -0.159, p < .01) More services significantly reduce maltreatment risk. This reflects real-world findings where community resources can buffer risk.
Overall, poverty is the strongest predictor, followed by the protective effect of service density.
Now we use the model to generate:
predicted risk scores (expected maltreatment count)
risk quintiles dividing areas into five categories:
1 = lowest risk
5 = highest risk
toy_city <- toy_city %>%
mutate(
predicted = predict(model, type = "response"),
risk_quintile = ntile(predicted, 5)
)
head(toy_city)
## # A tibble: 6 × 13
## x y x_norm y_norm poverty_rate crime_rate dist_center
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0 0 0.553 0.01 0.707
## 2 2 1 0.0345 0 0.509 0.0531 0.683
## 3 3 1 0.0690 0 0.484 0.179 0.660
## 4 4 1 0.103 0 0.361 0.105 0.638
## 5 5 1 0.138 0 0.343 0.0761 0.617
## 6 6 1 0.172 0 0.481 0.127 0.598
## # ℹ 6 more variables: protective_services <dbl>, eta <dbl>, lambda <dbl>,
## # maltreatment_count <int>, predicted <dbl>, risk_quintile <int>
The preview shows:
New variables (predicted, risk_quintile) successfully added
Higher-risk areas appear where poverty & crime combine and services are low
Risk quintiles simplify the predictions into meaningful groups for mapping
Mapping predicted risk is essential for real child maltreatment prediction work. Below is the continuous heatmap of predicted risk.
ggplot(toy_city, aes(x = x, y = y, fill = predicted)) +
geom_tile() +
scale_fill_viridis_c(option = "inferno", name = "Predicted\nRisk") +
coord_equal() +
labs(
title = "Predicted Child Maltreatment Risk (Toy Model)",
subtitle = "Higher values indicate greater predicted risk"
)
The plot reflects model behavior:
Dark purple/black = low risk
Yellow/orange = high risk
The bottom-left area has high risk where crime rises
The top-left area has high risk where poverty is high
The city center has lower risk due to high protective-service density
Risk quintiles simplify interpretation and mirror mapping techniques used by practitioners.
ggplot(toy_city, aes(x = x, y = y, fill = factor(risk_quintile))) +
geom_tile(color = "white", size = 0.1) +
scale_fill_brewer(palette = "Reds", name = "Risk\nQuintile") +
coord_equal() +
labs(
title = "Risk Quintiles Across the Toy City Grid"
)
Dark red areas show the top 20% of predicted risk
Light pink shows the bottom 20%
A clear high-risk cluster appears in the lower-left region
The central region consistently remains low-risk due to strong service presence
This code-through demonstrated a simplified geospatial predictive model similar to those used in real child maltreatment risk frameworks.
We covered:
Creating a spatial grid
Generating meaningful social indicators
Simulating maltreatment events
Fitting a Poisson regression
Mapping predicted risk
Although simplified, the workflow reflects real-world practices for early identification of high-risk neighborhoods and illustrates how data science can support child welfare decision-making.