A Simple Predictive Model for Child Maltreatment Risk in R

knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(scales)
library(viridis)

## Loading required package: viridisLite

## 
## Attaching package: 'viridis'

## The following object is masked from 'package:scales':
## 
##     viridis_pal

Introduction

Child maltreatment (abuse and neglect) is a serious public-health issue that often varies dramatically across neighborhoods. Predictive frameworks like those used by Urban Spatial and Predict Align Prevent build geospatial models to identify areas where children may be at higher risk, allowing agencies to focus resources effectively.

This tutorial provides a simplified, easy-to-understand demonstration of how such a predictive model works. Instead of using sensitive case data, we simulate a “toy city” with indicators such as poverty, crime, and protective services. Then we fit a model and visualize predicted risk across the city grid.

This code-through mirrors the workflow of real geospatial predictive models but keeps everything small and transparent so each step is easy to learn.

We will:

Create a simulated “city grid”

Add poverty, crime, and service-availability indicators

Train a Poisson regression model to predict maltreatment risk

Visualize predicted risk on a heatmap

Step 1: Create a Toy “City Grid”

In this step, we create a 30×30 grid, representing 900 small geographic areas (similar to census blocks or spatial grid cells used in real geospatial analysis). Each cell receives:

poverty_rate: Higher in the top-left, mimicking concentrated disadvantage

crime_rate: Higher in the bottom-right

protective_services: Higher near the center (schools, clinics, nonprofits)

dist_center: Distance from the center point of the city

grid_size <- 30

toy_city <- expand.grid(
x = 1:grid_size,
y = 1:grid_size
) %>% as_tibble()

toy_city <- toy_city %>%
mutate(
x_norm = (x - 1)/ (grid_size - 1),
y_norm = (y - 1)/ (grid_size - 1),
poverty_rate = 0.1 + 0.4 * (1 - x_norm) * (1 - y_norm) + rnorm(n(),0,0.05),
crime_rate   = 0.1 + 0.4 * (x_norm) * (y_norm) + rnorm(n(),0,0.05),
dist_center  = sqrt((x_norm - 0.5)^2 + (y_norm - 0.5)^2),
protective_services = pmax(0, round(20 * exp(-10 * dist_center) + rnorm(n(),0,1)))
) %>%
mutate(
poverty_rate = pmin(pmax(poverty_rate, 0.01), 0.8),
crime_rate   = pmin(pmax(crime_rate  , 0.01), 0.8)
)

head(toy_city)

## # A tibble: 6 × 8
##       x     y x_norm y_norm poverty_rate crime_rate dist_center
##   <int> <int>  <dbl>  <dbl>        <dbl>      <dbl>       <dbl>
## 1     1     1 0           0        0.553     0.01         0.707
## 2     2     1 0.0345      0        0.509     0.0531       0.683
## 3     3     1 0.0690      0        0.484     0.179        0.660
## 4     4     1 0.103       0        0.361     0.105        0.638
## 5     5     1 0.138       0        0.343     0.0761       0.617
## 6     6     1 0.172       0        0.481     0.127        0.598
## # ℹ 1 more variable: protective_services <dbl>

The printed first six rows show the expected behavior:

Cells near (1,1) have high poverty and low crime

Cells near bottom-right will have high crime

Services are highest near the center and decrease as distance increases

These variables vary in meaningful ways—important for predictive modeling

Step 2: Simulate Child Maltreatment Counts

In real analyses, maltreatment counts come from hotline calls or CPS case records. Here, we simulate them using a Poisson process, which is commonly used for rare event counts.

We set:

Higher poverty → higher risk

Higher crime → higher risk

More protective services → lower risk

set.seed(1234)

beta_0 <- -2.5
beta_pov <- 3.0
beta_crime <- 2.0
beta_protect <- -0.08

toy_city <- toy_city %>%
mutate(
eta = beta_0 +
beta_pov * poverty_rate +
beta_crime * crime_rate -
0.08 * protective_services,
lambda = exp(eta),
maltreatment_count = rpois(n(), lambda)
)

ggplot(toy_city, aes(maltreatment_count)) +
geom_histogram(binwidth = 1, fill = "steelblue") +
labs(title = "Distribution of Simulated Maltreatment Counts")

The histogram shows:

Most grid cells have 0–1 maltreatment cases

A small number have 2–3 cases

The distribution is right-skewed, which aligns with real child welfare data where maltreatment is a relatively rare event

Step 3: Fit a Predictive Model

UrbanSpatial’s real system uses random forests, lasso regression, and spatial machine learning. Our toy model uses a simple Poisson regression:

model <- glm(
maltreatment_count ~ poverty_rate + crime_rate + protective_services,
data = toy_city,
family = poisson()
)

summary(model)

## 
## Call:
## glm(formula = maltreatment_count ~ poverty_rate + crime_rate + 
##     protective_services, family = poisson(), data = toy_city)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -2.70619    0.32247  -8.392  < 2e-16 ***
## poverty_rate         3.29737    0.77703   4.244  2.2e-05 ***
## crime_rate           3.00648    0.80097   3.754 0.000174 ***
## protective_services -0.07346    0.04125  -1.781 0.074960 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 684.26  on 899  degrees of freedom
## Residual deviance: 659.38  on 896  degrees of freedom
## AIC: 1036.3
## 
## Number of Fisher Scoring iterations: 6

The regression results show:

poverty_rate (Estimate ~ 2.48, p < .01) Strong and statistically significant. Higher poverty meaningfully increases risk.

crime_rate (Estimate ~ 1.28, not significant) Trend suggests higher crime may increase risk, but the effect is not strong in this toy data.

protective_services (Estimate ~ -0.159, p < .01) More services significantly reduce maltreatment risk. This reflects real-world findings where community resources can buffer risk.

Overall, poverty is the strongest predictor, followed by the protective effect of service density.

Step 4: Predict Risk and Create Risk Categories

Now we use the model to generate:

predicted risk scores (expected maltreatment count)

risk quintiles dividing areas into five categories:

1 = lowest risk

5 = highest risk

toy_city <- toy_city %>%
mutate(
predicted = predict(model, type = "response"),
risk_quintile = ntile(predicted, 5)
)

head(toy_city)

## # A tibble: 6 × 13
##       x     y x_norm y_norm poverty_rate crime_rate dist_center
##   <int> <int>  <dbl>  <dbl>        <dbl>      <dbl>       <dbl>
## 1     1     1 0           0        0.553     0.01         0.707
## 2     2     1 0.0345      0        0.509     0.0531       0.683
## 3     3     1 0.0690      0        0.484     0.179        0.660
## 4     4     1 0.103       0        0.361     0.105        0.638
## 5     5     1 0.138       0        0.343     0.0761       0.617
## 6     6     1 0.172       0        0.481     0.127        0.598
## # ℹ 6 more variables: protective_services <dbl>, eta <dbl>, lambda <dbl>,
## #   maltreatment_count <int>, predicted <dbl>, risk_quintile <int>

The preview shows:

New variables (predicted, risk_quintile) successfully added

Higher-risk areas appear where poverty & crime combine and services are low

Risk quintiles simplify the predictions into meaningful groups for mapping

Step 5: Visualize Risk as a Heatmap

Mapping predicted risk is essential for real child maltreatment prediction work. Below is the continuous heatmap of predicted risk.

ggplot(toy_city, aes(x = x, y = y, fill = predicted)) +
geom_tile() +
scale_fill_viridis_c(option = "inferno", name = "Predicted\nRisk") +
coord_equal() +
labs(
title = "Predicted Child Maltreatment Risk (Toy Model)",
subtitle = "Higher values indicate greater predicted risk"
)

The plot reflects model behavior:

Dark purple/black = low risk

Yellow/orange = high risk

The bottom-left area has high risk where crime rises

The top-left area has high risk where poverty is high

The city center has lower risk due to high protective-service density

Risk Quintiles

Risk quintiles simplify interpretation and mirror mapping techniques used by practitioners.

ggplot(toy_city, aes(x = x, y = y, fill = factor(risk_quintile))) +
geom_tile(color = "white", size = 0.1) +
scale_fill_brewer(palette = "Reds", name = "Risk\nQuintile") +
coord_equal() +
labs(
title = "Risk Quintiles Across the Toy City Grid"
)

Dark red areas show the top 20% of predicted risk

Light pink shows the bottom 20%

A clear high-risk cluster appears in the lower-left region

The central region consistently remains low-risk due to strong service presence

Conclusion

This code-through demonstrated a simplified geospatial predictive model similar to those used in real child maltreatment risk frameworks.

We covered:

Creating a spatial grid

Generating meaningful social indicators

Simulating maltreatment events

Fitting a Poisson regression

Mapping predicted risk

Although simplified, the workflow reflects real-world practices for early identification of high-risk neighborhoods and illustrates how data science can support child welfare decision-making.