Question

Can we train a machine learning model to predict building fires based on satellite imagery alone?

1. Design

Using data from Portland, Oregon (historical fire incidents and 2016 property records), I identified 350 commercial properties that reported fires between 2012 and 2016, then randomly selected 350 commercial properties that reported no fires (the control group). I split each group into 300 training and 50 testing locations. Commercial buildings are of particular interest to fire departments because departments are authorized, and often statutorily required, to inspect them.
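The sampling and splitting step can be sketched as follows. This is a minimal illustration, not the code used for the project: the function names, the seed, and the use of plain property IDs are all assumptions.

```python
import random

def split_group(property_ids, n_test=50, seed=0):
    """Shuffle one group and hold out n_test properties for testing."""
    rng = random.Random(seed)
    shuffled = list(property_ids)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def build_dataset(fire_ids, candidate_control_ids, n_test=50, seed=0):
    """Pair every fire property with a randomly drawn no-fire control."""
    rng = random.Random(seed)
    # Draw as many controls as there are fire properties, without replacement.
    controls = rng.sample(list(candidate_control_ids), len(fire_ids))
    fire_train, fire_test = split_group(fire_ids, n_test, seed)
    ctrl_train, ctrl_test = split_group(controls, n_test, seed + 1)
    # Label 1 = reported a fire, 0 = control.
    train = [(pid, 1) for pid in fire_train] + [(pid, 0) for pid in ctrl_train]
    test = [(pid, 1) for pid in fire_test] + [(pid, 0) for pid in ctrl_test]
    return train, test
```

With 350 fire properties and a larger pool of no-fire candidates, this yields 600 labeled training locations and 100 labeled testing locations, balanced between the two groups.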

Input images

For each location I downloaded high-resolution satellite images using the Google Static Maps API. Precise capture dates are not available for these images, but the imagery is typically less than a year old. (The photos were therefore taken after the fires did or did not occur, so evidence of a fire could conceivably be visible in them; manual inspection of the images, however, suggests otherwise.) Zoom levels were chosen to match the approximate boundaries of typical parcels. This test used two zoom levels per property and found that the tighter zoom contributed more to the prediction model.
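As a rough sketch, a request for one property at two zoom levels might be built like this. The endpoint and the `center`, `zoom`, `size`, `maptype` and `key` parameters belong to the Static Maps API; the specific zoom levels (18 and 19), the 640-pixel image size, and the function names are assumptions, since the values actually used are not recorded here.

```python
from urllib.parse import urlencode

STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"

def static_map_url(lat, lng, zoom, api_key, size_px=640):
    """Build a request URL for one satellite tile centered on a property."""
    params = {
        "center": f"{lat},{lng}",
        "zoom": zoom,                    # higher zoom = tighter view
        "size": f"{size_px}x{size_px}",  # assumed image size in pixels
        "maptype": "satellite",
        "key": api_key,
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"

def urls_for_property(lat, lng, api_key, zoom_levels=(18, 19)):
    """Return one URL per zoom level (the zoom values here are hypothetical)."""
    return [static_map_url(lat, lng, z, api_key) for z in zoom_levels]
```

Each URL can then be fetched with any HTTP client and saved as an image file named after the property and zoom level.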

Feature extraction

I extracted visual features from each image using VGG16, a convolutional neural network trained to assign images to 1000 different categories, as diverse and specific as “vulture,” “Granny Smith” and “moped.” Rather than allowing the model to classify the satellite image, I removed the final layer from the model and returned a vector of 4096 numeric values for each image. This vector is a highly abstracted representation of the original image, encoding attributes (e.g. color, texture, shape) that are relevant to the classification task.
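One way to do this is with the pretrained VGG16 that ships with Keras, truncated at its last 4096-unit layer (named `fc2` in that implementation). The sketch below assumes TensorFlow 2.9 or later; the framework actually used for this project is not stated, and the function names are illustrative.

```python
FEATURE_LAYER = "fc2"    # last 4096-unit layer before the 1000-way output
FEATURE_DIM = 4096
INPUT_SIZE = (224, 224)  # input resolution VGG16 expects

def build_extractor():
    """Return VGG16 truncated just before its final classification layer."""
    # Imported lazily because TensorFlow is a heavy dependency.
    from tensorflow.keras.applications.vgg16 import VGG16
    from tensorflow.keras.models import Model

    base = VGG16(weights="imagenet", include_top=True)
    return Model(inputs=base.input,
                 outputs=base.get_layer(FEATURE_LAYER).output)

def extract_features(extractor, image_paths):
    """Map image files to an array of shape (n_images, FEATURE_DIM)."""
    import numpy as np
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    # Resize each image to the network's input size and stack into a batch.
    batch = np.stack([
        img_to_array(load_img(path, target_size=INPUT_SIZE))
        for path in image_paths
    ])
    # preprocess_input applies the channel ordering and centering
    # that the ImageNet weights were trained with.
    return extractor.predict(preprocess_input(batch))
```

Calling `build_extractor()` once and reusing it for every batch avoids reloading the pretrained weights.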

Estimation model

After converting the images from each location to a long numeric vector, I used the training locations to fit a ridge classifier. (This machine learning model adapts logistic regression to prediction tasks by penalizing beta coefficients to reduce overfitting.) I used this model to predict outcomes at the 100 testing locations (50 with fire/50 without fire).
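To make the penalty concrete, here is a small NumPy sketch of one common formulation of a ridge classifier: L2-penalized least squares fit to labels recoded as -1/+1, which is the approach taken by scikit-learn's `RidgeClassifier`. Whether this matches the implementation used in this project is an assumption, and the penalty strength `alpha=1.0` is a placeholder rather than a tuned value.

```python
import numpy as np

def fit_ridge_classifier(X, y, alpha=1.0):
    """Fit L2-penalized least squares on labels recoded to -1/+1."""
    X = np.asarray(X, dtype=float)
    targets = np.where(np.asarray(y) == 1, 1.0, -1.0)
    # Center features and targets so the intercept is not penalized.
    x_mean, t_mean = X.mean(axis=0), targets.mean()
    Xc = X - x_mean
    # Dual form: solve an (n x n) system, which is cheaper when there are
    # far more features (e.g. 4096) than training locations (e.g. 600).
    gram = Xc @ Xc.T
    dual = np.linalg.solve(gram + alpha * np.eye(len(Xc)), targets - t_mean)
    coef = Xc.T @ dual
    intercept = t_mean - x_mean @ coef
    return coef, intercept

def decision_scores(X, coef, intercept):
    """Continuous risk score; larger means more fire-like."""
    return np.asarray(X, dtype=float) @ coef + intercept

def predict(X, coef, intercept):
    """Classify as fire (1) when the score is positive, else no fire (0)."""
    return (decision_scores(X, coef, intercept) > 0).astype(int)
```

Larger values of `alpha` shrink the coefficients more strongly, which matters when the number of features far exceeds the number of training locations.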

2. Results

The model predicted fires substantially better than random chance: it correctly predicted the outcome for 72% of locations. The model was most accurate at the low and high ends of its range; properties scoring in the middle were a toss-up.
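A breakdown like this can be computed by sorting test locations by score and measuring accuracy within equal-sized bins. The sketch below is illustrative only; the number of bins and the zero decision threshold are assumptions.

```python
def accuracy(labels, scores, threshold=0.0):
    """Share of locations where (score > threshold) matches the outcome."""
    hits = sum((s > threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)

def accuracy_by_score_bin(labels, scores, n_bins=5, threshold=0.0):
    """Sort by score, cut into equal-sized bins, report accuracy per bin."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    results = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        idx = order[start:stop]
        results.append(accuracy([labels[i] for i in idx],
                                [scores[i] for i in idx],
                                threshold))
    return results  # lowest-scoring bin first
```

A U-shaped result, with high accuracy in the first and last bins and accuracy near 0.5 in the middle, corresponds to the pattern described above.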



On visual inspection, the model appears to have estimated higher risk at larger properties and lower risk at smaller properties.

Properties with highest estimated risk


Properties with lowest estimated risk

Image source: Google Static Maps

This association between commercial building size and fire risk has been observed in each published study using property records to predict property-level fire risk (Atlanta Firebird, Pittsburgh Burgh’s Eye View) and in my research in Portland (Jay & Hemenway, forthcoming). In this sample the association holds as well. However, the model's estimates (AUC = 0.8) predict test set outcomes more accurately than building square footage alone (AUC = 0.72), in part because square footage data were missing for 23% of the commercial properties.
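For reference, AUC is the probability that a randomly chosen fire property receives a higher score than a randomly chosen control, with ties counting half. The sketch below computes it directly from that definition. The median fill for missing square footage is a hypothetical choice included only to show why missing data weakens a single-variable baseline; how missing values were actually handled in this comparison is not stated.

```python
from statistics import median

def roc_auc(labels, scores):
    """Chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fill_missing_with_median(values):
    """Hypothetical imputation: replace None with the median observed value."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]
```

Every property with a filled-in value receives the same score, so those properties cannot be ranked against each other, which pulls the baseline's AUC toward 0.5.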

3. Discussion

This exploratory project used a remarkably small dataset (350 positive cases) to train a reasonably accurate and informative prediction model. Larger datasets, combining incidents from multiple jurisdictions, would unlock more of the capabilities of convolutional neural networks. We would expect, therefore, substantial improvements from a larger effort to model fire risk on satellite imagery.

As a bonus, satellite imagery, unlike raw numerical data, is likely to be familiar and interpretable to most firefighters and other fire prevention personnel. A platform built on these data could pose fewer challenges in communicating risk to the frontline personnel whose daily work will be influenced by a risk prediction model.

The next generation of fire prediction models?

These findings suggest potential for the next generation of fire prediction models. The first generation of property-level fire prediction projects in U.S. cities (i.e. Atlanta, Pittsburgh, Portland, New York City) has used property records to train machine learning models. For commercial properties, an accuracy benchmark of AUC = 0.8 or just below appears well established (including for Portland, forthcoming).

In comparison, satellite images display the most important information from first-generation models (building size and condition, indicators of business type) as well as information that has not been included in first-generation projects (e.g. parking lot size, condition and proximity of nearby lots, distance between building and street, roadway size) and which may correlate with fire risk. Imagery data, therefore, could likely enhance predictive models in the handful of U.S. cities already using property records to estimate risk.

More interesting, perhaps, is the possibility that satellite imagery could leapfrog records-based predictions as the best choice for the thousands of jurisdictions that do not use fire risk to prioritize inspections and other fire prevention strategies. While we have observed in Portland that even a simple set of records-derived variables can approximate the more in-depth predictor datasets, this approach nonetheless requires city-by-city customization. In contrast, a satellite-based model might be able to identify enough common risk factors to perform well across jurisdictions without any local data; even fire data would be required only for validation purposes.

These prospects could benefit significantly from the growing community of fire data analysts and growing interest in community risk reduction among U.S. fire departments. Even with the dearth of federal funding for fire prevention research, a consortium of local fire departments could readily collaborate to develop a powerful, flexible prediction model using satellite imagery.