Experiment 1 (Detroit housing): 87% accuracy

Jonathan Jay
jonjay@mail.harvard.edu

Research question

Can we train deep learning models to identify distressed housing using street-level images?

Design

I identified physically distressed properties using demolition records from the City of Detroit and non-distressed housing using City of Detroit parcel records. I chose 1,600 residential properties that were demolished within one year of being photographed for Google Street View and 1,600 “control” residential properties in a typical range of assessed building values (USD 6,300-25,000). I randomly reserved 200 from each category to test the model’s accuracy, and used the rest to train and tune the model.
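The hold-out split described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiment: the parcel identifiers, the `split_holdout` helper, and the seeds are all hypothetical, and the real identifiers would come from the city records.

```python
import random

def split_holdout(ids, n_test, seed=0):
    """Shuffle ids reproducibly and reserve n_test of them for testing."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    # First n_test items form the test set; the rest are for training and tuning.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical identifiers standing in for the 1,600 properties in each category.
demolished = [f"demo_{i}" for i in range(1600)]
control = [f"ctrl_{i}" for i in range(1600)]

demo_train, demo_test = split_holdout(demolished, n_test=200, seed=1)
ctrl_train, ctrl_test = split_holdout(control, n_test=200, seed=2)

print(len(demo_train) + len(ctrl_train))  # 2800 images to train and tune
print(len(demo_test) + len(ctrl_test))    # 400 images held out for testing
```

Splitting each category separately keeps the test set balanced at 200 images per class, so a model that guesses at random would score about 50%.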

Input images

Here are some typical demolished properties:

And some of the normal properties:

The model

The model is a convolutional neural network (“CNN” or “ConvNet”), the same type of algorithm used for recognizing faces in Facebook photos. This one modifies VGG16, a deep learning model originally trained to sort images into a wide range of object categories (e.g. dogs, birds, flowers).

Results

Overall, the model correctly classified 87% of the test images (349 of 400). The steep initial rise in the ROC curve shows that the model is highly accurate in its most-confident estimates: ranking the test images from highest to lowest score, it correctly identifies about 100 of the distressed properties before getting any wrong.
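To make these metrics concrete, here is a minimal sketch of how accuracy at a 0.5 cut-off, the ROC curve, and the run of correct predictions at the top of the ranking could be computed. The function names and the eight toy scores are invented for illustration and are not the experiment's data; ties between scores are ignored for simplicity.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of images whose thresholded score matches the label (1 = distressed)."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def rank_by_score(scores, labels):
    """Pair scores with labels, most confident 'distress' score first."""
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

def leading_true_positives(scores, labels):
    """Count distressed images ranked above the highest-scoring control image."""
    count = 0
    for _, y in rank_by_score(scores, labels):
        if y == 0:
            break
        count += 1
    return count

def roc_points(scores, labels):
    """(false positive rate, true positive rate) after each image in the ranking."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in rank_by_score(scores, labels):
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy data, invented for illustration only.
toy_scores = [0.95, 0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]

print(accuracy(toy_scores, toy_labels))                # 0.625
print(leading_true_positives(toy_scores, toy_labels))  # 3
print(roc_points(toy_scores, toy_labels)[:4])          # curve rises vertically at first

# The reported test result: 349 correct out of 349 + 51 images.
print(349 / (349 + 51))  # 0.8725
```

In the toy ranking the first three images are all truly distressed, so the ROC curve climbs straight up before moving right. A long vertical segment like this is what "about 100 before getting any wrong" looks like on the real curve.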

Here are some of the 349 test images it classified correctly, where a score over 0.5 counts as a “distress” prediction:

And some of the 51 it misclassified:

Comments

This experiment demonstrates that street-level images can serve as low-cost property surveys and that ConvNets can learn to detect physical distress. The strategy could provide a quick, low-cost assessment of city buildings and help pinpoint the most unsafe and unhealthy living conditions for city residents.

The limitations of this particular experiment highlight the potential of the general approach. This model achieved a useful level of predictive accuracy using a fairly small, imperfect dataset of distressed properties. With a larger dataset and a more complex ConvNet, we would expect even better accuracy.

While some of the model’s mistakes are surprising, it also appears to learn some non-obvious patterns: for example, it seems to predict demolition not only when tree branches are overgrown, but when a tree very close to a house grows too tall. I had noticed this pattern as I looked through the training images, but it might not otherwise have occurred to me as an important clue that a house could be vacant or abandoned.

My ongoing research aims to turn these findings into real-world tools. The next step for distressed housing detection is to train a model using images of distressed and non-distressed properties from places that look different from Detroit. Like humans, ConvNets benefit from a larger, more diverse set of learning opportunities; the right training could produce a model flexible enough to identify distressed housing in any city or region.