Real Estate Clustering

Justin Dallmann

8/29/2017

Overview

Real estate agents often price homes by reference to “comparable” properties.

Traditionally, this is done by narrowing the search to a real estate region (as identified by city zoning or a real estate map) and looking for homes with comparable features within that area.

My real estate app at:

https://jdallmann.shinyapps.io/comparablesapp/

finds comparable properties by clustering on relevant features at the outset.

Overview

This means that the app clusters properties naturally, using a property’s location and a user’s choice of year built, living area, frontage, and whether it has a garage.

Note that this approach is independent of the conventional practice of selecting properties by city area, and it offers a better picture of which properties are actually “comparable”.

Additionally, statistics on each cluster region are provided for reference.
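The clustering idea described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the slides do not say which algorithm the app uses, so k-means (`stats::kmeans`) is a stand-in; the `listings` data frame, its coordinate columns `x.km` and `y.km`, all of its values, and the choice of 5 clusters are made up for the example. Only the feature names follow the model output shown later in the deck.

```r
# Hypothetical listings: every value here is synthetic, for illustration only.
set.seed(42)
n <- 200
listings <- data.frame(
  x.km           = runif(n, 0, 10),                    # assumed location, east-west
  y.km           = runif(n, 0, 10),                    # assumed location, north-south
  year.built     = sample(1900:2015, n, replace = TRUE),
  living.area.M2 = rnorm(n, mean = 120, sd = 30),
  frontage.M     = rnorm(n, mean = 15, sd = 3),
  garage         = rbinom(n, 1, 0.6),
  sell.price     = rnorm(n, mean = 300000, sd = 60000)
)

# Location is always included; `chosen` stands in for the user's feature selection.
chosen <- c("year.built", "living.area.M2")
features <- scale(listings[, c("x.km", "y.km", chosen)])  # put features on one scale

# k-means is an assumption, and 5 clusters is an arbitrary choice.
km <- kmeans(features, centers = 5, nstart = 20)
listings$cluster <- km$cluster

# Reference statistics for each cluster (medians of a few columns).
cluster.stats <- aggregate(
  cbind(sell.price, living.area.M2, year.built) ~ cluster,
  data = listings, FUN = median
)
print(cluster.stats)
```

Scaling the features first keeps a column measured in years from dominating one measured in metres; whether the app does this is not stated in the slides.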

The benefits

That these features matter for finding true comparables when pricing can be shown by fitting a linear model of selling price on the listed features. In the sequential ANOVA table for that model, each feature accounts for a statistically significant share of the variance in selling price:

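# Regress selling price on all remaining columns of cleanFrame
fit <- lm(sell.price ~ ., data = cleanFrame)
# Sequential (Type I) ANOVA table for the fitted model
anova(fit)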
| Analysis of Variance Table
| 
| Response: sell.price
|                 Df Sum Sq Mean Sq F value    Pr(>F)    
| garage           1  86.72   86.72  407.57 < 2.2e-16 ***
| living.area.M2   1 391.31  391.31 1839.15 < 2.2e-16 ***
| frontage.M       1  24.82   24.82  116.67 < 2.2e-16 ***
| year.built       1  52.21   52.21  245.37 < 2.2e-16 ***
| Residuals      700 148.94    0.21                      
| ---
| Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
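To read the table as shares of explained variance, the sums of squares can be normalised by their total. The short sketch below copies the `Sum Sq` column from the output above; the variable names `ss`, `share`, and `r.squared` are introduced here for illustration and are not part of the original analysis.

```r
# Sums of squares copied from the ANOVA table above.
ss <- c(garage = 86.72, living.area.M2 = 391.31,
        frontage.M = 24.82, year.built = 52.21, Residuals = 148.94)

total.ss <- sum(ss)                                # total sum of squares
share <- round(ss / total.ss, 3)                   # fraction attributed to each row
r.squared <- 1 - ss[["Residuals"]] / total.ss      # fraction explained by the model

print(share)
print(round(r.squared, 3))
```

On these figures the model explains roughly 79% of the variance in selling price, with living area accounting for about 56% on its own. Because `anova()` on an `lm` fit reports sequential sums of squares, each share depends on the order in which the terms enter the model; `drop1(fit, test = "F")` is one way to test each feature given all the others.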

Model of the most linearly significant predictor

Sample output