This technical report evaluates the feasibility and effectiveness of density-based spatial clustering methods, specifically DBSCAN, for objectively delineating real estate market areas in St. Tammany Parish, Louisiana. According to the International Association of Assessing Officers (IAAO), a market area is the geographic region from which demand originates and where competing properties are located. The primary research question is: How effectively can DBSCAN identify localized real estate market areas to enhance assessor workflows?
Using 2024 real estate transaction data, including sale price, geographic coordinates, property characteristics, and transaction dates, this study demonstrates clustering applications implemented exclusively in R. DBSCAN is evaluated primarily for spatial coherence and alignment with known market behavior, through qualitative assessment and descriptive statistics. The report also outlines a practical framework for integrating density-based clustering techniques into assessor workflows.
Current methods for defining real estate market areas often rely on subjective appraiser judgments or static boundaries such as subdivision lines, potentially overlooking true market dynamics. Such subjectivity can introduce inconsistencies and inaccuracies in property valuation.
Framed comparatively, the guiding research question is: to what extent does DBSCAN improve the delineation of real estate market areas relative to current assessor-defined boundaries?
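DBSCAN offers a data-driven approach that adjusts to actual sales activity. It finds clusters of arbitrary shape and explicitly labels outliers as noise, potentially providing more accurate market delineations than traditional static methods (Schubert et al., 2017). Because it applies a single density threshold (ε and minPts) across the whole study area, however, areas with very different parcel densities, such as town centers and rural tracts, may call for different parameter settings. This study evaluates DBSCAN's practical feasibility and its effectiveness in aligning with IAAO standards.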
The geographic scope is St. Tammany Parish, Louisiana, and the analysis uses 2024 transaction data to capture contemporary market dynamics. A literature review situates the study within existing GIS and property valuation research and addresses the practical gap in integrating spatial clustering techniques into property valuation workflows.
Based on feedback from the instructor and peers, the following changes were made:
The scope was narrowed to focus exclusively on DBSCAN.
The methodology section was expanded to clarify parameter selection and preprocessing steps.
The literature review was refined to emphasize studies related to real estate valuation applications.
Feedback on data availability was incorporated, and a discussion of data limitations was added.
One reviewer suggested adding an alternative clustering method (e.g., K-Means) for comparison. This was not pursued: the project's goal is to demonstrate DBSCAN's practical usability for assessors rather than to conduct a comparative clustering study, and integrating a second method was not feasible within the project timeline.
Previous research has explored spatial clustering techniques in real estate valuation and geographic analysis. Schubert et al. (2017) discuss the robustness of DBSCAN in handling spatial data and its effectiveness in distinguishing meaningful clusters from noise. Farhan and Murray (2005) demonstrated GIS-based market area delineation for transportation planning, specifically park-and-ride facilities, suggesting that spatial methods can enhance traditional approaches to defining market areas.
Han (2005) examined the spatial clustering of condominium property values in Singapore, highlighting the influence of geographic location on valuation patterns. Similarly, Yoshimura et al. (2021) analyzed spatial clustering in retail sales, illustrating how clustering methods can reveal economic trends. Collectively, these studies support the notion that a density-based method such as DBSCAN can provide an objective and adaptable framework for defining real estate market areas.
This study uses parcel-level data from St. Tammany Parish, including detailed sales transactions from 2020 to 2024, sourced from parcel records, sales transaction databases, and GIS layers.
## 3.1 Datasets and Their Roles

- Parcel Data (Shapefile or SQL Database): spatial boundaries for geographic referencing.
- Sales Transaction Data: sale price, transaction date, year built, square footage, and depreciation year.
- Geographic Coordinates (Lat/Long): spatial input for clustering.
## 3.2 Variables and Their Roles

- Sale Price: key metric for valuation trends.
- Latitude/Longitude: basis for spatial clustering.
- Sales Date: temporal analysis of market changes.
- Year Built: assessment of age-related valuation patterns.
- Square Footage: additional refinement based on property size.
- Depreciation Year: valuation trends influenced by property age.
## 3.3 Preprocessing and Transformation

- Managing missing data through imputation or exclusion.
- Normalizing numerical variables (sale price, square footage, year built).
- Filtering outliers to avoid distortion.
- Performing spatial joins for accurate geographic referencing.
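The exclusion, outlier-filtering, and normalization steps can be sketched in base R as follows. The data frame and its columns (`sale_price`, `sqft`, `year_built`) are hypothetical stand-ins for the Assessor's tables. The 1.5 × IQR rule is one common outlier filter and is assumed here rather than taken from the study. This sketch filters outliers before normalizing, so that extreme sales do not distort the means and standard deviations behind the z-scores.

```r
# Hypothetical sales records; column names are illustrative, not the
# Assessor's actual schema.
sales <- data.frame(
  sale_price = c(250000, 310000, NA, 275000, 4500000, 298000),
  sqft       = c(1800, 2100, 1950, NA, 2000, 2050),
  year_built = c(1998, 2005, 2010, 2001, 2003, 1995)
)
num_cols <- c("sale_price", "sqft", "year_built")

# 1. Missing data: exclude incomplete records (complete-case analysis).
sales <- sales[complete.cases(sales[num_cols]), ]

# 2. Outliers: keep sale prices within 1.5 * IQR of the quartiles
#    (an assumed rule, chosen for illustration).
q <- quantile(sales$sale_price, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
in_range <- sales$sale_price >= q[[1]] - 1.5 * iqr &
  sales$sale_price <= q[[2]] + 1.5 * iqr
sales <- sales[in_range, ]

# 3. Normalization: convert each numeric column to z-scores (mean 0, sd 1).
sales[num_cols] <- lapply(sales[num_cols], function(v) as.numeric(scale(v)))
```

On this toy table, the two incomplete records are excluded and the 4.5 million sale is filtered out, which leaves three normalized records.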
## 3.4 Data Limitations

- Not all variables have been extracted from the SQL database yet.
- Staff at the Assessor's Office hold differing professional opinions on appropriate values for ε (eps) and minPts.
- Temporal datasets are still under review.
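```r
library(sf)           # spatial data handling
library(dbscan)       # density-based clustering
library(ggplot2)      # plotting
library(dplyr)        # data manipulation
library(tmap)         # thematic mapping
library(cluster)      # silhouette()
library(Rtsne)
library(parallelDist)
```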
```r
# Load parcel centroids
data_path <- "M:/SCHOOL/PennState/GEO_588_2025/TERM_PROJECT/DATA/PARCELS_2024_Centroid.shp"
parcels_centroids <- st_read(data_path)
```

```
## Reading layer `PARCELS_2024_Centroid' from data source
## `M:\SCHOOL\PennState\GEO_588_2025\TERM_PROJECT\DATA\PARCELS_2024_Centroid.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 131996 features and 9 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 3620122 ymin: 609695.6 xmax: 3836715 ymax: 805091.4
## Projected CRS: NAD83 / Louisiana South (ftUS)
```
```r
# Check coordinate reference system
print(st_crs(parcels_centroids))
```

```
## Coordinate Reference System:
## User input: NAD83 / Louisiana South (ftUS)
## wkt:
## PROJCRS["NAD83 / Louisiana South (ftUS)",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["SPCS83 Louisiana South zone (US survey foot)",
## METHOD["Lambert Conic Conformal (2SP)",
## ID["EPSG",9802]],
## PARAMETER["Latitude of false origin",28.5,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8821]],
## PARAMETER["Longitude of false origin",-91.3333333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8822]],
## PARAMETER["Latitude of 1st standard parallel",30.7,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",29.3,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["Easting at false origin",3280833.3333,
## LENGTHUNIT["US survey foot",0.304800609601219],
## ID["EPSG",8826]],
## PARAMETER["Northing at false origin",0,
## LENGTHUNIT["US survey foot",0.304800609601219],
## ID["EPSG",8827]]],
## CS[Cartesian,2],
## AXIS["easting (X)",east,
## ORDER[1],
## LENGTHUNIT["US survey foot",0.304800609601219]],
## AXIS["northing (Y)",north,
## ORDER[2],
## LENGTHUNIT["US survey foot",0.304800609601219]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["United States (USA) - Louisiana - counties of Acadia; Allen; Ascension; Assumption; Beauregard; Calcasieu; Cameron; East Baton Rouge; East Feliciana; Evangeline; Iberia; Iberville; Jefferson; Jefferson Davis; Lafayette; LaFourche; Livingston; Orleans; Plaquemines; Pointe Coupee; St Bernard; St Charles; St Helena; St James; St John the Baptist; St Landry; St Martin; St Mary; St Tammany; Tangipahoa; Terrebonne; Vermilion; Washington; West Baton Rouge; West Feliciana."],
## BBOX[28.85,-93.94,31.07,-88.75]],
## ID["EPSG",3452]]
```
```r
# Extract coordinates (X = easting, Y = northing, in US survey feet)
coords <- st_coordinates(parcels_centroids)
coords_df <- as.data.frame(coords)
```
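## 4.1 Overview of DBSCAN

DBSCAN classifies each point as core, border, or noise based on neighborhood density. A point is a core point if at least minPts points, including itself, lie within distance ε of it. A border point is not a core point but lies within ε of one, and every remaining point is noise. Clusters grow by linking core points that lie within ε of each other, together with their border points. As a result, DBSCAN does not require a predefined number of clusters and can separate densely clustered areas from sparse noise (Schubert et al., 2017).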
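The following minimal base-R sketch makes the core/border/noise distinction concrete. It is for illustration only. It builds a full distance matrix, which needs quadratic memory, so it suits toy inputs only, and the analysis itself uses `dbscan::dbscan()`. The point counts toward its own neighborhood, matching the definition above. The function name `dbscan_sketch` and the toy coordinates are hypothetical.

```r
# Minimal DBSCAN sketch (hypothetical helper; O(n^2) memory, toy inputs only).
dbscan_sketch <- function(x, eps, minPts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  # Eps-neighborhood of each point (distance <= eps), including the point itself
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= minPts
  cluster <- integer(n)  # 0 = not assigned; stays 0 for noise
  id <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next
    id <- id + 1L          # start a new cluster from an unassigned core point
    cluster[i] <- id
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (!is_core[p]) next  # border points join but do not expand the cluster
      for (q in neighbors[[p]]) {
        if (cluster[q] == 0L) {
          cluster[q] <- id
          queue <- c(queue, q)
        }
      }
    }
  }
  type <- ifelse(is_core, "core", ifelse(cluster > 0L, "border", "noise"))
  list(cluster = cluster, type = type)
}

# Toy usage with made-up coordinates
pts <- rbind(
  c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(0.5, 0.5),            # dense group A
  c(2.3, 0.5),                                                 # sparse point next to A
  c(10, 10), c(11, 10), c(10, 11), c(11, 11), c(10.5, 10.5),  # dense group B
  c(50, 50)                                                    # isolated point
)
res <- dbscan_sketch(pts, eps = 1.5, minPts = 4)
res$cluster         # 1 1 1 1 1 1 2 2 2 2 2 0
res$type[c(6, 12)]  # "border" "noise"
```

The point at (2.3, 0.5) has only three points within ε, itself included, so it is not core. It still joins group A as a border point because it lies within ε of a core point. The isolated point stays labeled 0, the same label `dbscan::dbscan()` uses for noise.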
## 4.2 Methodological Steps

Data Preprocessing
- Clean and normalize numerical attributes.
- Generate parcel centroids using ArcPy FeatureToPoint.
- Remove duplicates and null geometries.
- Add and populate coordinate fields, exporting data to CSV and shapefiles.
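Once centroids exist, the record-level cleanup of removing null coordinates and duplicates and exporting the coordinate fields can be sketched in base R on a plain table. The study generates centroids with ArcPy. The field names `PARCEL_ID`, `X`, and `Y`, the values, and the output file name here are hypothetical. Duplicate records matter for DBSCAN because every row counts toward minPts.

```r
# Hypothetical centroid table; field names and values are made up.
centroids <- data.frame(
  PARCEL_ID = c("A1", "A2", "A2", "A3", "A4"),
  X = c(3700000, 3700100, 3700100, NA, 3701000),
  Y = c(700000, 700050, 700050, 700200, NA)
)

# Drop records with missing coordinates (the tabular analogue of null geometries).
centroids <- centroids[!is.na(centroids$X) & !is.na(centroids$Y), ]

# Drop exact duplicate records (same parcel ID and coordinates).
centroids <- centroids[!duplicated(centroids), ]

# Export the cleaned coordinate fields to CSV (temporary path for illustration).
out_csv <- file.path(tempdir(), "parcel_centroids_clean.csv")
write.csv(centroids, out_csv, row.names = FALSE)
```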
Clustering Implementation
- Determine optimal ε and minPts through domain knowledge and parameter tuning.
- Analyze noise points for meaningful insights.
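The k-distance heuristic is one common way to ground the choice of ε. Under the standard definition, the ε-neighborhood includes points at distance ≤ ε and counts the point itself. A point is therefore a core point exactly when the distance to its (minPts − 1)-th nearest other point is at most ε. Sorting these distances and looking for an elbow shows how the share of core points changes with ε.

The base-R sketch below computes these distances from a full distance matrix. On the parcel data it would have to run on a random sample, and sampling lowers point density, so minPts would need scaling by the sampling fraction. The helpers `knn_dist` and `core_share` and the synthetic points are hypothetical. The dbscan package provides `kNNdist()` and `kNNdistplot()` for the same quantity on large data.

```r
# k-distance heuristic for choosing eps (hypothetical helpers; O(n^2) memory).
knn_dist <- function(x, k) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                 # ignore each point's zero distance to itself
  unname(apply(d, 1, function(row) sort(row)[k]))
}

# Share of points that would be core points at a given eps.
core_share <- function(kdist, eps) mean(kdist <= eps)

# Toy usage on synthetic data: a dense blob plus scattered points.
set.seed(1)
pts <- rbind(matrix(rnorm(200, sd = 0.5), ncol = 2),
             matrix(runif(20, -10, 10), ncol = 2))
minPts <- 10
kd <- knn_dist(pts, k = minPts - 1)
if (interactive()) {
  plot(sort(kd), type = "l", xlab = "Points sorted by k-distance",
       ylab = paste0(minPts - 1, "-NN distance"))
}
core_share(kd, eps = 1)
```

The elbow is still judged by the analyst, so the heuristic narrows the range of ε values to discuss rather than settling the choice.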
Evaluation and Validation
- Visualize clusters using GIS tools.
- Compare DBSCAN clusters with existing assessor boundaries.
- Validate internally using metrics such as silhouette scores, and externally through assessor knowledge.
- Assess cluster stability to ensure robustness.
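Both the boundary comparison and the stability check come down to comparing two labelings of the same parcels. The adjusted Rand index (ARI) is a standard measure for this. It equals 1 for identical partitions, up to relabeling, and is near 0 for chance-level agreement. The base-R sketch below illustrates the metric and is not the study's validation procedure. The helper name `adjusted_rand` is hypothetical, and the sketch treats DBSCAN's noise label 0 as its own group. Dropping points that are noise in either labeling is a reasonable alternative.

```r
# Adjusted Rand index between two labelings (hypothetical helper).
adjusted_rand <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                          # contingency table of the labelings
  sum_ij <- sum(choose(tab, 2))               # pairs grouped together in both
  sum_a  <- sum(choose(rowSums(tab), 2))      # pairs grouped together in a
  sum_b  <- sum(choose(colSums(tab), 2))      # pairs grouped together in b
  expected  <- sum_a * sum_b / choose(length(a), 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Toy usage: two runs that differ only in how they split one group.
run_a <- c(0, 1, 1, 1, 2, 2, 2)
run_b <- c(0, 1, 1, 1, 2, 2, 3)
adjusted_rand(run_a, run_b)
```

On the parcel data, `a` and `b` could be the `$cluster` vectors from `dbscan()` runs with `eps = 500` and a perturbed value such as `eps = 550`. They could also be DBSCAN labels paired with a hypothetical assessor market-area code joined to each parcel.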
```r
# Free up memory before running DBSCAN
gc()
```

```
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 3835592 204.9 7350582 392.6 5363987 286.5
## Vcells 8241546 62.9 13967255 106.6 11320356 86.4
```
```r
# Run DBSCAN on the projected coordinates. eps is in CRS units
# (US survey feet), so eps = 500 is a 500 ft neighborhood radius.
# DBSCAN involves no random sampling, so no seed is needed here.
dbscan_result <- dbscan(as.matrix(coords_df), eps = 500, minPts = 10)

# Assign cluster labels (0 = noise)
parcels_centroids$cluster <- as.factor(dbscan_result$cluster)

# Remove noise points before computing silhouette widths
valid_indices <- which(dbscan_result$cluster != 0)
valid_clusters <- dbscan_result$cluster[valid_indices]
valid_coords <- coords_df[valid_indices, ]

# Sample 5,000 points for silhouette computation: dist() on every
# point would need memory quadratic in the number of points
set.seed(123)
sample_size <- min(5000, nrow(valid_coords))
sample_indices <- sample(seq_len(nrow(valid_coords)), sample_size)

# Subset data
sample_coords <- valid_coords[sample_indices, ]
sample_clusters <- valid_clusters[sample_indices]

# Compute the mean silhouette width on the sampled data
silhouette_scores <- silhouette(sample_clusters, dist(sample_coords))
mean_silhouette <- mean(silhouette_scores[, "sil_width"])
print(paste("Mean silhouette score (sampled data):", round(mean_silhouette, 3)))
```

```
## [1] "Mean silhouette score (sampled data): -0.165"
```
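The negative score is easier to interpret against the definition implemented by `cluster::silhouette()`. For a sampled point $i$ in cluster $C_i$, with $d$ the Euclidean distance between centroids and all quantities computed over the sampled points only:

$$
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\ j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j), \qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}} \in [-1, 1]
$$

Here $s(i)$ is set to 0 when $i$ is the only member of its cluster, and the reported score is the mean of $s(i)$. A negative mean means that, on average, sampled points lie closer to the nearest other cluster than to the rest of their own cluster. Because $a(i)$ averages over the whole cluster, elongated or irregular clusters, which DBSCAN is designed to find, can score poorly even when they are spatially contiguous. The silhouette is therefore best read as one signal alongside map review and assessor knowledge.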
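```r
# # Generate cluster summary statistics (requires a SalePrice field)
# cluster_summary <- parcels_centroids %>%
#   filter(cluster != "0") %>%
#   group_by(cluster) %>%
#   summarise(
#     avg_price = mean(SalePrice, na.rm = TRUE),
#     count = n(),
#     # Use the convex hull area: the area of a union of points is zero
#     density = count / st_area(st_convex_hull(st_union(geometry)))
#   )
#
# print(cluster_summary)
```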
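A per-cluster density needs a polygon area, because a set of points has zero area. The base-R sketch below uses each cluster's convex hull as its footprint, computed with `chull()` and the shoelace formula. It assumes a plain data frame with `X`, `Y`, `cluster`, and `SalePrice` columns, for example `coords_df` with the DBSCAN labels and joined sale prices attached. The helper names and the toy values are hypothetical. With Louisiana South (ftUS) coordinates, the area comes out in square US survey feet.

```r
# Convex hull area via chull() and the shoelace formula (hypothetical helper).
hull_area <- function(x, y) {
  h <- chull(x, y)                 # indices of the hull vertices
  if (length(h) < 3) return(0)
  hx <- x[h]
  hy <- y[h]
  # Shoelace formula; abs() makes the result independent of vertex order
  abs(sum(hx * c(hy[-1], hy[1]) - c(hx[-1], hx[1]) * hy)) / 2
}

# Per-cluster count, mean price, hull area, and density (noise excluded).
cluster_summary <- function(df) {
  df <- df[df$cluster != 0, ]
  do.call(rbind, lapply(split(df, df$cluster), function(g) {
    area <- hull_area(g$X, g$Y)
    data.frame(cluster   = g$cluster[1],
               count     = nrow(g),
               avg_price = mean(g$SalePrice, na.rm = TRUE),
               area      = area,
               density   = if (area > 0) nrow(g) / area else NA_real_)
  }))
}

# Toy usage: a 2 x 2 square cluster, a triangular cluster, and one noise point.
df <- data.frame(
  X = c(0, 2, 2, 0, 1, 10, 13, 10, 99),
  Y = c(0, 0, 2, 2, 1, 10, 10, 14, 99),
  cluster = c(1, 1, 1, 1, 1, 2, 2, 2, 0),
  SalePrice = c(200, 220, 240, 260, 280, 300, 330, 360, 999)
)
s <- cluster_summary(df)
s
```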
```r
# Map parcel centroids colored by cluster ID (legend hidden)
plot_clusters <- ggplot(parcels_centroids) +
  geom_sf(aes(color = cluster), size = 1, show.legend = FALSE) +
  theme_minimal() +
  labs(title = "DBSCAN Clustering of Parcel Centroids")
print(plot_clusters)
```
```r
# Outline each non-noise cluster with the convex hull of its points
clusters_sf <- parcels_centroids %>%
  filter(cluster != "0") %>%
  group_by(cluster) %>%
  summarize(geometry = st_union(geometry)) %>%
  st_convex_hull()

ggplot() +
  geom_sf(data = parcels_centroids, aes(color = cluster), size = 1, alpha = 0.7, show.legend = FALSE) +
  # linewidth (not size) sets polygon outline width in ggplot2 >= 3.4.0
  geom_sf(data = clusters_sf, fill = NA, color = "black", linewidth = 1) +
  theme_minimal() +
  labs(title = "DBSCAN Clusters with Convex Hulls")
```
```r
# 2D kernel density of parcel centroids (reuses coords_df extracted above)
ggplot(coords_df, aes(x = X, y = Y)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon", contour = TRUE, show.legend = FALSE) +
  scale_fill_viridis_c(guide = "none") +
  theme_minimal() +
  labs(title = "Density Heatmap of Parcel Centroids", x = "Easting", y = "Northing")
```
# 5. Results and Challenges
At this stage, the research is still in progress, and data exploration is ongoing. Preliminary clustering results are being evaluated, and additional variables may be introduced to refine the analysis. The main challenge thus far has been determining optimal parameter values for DBSCAN, particularly in balancing cluster formation and noise identification. Further work will focus on fine-tuning these parameters and incorporating additional property characteristics to enhance clustering precision.
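The trade-off between cluster formation and noise can be tabulated directly from the standard DBSCAN definitions. For each candidate ε, the base-R sketch below reports the share of core points and the share of noise points, meaning points that are neither core nor within ε of a core point. It needs memory quadratic in the number of points, so on the parcel data it would run on a random sample, with minPts scaled by the sampling fraction because sampling lowers density. The function `noise_profile` and the toy points are hypothetical.

```r
# Core and noise shares across candidate eps values (hypothetical helper).
noise_profile <- function(x, eps_values, minPts) {
  d <- as.matrix(dist(x))
  t(vapply(eps_values, function(eps) {
    within <- d <= eps                          # includes each point itself
    is_core <- rowSums(within) >= minPts
    # A point reaches a core point if any core point lies within eps
    reach_core <- rowSums(within[, is_core, drop = FALSE]) > 0
    is_noise <- !is_core & !reach_core
    c(eps = eps, core_share = mean(is_core), noise_share = mean(is_noise))
  }, numeric(3)))
}

# Toy usage: a tight group of five points plus one isolated point.
pts <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(0.5, 0.5), c(50, 50))
p <- noise_profile(pts, eps_values = c(0.1, 1.5, 100), minPts = 4)
p
```

At a fixed minPts, a sharp drop in `noise_share` over a narrow range of ε may point to a threshold worth comparing against the values favored by Assessor's Office staff.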
This study explores the feasibility of using DBSCAN to delineate real estate market areas in St. Tammany Parish. Traditional methods often rely on subjective boundaries, whereas DBSCAN provides a data-driven alternative that can dynamically adapt to market trends. While initial results are still under evaluation, this study aims to establish a structured framework for integrating spatial clustering into assessor workflows. Future work will involve refining clustering parameters, incorporating more variables, and assessing the practical usability of DBSCAN-based market areas in real-world valuation processes.
Farhan, B., & Murray, A. T. (2005). A GIS-based approach for delineating market areas for park and ride facilities. Transactions in GIS, 9, 91–108. https://doi.org/10.1111/j.1467-9671.2005.00208.x
Han, S. S. (2005). Polycentric urban development and spatial clustering of condominium property values: Singapore in the 1990s. Environment and Planning A: Economy and Space, 37(3), 463–481. https://doi.org/10.1068/a3746
Orr, A. M., Stewart, J. L., Jackson, C. C., & White, J. T. (2022). Shifting prime retailing pitches. A GIS analysis of the spatial adaptations in city centre retail markets. Journal of Property Research, 40(2), 101–133. https://doi.org/10.1080/09599916.2022.2141133
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems, 42(3), Article 19. https://doi.org/10.1145/3068335
Yoshimura, Y., Santi, P., Arias, J. M., Zheng, S., & Ratti, C. (2021). Spatial clustering: Influence of urban street networks on retail sales volumes. Environment and Planning B: Urban Analytics and City Science, 48(7). https://doi.org/10.1177/2399808320954210