Retail Location Analysis
Introduction
This guide provides a comprehensive walkthrough of the Property Location Analysis Tool, an R-based system for evaluating real estate locations using geospatial data, demographics, and amenity analysis.
Location Adjustments in Retail Property Appraisal
In the valuation of retail properties, location remains one of the most influential determinants of value. Location adjustments account for the qualitative and quantitative differences in the site characteristics that influence a property’s utility, visibility, and access to a viable customer base. These adjustments are particularly critical when analyzing comparable sales in varying retail environments.
Retail success is closely tied to the property’s trade area, the geographic region from which a retail establishment draws the majority of its customers. The size and shape of a trade area are rarely uniform and are influenced by multiple factors including existing and proposed residential developments, natural or manmade barriers (e.g., rivers, highways), and psychological boundaries such as perceptions of safety. Transportation infrastructure and commuter patterns further shape trade area dynamics by affecting ease of access for potential shoppers.
Demographic variables within a trade area, such as population density, average household income, age distribution, and consumer spending patterns, directly impact retail viability. For instance, household income levels and consumer expenditure habits (e.g., on groceries, apparel, or electronics) can indicate the buying power and preferences of the local market. Retailers often rely on these data points to determine store format, product mix, and even pricing strategies.
Retail property buyers and appraisers must also evaluate the visibility and accessibility of a site. Elements such as traffic volume, street frontage, line-of-sight from major roads, and access to public transportation all contribute to a location’s competitive advantage. A site that is difficult to locate or reach, whether due to excessive traffic speeds, poor signage, or other accessibility issues, can significantly underperform compared with more accessible alternatives.
Ultimately, effective location adjustments require an understanding not only of physical proximity but of the socio-economic context within which a retail property operates. By carefully analyzing trade area boundaries, demographic trends, and site-specific factors, appraisers can make more precise adjustments, thereby producing credible and supportable value conclusions.
Table 1 below outlines some of the key characteristics of various shopping center types.
| Type | Tenantry | Size | Primary Trade Area |
|---|---|---|---|
| Convenience Center | Stores that sell convenience goods (e.g., groceries, pharmaceuticals); not anchored by a supermarket. | Less than 30,000 sq ft | Less than 5-minute driving time |
| Neighborhood Shopping Center | Stores that sell convenience goods and personal services (e.g., dry cleaning, shoe repair); often anchored by a supermarket. | 30,000 – 150,000 sq ft of gross leasable area; 4 – 10 acres | Less than 5-minute driving time; 1 – 1½-mile range; 5,000 – 40,000 potential customers |
| Community Shopping Center | Stores that sell convenience goods, personal services, and shoppers’ goods (e.g., apparel, appliances); may include a junior department store or off-price/discount store. | 100,000 – 300,000 sq ft of gross leasable area; 10 – 30 acres | 5 – 20-minute driving time; 3 – 6-mile range; 40,000 – 150,000 potential customers |
| Regional Shopping Center | Stores that sell general merchandise, shoppers’ goods, and convenience goods; typically includes one or more department stores. | 300,000 – 1,000,000 sq ft of gross leasable area; 30 – 100 acres | 20 – 40-minute driving time; 5 – 10-mile range; 150,000 – 400,000 potential customers |
| Super-Regional Shopping Center | Stores that sell general merchandise, apparel, furniture, home furnishings, services, and recreation; contains at least three major department stores. | Over 800,000 sq ft of gross leasable area | In excess of 30-minute driving time; typically 10 – 35-mile range; over 500,000 potential customers |
What This Tool Does
The Property Location Analysis Tool ranks the locations of a subject property and comparable sales on a qualitative basis by scoring key elements:
- Provides an input section to set parameters for the property type, weightings, etc.
- Scores locations based on multiple factors (income, population, amenities, property values, and traffic counts)
- Compares properties side-by-side to identify the best opportunities
- Visualizes results with interactive maps, charts, and tables
- Generates reports in multiple formats (Excel, HTML, PNG)
- Caches data to avoid repeated API calls and speed up analysis
Key Features
- Geocoding addresses to coordinates (and vice versa)
- Fetching US Census demographic data by location
- Querying OpenStreetMap for nearby amenities
- Calculating weighted location scores
- Creating interactive visualizations
- Exporting comprehensive reports
Understanding the Code Structure
Architecture Overview
The tool is organized into several functional modules, as shown in the following diagram:
graph TD
A[Input: Property Details] --> B{Lat/Lon Available?}
B -- Yes --> C[Use Existing Coordinates]
B -- No --> D[Geocode Addresses]
C --> E[Data Collection]
D --> E
E --> F[Census Demographics]
E --> G[OSM Amenities]
F --> H[Scoring Engine]
G --> H
H --> I[Analysis Results]
I --> J[Visualizations]
I --> K[Reports]
J --> L[Maps, Charts, Tables]
K --> M[Excel, HTML]
Usage Instructions
Basic Usage
- Prepare an Excel file with the subject and sale locations, including at minimum these columns:
  - Sale: identifies the sale number or the subject property
  - Address: street address of the property
  - City: city the property lies in
  - State: two-letter state abbreviation
  - Zip Code: ZIP code for the property location
  - Price/SF: price per square foot of the property
  - Traffic Count: for now, you need to enter this figure from any available source
  - Optional: a lat or latitude column; if you have it, enter it to avoid geocoding
  - Optional: a lon or longitude column; if you have it, enter it to avoid geocoding
  - Optional: other identifying columns
- Set parameters at the top of this document (see the sketch after this list):
  - data_file: path to your Excel file
  - shopping_center_type: type of retail center
  - weights: custom weights (check params for the right labels)
- Render the document to generate analysis
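For orientation, here is a minimal sketch of what the Quarto YAML header with these params might look like. The file name, weight labels, and amenity list are assumptions; check them against the actual params block in your copy of the document:

---
title: "Retail Location Analysis"
format: html
params:
  data_file: "comparable_sales.xlsx"   # illustrative path
  shopping_center_type: "Neighborhood Shopping Center"
  amenity_types: ["school", "college", "university", "restaurant", "cafe",
                  "bank", "hospital", "pharmacy", "supermarket", "park"]
  weights:
    value:
      income: 0.25
      population: 0.20
      amenities: 0.20
      property_value: 0.15
      traffic_count: 0.20
---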
Setup and Installation
Load Libraries
Load the required libraries (installing them first if necessary).
Show Code Block
library(sf) # Spatial data handling
library(tidyverse) # Data manipulation and visualization
library(httr) # HTTP requests
library(jsonlite) # JSON parsing
library(tigris) # Census geographic data
library(tidycensus) # Census demographic data
library(osmdata) # OpenStreetMap data
library(leaflet) # Interactive maps
library(plotly) # Interactive charts
library(knitr) # Document generation
library(kableExtra) # Table formatting
library(writexl) # Excel export
library(janitor) # Cleans and formats data frames
library(readxl) # Read Excel files
library(purrr) # Functional iteration (map, pmap, safely)
library(tibble) # Create, view and manipulate data frames
library(memoise) # Adds caching to avoid recomputing
library(dplyr) # Filter, arrange, summarize, and transform data
library(scales) # For transforming, formatting, and rescaling numeric data
library(ggdist) # For Monte Carlo visualizations
library(tmap) # For pdf maps
library(tidyr) # Data reshaping (pivot_wider, etc.)
library(stringr) # String manipulation
library(ggplot2) # Static plotting
options(tigris_use_cache = TRUE)
Memoised Results
This report uses two layers of caching to keep the analysis efficient and reproducible:
File-Based Cache
The first layer stores previously downloaded data, such as Census demographics or OpenStreetMap amenities, inside the property_analysis_cache/ folder on disk. These files are reused on future runs, so the script doesn’t need to redownload identical data.
Cached data automatically expires after a set number of days (default: 30 for demographics, 7 for amenities).
You can manually clear this cache by running clear_cache() or by deleting the folder.
In-Memory Memoisation
The second layer (handled by the memoise package) remembers the results of function calls within the current R session. If a function like get_census_demographics() is called again with the same coordinates, it retrieves the stored result from memory instantly rather than reading from disk. This makes iterative runs during development much faster, especially when re-rendering the document multiple times.
To ensure a clean start each time the Quarto document is rendered, the memoised (in-memory) layer is flushed at the beginning of each run. This reset does not delete the underlying .rds files in the file-based cache—it simply forgets any short-term memory from previous interactive sessions.
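A minimal sketch of how that per-render flush might be done, assuming the memoised wrappers defined later in this document (get_census_demographics_memo and find_nearby_amenities_memo):

library(memoise)
# Forget only the in-memory memoised results; the .rds files in
# property_analysis_cache/ on disk are untouched
if (exists("get_census_demographics_memo")) memoise::forget(get_census_demographics_memo)
if (exists("find_nearby_amenities_memo")) memoise::forget(find_nearby_amenities_memo)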
Parameter Validation
Validates that the selected shopping_center_type parameter is one of the supported center types.
Show Code Block
valid_centers <- c(
"Convenience Store", "Neighborhood Shopping Center",
"Community Shopping Center", "Regional Shopping Center",
"Super-Regional Shopping Center"
)
if (!(params$shopping_center_type %in% valid_centers)) {
stop("Invalid shopping center type. Please use one of the following: ", paste(valid_centers, collapse = ", "))
}
Census Key
Users will need to get a free Census API key at: https://api.census.gov/data/key_signup.html
Note: My Census key has been saved in the .Renviron file so I don’t have to enter it each time, and it stays out of the source code. It can be edited with usethis::edit_r_environ(). To retrieve the key, put this in a code block: census_api_key <- Sys.getenv("CENSUS_API_KEY").
Sets the Census API key from the .Renviron file
Show Code Block
# Set your Census API key
census_api_key <- Sys.getenv("CENSUS_API_KEY")
if (census_api_key == "") stop("Missing Census API key. Set it in your .Renviron file.")
Trade Area Radius Helper
Uses preset radii depending on the shopping center type selected in the parameters.
Show Code Block
# Define radius based on shopping center type
get_trade_area_radius <- function(center_type) {
switch(center_type,
"Convenience Store" = 1,
"Neighborhood Shopping Center" = 1.5,
"Community Shopping Center" = 6,
"Regional Shopping Center" = 10,
"Super-Regional Shopping Center" = 35,
1.5) # default fallback
}
trade_area_miles <- get_trade_area_radius(params$shopping_center_type)
Functions
Caching Function
The tool includes a caching system that avoids redundant API calls by reusing previously pulled data, subject to time-based expiration:
- First query: data is fetched from the APIs and saved to property_analysis_cache/
- Subsequent queries: data is loaded from the cache (much faster!)
- Expiration: the demographics cache expires after 30 days by default; amenities after 7 days
- Cache key: based on rounded coordinates (4 decimal places ≈ 11 meters)
- Purge the cache: call clear_cache() in the console
Show Code Block
# ============================================
# CACHING SYSTEM
# ============================================
cache_dir <- "property_analysis_cache"
if (!dir.exists(cache_dir)) {
dir.create(cache_dir)
}
save_to_cache <- function(data, cache_key, cache_type = "demographics") {
cache_file <- file.path(cache_dir, paste0(cache_type, "_", cache_key, ".rds"))
saveRDS(data, cache_file)
message(paste("Cached to:", cache_file))
}
load_from_cache <- function(cache_key, cache_type = "demographics", max_age_days = 60) {
cache_file <- file.path(cache_dir, paste0(cache_type, "_", cache_key, ".rds"))
if (file.exists(cache_file)) {
file_age <- difftime(Sys.time(), file.info(cache_file)$mtime, units = "days")
if (as.numeric(file_age) <= max_age_days) {
message(paste("Loading from cache:", cache_file))
return(readRDS(cache_file))
} else {
message(paste("Cache expired (", round(file_age, 1), "days old)"))
}
}
return(NULL)
}
clear_cache <- function(cache_type = NULL) {
if (is.null(cache_type)) {
files <- list.files(cache_dir, full.names = TRUE)
} else {
files <- list.files(cache_dir, pattern = paste0("^", cache_type), full.names = TRUE)
}
if (length(files) > 0) {
file.remove(files)
message(paste("Removed", length(files), "cached files"))
} else {
message("No cached files to remove")
}
}
Geocoding Function
Geocodes properties in the base .xlsx file if the file does not already include latitude and longitude coordinates.
Show Code Block
# ============================================
# GEOCODING FUNCTIONS AND EXECUTION
# ============================================
# --- 1. Helper functions -----------------------------------------------------
geocode_address <- function(address) {
base_url <- "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
response <- httr::GET(
base_url,
query = list(
address = address,
benchmark = "Public_AR_Current",
format = "json"
)
)
data <- httr::content(response, as = "parsed")
if (length(data$result$addressMatches) > 0) {
coords <- data$result$addressMatches[[1]]$coordinates
return(c(lat = coords$y, lon = coords$x))
} else {
warning(paste("Address not found:", address))
return(c(lat = NA, lon = NA))
}
}
reverse_geocode <- function(lat, lon) {
base_url <- "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"
response <- httr::GET(
base_url,
query = list(
x = lon,
y = lat,
benchmark = "Public_AR_Current",
vintage = "Current_Current",
format = "json"
)
)
data <- httr::content(response, as = "parsed")
return(data$result)
}
# --- 2. Read the Excel file dynamically -------------------------------------
input_data <- read_excel(params$data_file) %>%
janitor::clean_names()
# --- 3. Detect whether lat/lon already exist --------------------------------
# Normalize column names for flexibility
lat_col <- intersect(names(input_data), c("lat", "latitude"))
lon_col <- intersect(names(input_data), c("lon", "longitude"))
if (length(lat_col) == 1 && length(lon_col) == 1 &&
all(!is.na(input_data[[lat_col]])) &&
all(!is.na(input_data[[lon_col]]))) {
message("Latitude and longitude found in dataset — skipping geocoding.")
geocoded_results <- input_data %>%
rename(lat = all_of(lat_col), lon = all_of(lon_col))
} else {
message("Latitude and longitude not found — performing geocoding...")
# --- 4. Prepare address strings for geocoding -----------------------------
if (all(c("address", "city", "state") %in% names(input_data))) {
input_data <- input_data %>%
mutate(full_address = paste(address, city, state, sep = ", "))
} else if ("address" %in% names(input_data)) {
input_data <- input_data %>%
mutate(full_address = address)
} else {
stop("Excel file must include at least an 'address' column, or 'address', 'city', and 'state'.")
}
# --- 5. Apply the geocoder safely -----------------------------------------
geocode_safely <- purrr::safely(geocode_address)
geocoded_results <- input_data %>%
mutate(geo = map(full_address, geocode_safely)) %>%
mutate(
lat = map_dbl(geo, ~ .x$result["lat"] %||% NA_real_),
lon = map_dbl(geo, ~ .x$result["lon"] %||% NA_real_)
) %>%
select(-geo)
}
Demographics Retrieval Function
Fetches demographic data from US Census API:
| Variable | Description | Census Code |
|---|---|---|
| Population | Total population in census tract | B01003_001 |
| Median Income | Median household income | B19013_001 |
| Median Home Value | Median value of owner-occupied homes | B25077_001 |
| Median Gross Rent | Median gross rent | B25064_001 |
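For orientation, a minimal standalone sketch (assuming a valid CENSUS_API_KEY is set) of pulling these ACS variables with tidycensus for a single illustrative county; the full function below generalizes this to a radius-based trade area:

library(tidycensus)
vars <- c(
  population        = "B01003_001",
  median_income     = "B19013_001",
  median_home_value = "B25077_001",
  median_gross_rent = "B25064_001"
)
# Tract-level ACS 5-year estimates for one illustrative county
acs <- get_acs(geography = "tract", variables = vars,
               state = "48", county = "201",  # Harris County, TX (illustrative)
               year = 2022, survey = "acs5")
head(acs)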
Show Code Block
# ============================================
# DATA COLLECTION FUNCTIONS (Radius-Based, Safe Renaming, No Unemployment)
# ============================================
get_census_demographics <- function(lat, lon,
census_api_key = NULL,
radius_miles = trade_area_miles,
use_cache = TRUE,
cache_max_age = 30) {
cache_key <- paste0(round(lat, 4), "_", round(lon, 4), "_r", radius_miles)
if (use_cache) {
cached_data <- load_from_cache(cache_key, "demographics", cache_max_age)
if (!is.null(cached_data)) {
message("Loaded demographics from cache for ", cache_key)
return(cached_data)
}
}
message("Cache not found or expired — fetching new Census data...")
if (!is.null(census_api_key)) {
tidycensus::census_api_key(census_api_key, install = FALSE)
}
# Build buffer geometry
point <- sf::st_sfc(sf::st_point(c(lon, lat)), crs = 4326)
buffer_m <- radius_miles * 1609.34
buffer_area <- sf::st_transform(point, 3857) |>
sf::st_buffer(buffer_m) |>
sf::st_transform(4326)
# ACS variables of interest
vars <- c(
population = "B01003_001",
median_income = "B19013_001",
median_home_value = "B25077_001",
median_gross_rent = "B25064_001"  # B25064 is median gross rent (B25046 is vehicle counts)
)
tracts_data <- tigris::tracts(cb = TRUE, year = 2023, progress_bar = FALSE) |>
sf::st_transform(4326)
tracts_in_area <- suppressWarnings(sf::st_intersection(tracts_data, buffer_area))
if (nrow(tracts_in_area) == 0) {
warning("No census tracts found within trade area.")
return(tibble())
}
tract_fips <- unique(paste0(tracts_in_area$STATEFP, tracts_in_area$COUNTYFP))
message("Downloading ACS data...")
demo_data_sf <- purrr::map_dfr(
tract_fips,
function(fips) {
tryCatch({
tidycensus::get_acs(
geography = "tract",
variables = vars,
state = substr(fips, 1, 2),
county = substr(fips, 3, 5),
year = 2022,
survey = "acs5",
geometry = TRUE
)
}, error = function(e) NULL)
}
) |> sf::st_transform(4326)
if (nrow(demo_data_sf) == 0) {
warning("No ACS data returned for this area.")
return(tibble())
}
demo_in_area <- suppressWarnings(sf::st_intersection(demo_data_sf, buffer_area))
if (nrow(demo_in_area) == 0) {
warning("No demographic data found within buffer.")
return(tibble())
}
# Aggregate results
demo_summary <- demo_in_area |>
dplyr::group_by(variable) |>
dplyr::summarise(estimate = mean(estimate, na.rm = TRUE), .groups = "drop") |>
tidyr::pivot_wider(names_from = variable, values_from = estimate)
# --- Safe renaming: only rename if columns exist ---------------------------
rename_map <- c(
B01003_001 = "population",
B19013_001 = "median_income",
B25077_001 = "median_home_value",
B25064_001 = "median_gross_rent"
)
rename_map <- rename_map[names(rename_map) %in% names(demo_summary)]
demo_summary <- dplyr::rename(demo_summary, !!!rename_map)
demo_summary <- demo_summary |>
dplyr::mutate(radius_miles = radius_miles)
if (use_cache) save_to_cache(demo_summary, cache_key, "demographics")
return(demo_summary)
}
# Memoised version for fast in-session reuse
get_census_demographics_memo <- memoise(get_census_demographics)
Amenity Function
Queries OpenStreetMap (https://wiki.openstreetmap.org/wiki/Key:amenity) for nearby amenities within the specified radius (1.5 miles for this analysis). OSM’s amenity key is extremely broad, including everything from airports to post boxes.
In the context of retail demand and site selection, the relevant amenities are those that:
- Attract regular consumer visits (e.g., restaurants, cafes, banks, pharmacies).
- Represent daily or weekly necessities (e.g., supermarkets, convenience stores, schools).
- Encourage dwell time or foot-traffic clustering (e.g., parks, places of worship, leisure).
- Support workforce presence (e.g., offices, post offices, government buildings).
These amenities serve as proxies for consumer presence, disposable income circulation, and accessibility to routine goods and services. For a retail valuation context, they also mirror the “market support” and “trade area vitality” concepts used in market analysis texts like Fanning, Market Analysis for Real Estate, 2nd ed.
The general amenity categories that are used in this analysis include the following:
- Supermarkets - A proxy for the presence of food retail and daily household shopping activity.
- Restaurants - Correlate with consumer engagement, social clustering, and evening/weekend traffic.
- Cafes - Also correlate with consumer engagement, social clustering, and evening/weekend traffic.
- Banks - Usually generate daytime traffic and draw customers to other nearby retail establishments.
- Hospitals - Represent an essential service and a consistent daytime population.
- Pharmacies - A regular essential service that often reflects high traffic.
- School/College/University - Bring regular population flows, including students, parents, and staff.
- Parks - Encourage recreational foot traffic and neighborhood appeal.
Show Code Block
# ============================================
# IMPROVED AMENITY COLLECTION FUNCTION
# ============================================
find_nearby_amenities <- function(lat, lon,
amenity_types = params$amenity_types,
radius_miles = trade_area_miles,
use_cache = TRUE,
cache_max_age = 7) {
# Fallback list
if (is.null(amenity_types) || length(amenity_types) == 0) {
amenity_types <- c("school", "college", "university", "restaurant", "cafe",
"bank", "hospital", "pharmacy", "supermarket", "park")
}
cache_key <- paste0(round(lat, 4), "_", round(lon, 4), "_r", radius_miles)
if (use_cache) {
cached_data <- load_from_cache(cache_key, "amenities", cache_max_age)
if (!is.null(cached_data)) return(cached_data)
}
radius_m <- radius_miles * 1609.34
bbox_buffer <- radius_m / 111320 # convert meters to degrees (approx)
bbox <- c(lon - bbox_buffer, lat - bbox_buffer, lon + bbox_buffer, lat + bbox_buffer)
results <- tibble(amenity = character(), count = numeric())
target <- sf::st_sfc(sf::st_point(c(lon, lat)), crs = 4326) |> sf::st_transform(3857)
for (a in amenity_types) {
tryCatch({
# Choose appropriate key depending on amenity
key <- dplyr::case_when(
a == "supermarket" ~ "shop",
a == "park" ~ "leisure",
TRUE ~ "amenity"
)
query <- osmdata::opq(bbox = bbox) %>%
osmdata::add_osm_feature(key = key, value = a)
osm_data <- osmdata::osmdata_sf(query)
count <- 0
# Combine points and polygons
all_geoms <- list(osm_data$osm_points, osm_data$osm_polygons)
all_geoms <- all_geoms[!vapply(all_geoms, is.null, logical(1))]
if (length(all_geoms) > 0) {
all_sf <- dplyr::bind_rows(lapply(all_geoms, function(g) {
sf::st_centroid(sf::st_transform(g, 3857))
}))
dists <- sf::st_distance(all_sf, target)
count <- sum(as.numeric(dists) <= radius_m)
}
results <- dplyr::add_row(results, amenity = a, count = count)
Sys.sleep(1)
}, error = function(e) {
warning(paste("Error fetching", a, ":", e$message))
results <<- dplyr::add_row(results, amenity = a, count = 0)
})
}
if (use_cache) save_to_cache(results, cache_key, "amenities")
return(results)
}
# Memoised version
find_nearby_amenities_memo <- memoise::memoise(find_nearby_amenities)
| Sale | Banks | Cafes | Colleges | Hospitals | Parks | Pharmacies | Restaurants | Schools | Supermarkets | Universities | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 53 | 2 | 0 | 0 | 427 | 3 | 66 | 48 | 52 | 0 | 651 |
| 2 | 26 | 9 | 0 | 0 | 29 | 0 | 18 | 16 | 20 | 0 | 118 |
| 3 | 84 | 1 | 0 | 20 | 63 | 2 | 193 | 37 | 46 | 1 | 447 |
| 4 | 3 | 15 | 5 | 14 | 283 | 4 | 233 | 36 | 33 | 0 | 626 |
| 5 | 28 | 1 | 0 | 0 | 146 | 1 | 50 | 190 | 10 | 0 | 426 |
| 6 | 114 | 2 | 0 | 6 | 290 | 9 | 71 | 88 | 50 | 0 | 630 |
| 7 | 30 | 0 | 0 | 0 | 9 | 0 | 37 | 10 | 0 | 0 | 86 |
| 8 | 61 | 4 | 0 | 15 | 216 | 2 | 107 | 34 | 90 | 0 | 529 |
| Subject | 0 | 0 | 0 | 0 | 467 | 18 | 6 | 66 | 19 | 45 | 621 |
Traffic Count Information
A user would need to add the traffic counts to their base Excel file.
Traffic count data measure the average number of vehicles passing specific roadway points, often expressed as Annual Average Daily Traffic (AADT). While this report uses data from the Texas Department of Transportation (TxDOT) Open Data Portal, similar information is available from most U.S. states and many other countries.
United States
- State Departments of Transportation (DOTs): most state DOTs publish AADT datasets through their GIS or open data portals. Examples include:
  - Caltrans (California) → Traffic Volumes
  - Florida Department of Transportation (FDOT) → Traffic Counts
  - North Carolina DOT (NCDOT) → Traffic Survey Data
- Federal Highway Administration (FHWA): the FHWA provides nationwide roadway and traffic data through the Highway Performance Monitoring System (HPMS).
Canada
- Provincial Ministries of Transportation: each province maintains similar datasets:
  - Ontario Open Data Catalogue → Annual Average Daily Traffic
  - DataBC Catalogue (British Columbia) → Traffic Counts
  - GeoDiscover Alberta → Transportation / Traffic Volume Sites
- Municipal and Regional Sources: large municipalities (e.g., Toronto, Vancouver, Calgary) often publish local traffic counts through their open data portals.
International
- United Kingdom: Department for Transport Open Data, which includes continuous and manual count sites across England, Scotland, and Wales.
- European Union: many member nations release traffic and transport statistics under the INSPIRE Directive or through Eurostat Transport Statistics.
- Global Alternatives: in regions without government data, commercial and open platforms such as TomTom Traffic, HERE Traffic, or OpenStreetMap traffic layers can provide approximate roadway congestion or flow estimates.
Future Implementation Notes
To integrate local traffic data into this analysis:
1. Download the shapefile or GeoJSON version of your region’s AADT or traffic count dataset.
2. Load it into R using sf::st_read("path/to/file.shp").
3. Perform a spatial join between your property points and the nearest road segment (sf::st_join() or sf::st_nearest_feature()).
4. Extract or average the relevant traffic count field for scoring.
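A minimal sketch of steps 2 through 4, assuming a downloaded AADT shapefile; the file path and the traffic field name (aadt) are placeholders for whatever your region’s dataset actually uses:

library(sf)
library(dplyr)

aadt_segments <- sf::st_read("path/to/aadt_segments.shp") |>
  sf::st_transform(3857)                          # project to meters

property_pts <- geocoded_results |>               # assumes lat/lon columns exist
  dplyr::filter(!is.na(lat), !is.na(lon)) |>
  sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
  sf::st_transform(3857)

# Index of the nearest road segment for each property point
nearest <- sf::st_nearest_feature(property_pts, aadt_segments)
property_pts$traffic_count <- aadt_segments$aadt[nearest]  # placeholder field name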
Weighting Function/Inputs
Required inputs apply appropriate weights to income (consumer spending power), population density (a reflection of foot-traffic potential), amenities (neighborhood quality and drawing power), property values (a function of the cost to live in the area), and traffic counts (a measure of visibility, accessibility, and potential customers beyond the trade-area population).
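A minimal sketch (an assumption, not the author’s exact code) of reading those weights from the document parameters and normalizing them so they sum to one:

# Pull the five factor weights from the YAML params and normalize
weights <- unlist(params$weights$value)
weights <- weights / sum(weights)
stopifnot(length(weights) == 5, all(weights > 0))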
Scoring Function
Calculates a comprehensive location score (0-100) based on five factors:
Income Score
- Formula: rescale(median_income, to = c(0, 100), from = range(all_incomes))
- Higher income = better location (greater purchasing power)
- Dynamically normalized to a 0-100 scale
Interpretation: The Income Score measures the relative purchasing power of households within each trade area. Median household income remains a reliable proxy for local spending capacity and the ability of residents to support a broad range of retail goods and services. Higher median incomes typically correspond with greater discretionary spending, stronger support for mid- and upper-tier retailers, and enhanced resilience during economic downturns.
This uses adaptive scaling to account for any range of income levels. The score is calculated by rescaling each trade area’s median income within the range of all incomes observed across the study sample. This approach preserves meaningful differences among higher-income areas, where a capped formula would otherwise treat them as identical, and allows for balanced comparison between affluent and moderate-income markets.
Notes on the Scale: The scaling is anchored to the lowest and highest median incomes within the comparison group. The location with the highest income receives a score of 100, while the lowest-income location scores 0. Other income levels are scored proportionally.
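A worked sketch of this rescaling with scales::rescale(), using illustrative median incomes:

library(scales)
all_incomes <- c(48000, 61000, 75000, 92000)  # illustrative medians across the set
rescale(61000, to = c(0, 100), from = range(all_incomes))
#> [1] 29.54545   (about 30% of the way from lowest to highest income)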
Population Density Score
- Formula: rescale(population, to = c(0, 100), from = range(all_populations))
- Higher population = more potential customers/activity
- Normalized to a 0-100 scale
Interpretation: The Population Density Score represents the concentration of potential customers within the trade area. Retail activity depends not only on affluence but also on proximity and repetition of visits. Areas with greater population density tend to generate stronger pass-by traffic, higher visibility, and more consistent sales volumes.
This metric applies adaptive scaling rather than a fixed divisor. Each trade area’s population is rescaled within the observed range of populations across the study sample. This ensures that every location receives a proportionate score, maintaining meaningful separation even among densely populated areas where a capped formula might otherwise flatten results.
Notes on the Scale: The lowest population in the comparison group anchors the bottom of the scale at 0, while the highest defines 100. All other sites are positioned proportionally between them.
Amenities Score
- Formula: 0.4 × (diversity %) + 0.6 × (density %, adaptively scaled 0-100)
- More diverse and concentrated amenities = stronger location
- Balances variety and overall intensity of surrounding uses
Interpretation: The Amenities Score reflects both the diversity and density of nearby amenities that contribute to a site’s retail vitality. Diversity measures the proportion of tracked amenity categories—such as restaurants, schools, parks, banks, and pharmacies—that are represented within the trade area. Density measures the total number of amenities, scaled relative to the range observed across all study locations.
Together, these components create a balanced indicator of neighborhood convenience, consumer draw, and the underlying strength of the retail ecosystem. Locations that contain all tracked amenity types receive full credit for diversity, while those with fewer categories present earn proportionally less. The density component differentiates sites based on how intensively built and commercially active they are within the same amenity mix.
Notes on the Scale: The diversity index reaches 100 when all tracked amenity categories are represented within the trade area. The density index is adaptively normalized between the lowest and highest total amenity counts observed among all properties being compared, ensuring that each site’s score reflects its position within the actual range of market activity. The final score combines both effects, assigning 40% weight to diversity and 60% to density. This method ensures fair comparison across urban, suburban, and rural contexts without relying on arbitrary saturation thresholds.
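A sketch of the 40/60 diversity-density blend under illustrative inputs; the category counts and the observed density range are assumptions, not values from this analysis:

library(scales)
counts <- c(restaurant = 12, cafe = 3, bank = 4, pharmacy = 0,
            school = 6, park = 9, supermarket = 2, hospital = 0)
diversity_pct <- mean(counts > 0) * 100             # 6 of 8 categories present = 75
density_pct <- rescale(sum(counts), to = c(0, 100),
                       from = c(20, 650))           # min/max totals across sites
amenities_score <- 0.4 * diversity_pct + 0.6 * density_pct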
Property Value Score
- Formula: 0.5 × min(median_home_value / 5000, 100) + 0.5 × min(median_gross_rent / 30, 100)
- Higher values = more desirable neighborhood
- Normalized to a 0-100 scale
Interpretation: The Property Value Score represents a composite measure of market vitality that combines indicators of long-term stability and short-term spending capacity within a trade area. Median home value reflects neighborhood affluence, ownership stability, and the general desirability of the surrounding residential market. Median gross rent complements this by capturing the relative cost of occupancy and the local balance between household income and housing expense—an important proxy for retail purchasing power and economic elasticity.
By weighting home value and rent equally (50/50), the score avoids bias toward either high-income owner-occupied areas or transient high-rent districts. A higher Property Value Score therefore signals a trade area that is both financially resilient and economically active, typically corresponding to locations that can support sustainable retail rents and consistent consumer demand.
- High home value and high rent → strong, affluent market with high retail viability.
- High rent but low home value → possibly a transient or cost-burdened market.
- High home value but low rent → stable ownership but lower circulating spending.
Notes on the Scale: The divisor for home value (/5000) is a normalization factor that caps high-value markets near 100. You can tune it; if your markets are typically high-end, you might raise it to /6000 or /8000. The divisor for gross rent (/30) is similar: $30 × 100 = $3,000, roughly the upper bound for typical monthly rent, producing comparable scaling.
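A worked example of the capped blend with illustrative inputs:

home_value <- 420000  # illustrative median home value
gross_rent <- 1800    # illustrative median gross rent
0.5 * min(home_value / 5000, 100) + 0.5 * min(gross_rent / 30, 100)
#> [1] 72   (home-value component caps at 84; rent component at 60)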
Traffic Count Score
- Formula: rescale(traffic_count, to = c(0, 100), from = range(all_traffic_counts))
- Higher traffic = greater retail visibility and consumer flow
- Dynamically normalized to a 0-100 scale
Interpretation: The Traffic Count Score measures the retail exposure and accessibility of a location based on roadway volume. Higher traffic counts typically translate to greater visibility, impulse visits, and daily customer flow.
This score is dynamically scaled relative to the comparison group, ensuring that properties with exceptionally high or low counts remain distinguishable within each analysis set.
Notes on the Scale: The lowest observed traffic count anchors the scale at 0, and the highest at 100. When comparing sites across multiple markets, a shared range can be applied to maintain consistent interpretation across studies.
Total Score Calculation
\[ \text{Total Score} = \sum_{i=1}^{5} \text{Score}_i \times w_i \]
Interpretation: The Total Score is the weighted sum of five core indicators (Income, Population, Amenities, Property Value, and Traffic Count), each scaled from 0 to 100. The weights \(w_i\) control the relative importance of each factor and are defined in the document parameters. This additive framework lets the model reflect market potential (income, population), site quality (amenities, property value), and exposure/accessibility (traffic count).
Notes on the Scale: Each component score is dynamically normalized within its observed range to preserve proportional differences among locations. The composite score therefore expresses each site’s overall retail viability relative to all others in the analysis.
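A worked sketch of the weighted sum, with illustrative component scores and weights that sum to one:

scores <- c(income = 72, population = 55, amenities = 80,
            property_value = 61, traffic = 40)
weights <- c(income = 0.25, population = 0.20, amenities = 0.20,
             property_value = 0.15, traffic = 0.20)
sum(scores * weights)
#> [1] 62.15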
Analysis Function
Main function that ties everything together:
- Geocodes address (if not provided in Excel file)
- Fetches demographics
- Queries amenities
- Calculates scores
- Returns comprehensive results object
Show Code Block
# ============================================
# BATCH COMPARISON — FULLY NORMALIZED
# ============================================
compare_locations_from_excel <- function(
data_file = params$data_file,
census_api_key = Sys.getenv("CENSUS_API_KEY"),
center_type = params$shopping_center_type,
radius_miles = trade_area_miles
) {
# Always pull weights from YAML
weights <- unlist(params$weights$value)
message("-----------------------------------------------------")
message("Running Batch Location Comparison")
message("Center Type: ", center_type)
message("Trade Area Radius: ", radius_miles, " miles")
message("Source File: ", data_file)
message("-----------------------------------------------------")
# --- Load data ---
df <- readxl::read_excel(data_file) |>
janitor::clean_names()
if (!"traffic_count" %in% names(df)) {
df$traffic_count <- NA_real_
}
lat_col <- intersect(names(df), c("lat", "latitude"))
lon_col <- intersect(names(df), c("lon", "longitude"))
if (length(lat_col) != 1 || length(lon_col) != 1) {
stop("Excel file must include latitude and longitude columns.")
}
df <- df |>
dplyr::rename(lat = all_of(lat_col), lon = all_of(lon_col)) |>
dplyr::filter(!is.na(lat) & !is.na(lon))
# --- Compute traffic range globally ---
traffic_range <- range(df$traffic_count, na.rm = TRUE)
# --- Gather demographic + amenity data for global scaling ---
demo_list <- list()
amenity_list <- list()
for (i in seq_len(nrow(df))) {
lat <- df$lat[i]
lon <- df$lon[i]
demo <- tryCatch(
get_census_demographics(lat, lon, census_api_key, radius_miles = radius_miles),
error = function(e) tibble()
)
if (nrow(demo) > 0) demo_list[[i]] <- demo
amen <- tryCatch(
find_nearby_amenities(lat, lon, radius_miles = radius_miles),
error = function(e) tibble()
)
if (nrow(amen) > 0) amenity_list[[i]] <- amen
}
all_demo <- bind_rows(demo_list)
income_range <- range(all_demo$median_income, na.rm = TRUE)
population_range <- range(all_demo$population, na.rm = TRUE)
home_value_range <- range(all_demo$median_home_value, na.rm = TRUE)
rent_range <- range(all_demo$median_gross_rent, na.rm = TRUE)
amenity_summary <- purrr::map_dfr(amenity_list, function(a) {
tibble(diversity = length(unique(a$amenity)),
density = sum(a$count, na.rm = TRUE))
})
div_range <- range(amenity_summary$diversity, na.rm = TRUE)
den_range <- range(amenity_summary$density, na.rm = TRUE)
# --- Run analysis for each property ---
results <- purrr::pmap_dfr(
df,
function(...) {
row <- tibble::tibble(...)
message("Analyzing location: ", row$lat, ", ", row$lon)
analysis <- calculate_location_score(
lat = row$lat,
lon = row$lon,
traffic_count = row$traffic_count,
census_api_key = census_api_key,
radius_miles = radius_miles,
income_range = income_range,
population_range = population_range,
traffic_range = traffic_range,
home_value_range = home_value_range,
rent_range = rent_range,
amenity_div_range = div_range,
amenity_den_range = den_range
)
tibble::tibble(
property_id = if ("id" %in% names(row)) row$id else NA_character_,
address = if ("address" %in% names(row)) row$address else paste0("(", round(row$lat, 4), ", ", round(row$lon, 4), ")"),
lat = row$lat,
lon = row$lon,
total_score = analysis$total_score,
income_score = analysis$individual_scores$income,
population_score = analysis$individual_scores$population_density,
amenities_score = analysis$individual_scores$amenities,
property_value_score = analysis$individual_scores$property_value,
traffic_score = analysis$individual_scores$traffic_count
)
}
)
# --- Rank and export ---
ranked_results <- results |>
dplyr::arrange(desc(total_score)) |>
dplyr::mutate(rank = dplyr::row_number())
output_file <- paste0("Location_Scoring_", gsub(" ", "_", center_type), "_", radius_miles, "mi.xlsx")
writexl::write_xlsx(ranked_results, output_file)
message("Results saved to: ", output_file)
message("-----------------------------------------------------")
return(ranked_results)
}
| Rank | Sale | Total Score | Income Score | Population Score | Amenities Score | Property Value Score | Traffic Score |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 74.3 | 25.8 | 100.0 | 100.0 | 65.2 | 100.0 |
| 2 | Subject | 67.5 | 100.0 | 61.5 | 96.8 | 73.6 | 10.0 |
| 3 | 7 | 54.5 | 56.3 | 57.2 | 40.0 | 91.1 | 36.7 |
| 4 | 2 | 46.0 | 25.8 | 99.8 | 43.4 | 39.3 | 0.0 |
| 5 | 6 | 42.0 | 34.6 | 43.6 | 97.8 | 56.2 | 15.5 |
| 6 | 8 | 41.3 | 27.2 | 51.6 | 87.0 | 43.1 | 23.2 |
| 7 | 4 | 32.0 | 35.5 | 0.0 | 97.3 | 59.6 | 28.5 |
| 8 | 5 | 28.8 | 0.0 | 9.0 | 76.1 | 14.7 | 85.1 |
| 9 | 3 | 21.6 | 12.6 | 19.8 | 78.3 | 13.5 | 13.5 |
Create Combined Results Export
This will create a new Excel file with all collected information.
Show Code Block
# =====================================================
# EXPORT COMPREHENSIVE RESULTS (Failsafe Version)
# =====================================================
safely_get_col <- function(df, patterns) {
cols <- names(df)
match <- cols[grepl(paste(patterns, collapse = "|"), cols, ignore.case = TRUE)]
if (length(match) > 0) return(match[1])
return(NULL)
}
# --- 1. Read and clean the base Excel file ---
original_data <- readxl::read_excel(params$data_file) %>%
janitor::clean_names()
# --- 2. Detect or create address column ---
address_col <- safely_get_col(original_data, c("^address$", "addr", "site", "property"))
if (!is.null(address_col)) {
original_data <- original_data %>%
rename(address = !!sym(address_col))
message("Detected address column: ", address_col)
} else {
message("⚠️ No address column detected. Creating synthetic labels.")
original_data <- original_data %>%
mutate(address = paste0("Site_", row_number()))
}
# --- 3. Ensure lat/lon columns exist ---
lat_col <- safely_get_col(original_data, c("^lat$", "latitude"))
lon_col <- safely_get_col(original_data, c("^lon$", "longitude"))
if (!is.null(lat_col)) original_data <- rename(original_data, lat = !!sym(lat_col)) else original_data$lat <- NA_real_
if (!is.null(lon_col)) original_data <- rename(original_data, lon = !!sym(lon_col)) else original_data$lon <- NA_real_
# --- 4. Add geocoded coordinates if missing ---
if (exists("geocoded_results")) {
original_data <- original_data %>%
mutate(addr_key = str_trim(tolower(address))) %>%
left_join(
geocoded_results %>%
mutate(addr_key = str_trim(tolower(address))) %>%
select(addr_key, lat_geo = lat, lon_geo = lon),
by = "addr_key"
) %>%
mutate(
lat = coalesce(lat, lat_geo),
lon = coalesce(lon, lon_geo)
) %>%
select(-addr_key, -lat_geo, -lon_geo)
}
# --- 5. Build amenity summary safely ---
if (exists("amenity_results")) {
amenity_summary_join <- amenity_results %>%
mutate(addr_key = str_trim(tolower(address))) %>%
group_by(addr_key, amenity) %>%
summarise(total = sum(count, na.rm = TRUE), .groups = "drop") %>%
tidyr::pivot_wider(
names_from = amenity,
values_from = total,
values_fill = list(total = 0)
)
} else {
amenity_summary_join <- tibble(addr_key = character())
}
# --- 6. Gather raw demographics ---
demo_list <- purrr::map2_dfr(
original_data$lat, original_data$lon,
function(lat, lon) {
if (is.na(lat) || is.na(lon)) return(tibble())
d <- get_census_demographics(lat, lon, census_api_key = Sys.getenv("CENSUS_API_KEY"))
if (nrow(d) == 0) return(tibble())
d$lat <- lat; d$lon <- lon
d
}
)
demo_join <- demo_list %>%
mutate(lat_r = round(lat, 4), lon_r = round(lon, 4)) %>%
select(lat_r, lon_r, population, median_income, median_home_value, median_gross_rent)
# --- 7. Merge everything together ---
final_export <- location_scores %>%
mutate(addr_key = str_trim(tolower(address))) %>%
left_join(amenity_summary_join, by = "addr_key") %>%
mutate(lat_r = round(lat, 4), lon_r = round(lon, 4)) %>%
left_join(demo_join, by = c("lat_r", "lon_r")) %>%
left_join(
original_data %>%
mutate(addr_key = str_trim(tolower(address))) %>%
select(-lat, -lon),
by = "addr_key"
)
# --- 8. Ensure essential columns exist ---
for (col in c("address", "lat", "lon", "population", "median_income",
"median_home_value", "median_gross_rent")) {
if (!col %in% names(final_export)) final_export[[col]] <- NA
}
# --- 9. Reorder logical columns ---
final_export <- final_export %>%
select(
address,
lat, lon,
population, median_income, median_home_value, median_gross_rent,
total_score, income_score, population_score,
amenities_score, property_value_score, traffic_score,
everything()
)
# --- 10. Write to Excel ---
output_file <- paste0(
"Comprehensive_Location_Results_",
gsub(" ", "_", params$shopping_center_type),
"_", trade_area_miles, "mi.xlsx"
)
writexl::write_xlsx(final_export, output_file)
Sensitivity Analysis
The sensitivity analysis tests how stable each property’s ranking remains when the weighting of input factors such as income, population density, amenities, property values, and traffic counts is allowed to vary randomly within reasonable limits. The scenario analysis runs 5,000 times, each run simulating an alternative weighting configuration. Each configuration produces a new set of location rankings. The resulting distribution of simulated ranks for each property reflects how dependent its position is on the specific weighting assumptions used in the model.
The median simulated rank represents the property’s typical performance across all simulations, while the 5th and 95th percentiles mark its best-case and worst-case outcomes, respectively. A property with a low median rank and a narrow percentile range demonstrates consistent strength and resilience to changing assumptions. In contrast, a wide percentile range indicates that the property’s relative standing shifts considerably under different weighting scenarios, suggesting greater sensitivity to how the model prioritizes its components. Together, these measures reveal not only which locations perform best on average but also which are most robust to uncertainty in the underlying valuation logic.
The “reasonable limits” for weighting variation are defined by a Dirichlet(1, …, 1) distribution (independent Gamma(1,1) draws normalized to sum to one), which produces random, positive weights that sum to one. This distribution allows each factor (Income, Population, Amenities, Property Value, and Traffic) to fluctuate broadly between near zero and moderate dominance, representing plausible shifts in judgment without introducing unrealistic extremes. Each factor can vary between very small (near 0) and fairly large (up to ~0.7-0.8), but the average weight remains around 0.20 (1/5). The Gamma(1,1) shape parameter ensures broad variability without producing absurd or hyper-concentrated weights (such as one factor receiving 99% of total influence).
Each Monte Carlo run represents one hypothetical weighting configuration within those bounds. Monte Carlo error decreases approximately in proportion to the inverse of the square root of the number of simulations:
\[ \text{Error} \propto \frac{1}{\sqrt{N}} \]
This means that as the number of scenarios (N) increases, the uncertainty in the simulated results shrinks rapidly at first and then levels off:
- 100 runs → approximately 10 % uncertainty
- 1,000 runs → approximately 3 % uncertainty
- 5,000 runs → approximately 1.4 % uncertainty
In practice, 5,000 simulations represent a balanced choice—large enough to yield stable percentile and rank estimates, yet small enough to run efficiently on a standard laptop without noticeable slowdown.
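A minimal sketch of the simulation loop described above, assuming a matrix score_matrix of properties by the five component scores (each 0-100); the Dirichlet weights come from normalized Gamma(1,1) draws:

set.seed(42)
n_sims <- 5000
sim_ranks <- sapply(seq_len(n_sims), function(i) {
  w <- rgamma(5, shape = 1, rate = 1)    # Gamma(1,1) draws...
  w <- w / sum(w)                        # ...normalized -> Dirichlet(1,...,1)
  totals <- score_matrix %*% w           # weighted total score per property
  rank(-totals)                          # rank 1 = highest score
})
# 5th percentile, median, and 95th percentile rank for each property
apply(sim_ranks, 1, quantile, probs = c(0.05, 0.50, 0.95))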
| Sale | Median Rank | 5th Percentile | 95th Percentile |
|---|---|---|---|
| 1 | 1.0 | 1.0 | 1.5 |
| Subject | 2.0 | 1.4 | 4.0 |
| 7 | 3.0 | 2.5 | 5.1 |
| 8 | 5.0 | 5.0 | 6.5 |
| 6 | 5.5 | 4.0 | 6.5 |
| 2 | 6.0 | 2.5 | 9.0 |
| 4 | 7.0 | 3.5 | 7.5 |
| 5 | 8.0 | 2.5 | 9.0 |
| 3 | 9.0 | 7.4 | 9.0 |
Ranked Location Map
Figure 2: Ranked Location Map by Total Score.
Radar Chart
Weighted Sensitivity Heatmap
Amenity Density Map
Still needs work
Final Summary Table
Comparative analysis is defined as “the process by which a value indication is derived in the sales comparison approach. Comparative analysis may employ quantitative or qualitative techniques, either separately or in combination.”1 The Appraisal of Real Estate outlines examples of the techniques used in quantitative adjustments and qualitative analyses as shown in Table 6.2
| Quantitative Analysis | Qualitative Analysis |
|---|---|
| Paired data analysis (sales and resales of the same or similar properties) | Relative comparison analysis |
| Grouped data analysis | Ranking analysis |
| Secondary data analysis | Personal interviews |
| Statistical analysis including graphic analysis and scenario analysis | |
| Cost-related adjustments (cost to cure, depreciated cost) | |
| Capitalization of income differences | |
| Trend analysis |
When adequate data is available, quantitative analysis can be a helpful tool, but often the differences between properties make it challenging to determine appropriate adjustments with a level of certainty. In these cases, qualitative analysis or ranking can be helpful.3 Qualitative analysis recognizes the inefficiencies of real estate markets and the difficulty of expressing adjustments with mathematical precision.
The analysis to this point has described in detail how the location attributes of the analyzed properties contribute to retail property success. As shown in Table 7, the subject property is near the top of the ranking analysis, suggesting a value at or below Sale 1 and at or above the remaining sales.
| Rank | Sale | Total Score | Price/SF |
|---|---|---|---|
| 1 | 1 | 74.3 | $451 |
| 2 | Subject | 67.5 | |
| 3 | 7 | 54.5 | $308 |
| 4 | 2 | 46.0 | $428 |
| 5 | 6 | 42.0 | $392 |
| 6 | 8 | 41.3 | $281 |
| 7 | 4 | 32.0 | $391 |
| 8 | 5 | 28.8 | $240 |
| 9 | 3 | 21.6 | $115 |