Retail Location Analysis

Author

Jim Amorin, CAE, MAI, SRA, AI-GRS, CDEI

Published

October 24, 2025

Introduction

This guide provides a comprehensive walkthrough of the Property Location Analysis Tool, an R-based system for evaluating real estate locations using geospatial data, demographics, and amenity analysis.

Location Adjustments in Retail Property Appraisal

In the valuation of retail properties, location remains one of the most influential determinants of value. Location adjustments account for the qualitative and quantitative differences in the site characteristics that influence a property’s utility, visibility, and access to a viable customer base. These adjustments are particularly critical when analyzing comparable sales in varying retail environments.

Retail success is closely tied to the property’s trade area, the geographic region from which a retail establishment draws the majority of its customers. The size and shape of a trade area are rarely uniform and are influenced by multiple factors including existing and proposed residential developments, natural or manmade barriers (e.g., rivers, highways), and psychological boundaries such as perceptions of safety. Transportation infrastructure and commuter patterns further shape trade area dynamics by affecting ease of access for potential shoppers.

Demographic variables within a trade area, such as population density, average household income, age distribution, and consumer spending patterns, directly impact retail viability. For instance, household income levels and consumer expenditure habits (e.g., on groceries, apparel, or electronics) can indicate the buying power and preferences of the local market. Retailers often rely on these data points to determine store format, product mix, and even pricing strategies.

Retail property buyers and appraisers must also evaluate the visibility and accessibility of a site. Elements such as traffic volume, street frontage, line-of-sight from major roads, and access to public transportation all contribute to a location’s competitive advantage. A site that is difficult to locate or reach, whether due to excessive traffic speeds, poor signage, or other accessibility issues, can significantly underperform in comparison to more accessible alternatives.

Ultimately, effective location adjustments require an understanding not only of physical proximity but of the socio-economic context within which a retail property operates. By carefully analyzing trade area boundaries, demographic trends, and site-specific factors, appraisers can make more precise adjustments, thereby producing credible and supportable value conclusions.

Table 1 below outlines some of the key characteristics of various shopping center types.

Table 1: Types of Shopping Centers

| Type | Tenantry | Size | Primary Trade Area |
|---|---|---|---|
| Convenience Center | Stores that sell convenience goods (e.g., groceries, pharmaceuticals); not anchored by a supermarket. | Less than 30,000 sq ft | Less than 5-minute driving time |
| Neighborhood Shopping Center | Stores that sell convenience goods and personal services (e.g., dry cleaning, shoe repair); often anchored by a supermarket. | 30,000 – 150,000 sq ft of gross leasable area; 4 – 10 acres | Less than 5-minute driving time; 1 – 1½-mile range; 5,000 – 40,000 potential customers |
| Community Shopping Center | Stores that sell convenience goods, personal services, and shoppers’ goods (e.g., apparel, appliances); may include a junior department store or off-price/discount store. | 100,000 – 300,000 sq ft of gross leasable area; 10 – 30 acres | 5 – 20-minute driving time; 3 – 6-mile range; 40,000 – 150,000 potential customers |
| Regional Shopping Center | Stores that sell general merchandise, shoppers’ goods, and convenience goods; typically includes one or more department stores. | 300,000 – 1,000,000 sq ft of gross leasable area; 30 – 100 acres | 20 – 40-minute driving time; 5 – 10-mile range; 150,000 – 400,000 potential customers |
| Super-Regional Shopping Center | Stores that sell general merchandise, apparel, furniture, home furnishings, services, and recreation; contains at least three major department stores. | Over 800,000 sq ft of gross leasable area | In excess of 30-minute driving time; typically 10 – 35-mile range; over 500,000 potential customers |
Source: Stephen F. Fanning, Market Analysis for Real Estate, 2nd ed. (Chicago: Appraisal Institute, 2014)

What This Tool Does

The Property Location Analysis Tool produces a qualitative ranking of comparable locations by scoring key elements:

  • Provides an input section to set parameters for the property type, factor weightings, and related options
  • Scores locations based on multiple factors (income, population, amenities, property values, and traffic counts)
  • Compares properties side-by-side to identify the best opportunities
  • Visualizes results with interactive maps, charts, and tables
  • Generates reports in multiple formats (Excel, HTML, PNG)
  • Caches data to avoid repeated API calls and speed up analysis

Key Features

Note: Core Capabilities
  • Geocoding addresses to coordinates (and vice versa)
  • Fetching US Census demographic data by location
  • Querying OpenStreetMap for nearby amenities
  • Calculating weighted location scores
  • Creating interactive visualizations
  • Exporting comprehensive reports

Understanding the Code Structure

Architecture Overview

The tool is organized into several functional modules, as shown in the following diagram:

graph TD

A[Input: Property Details] --> B{Lat/Lon Available?}
B -- Yes --> C[Use Existing Coordinates]
B -- No --> D[Geocode Addresses]
C --> E[Data Collection]
D --> E
E --> F[Census Demographics]
E --> G[OSM Amenities]
F --> H[Scoring Engine]
G --> H
H --> I[Analysis Results]
I --> J[Visualizations]
I --> K[Reports]
J --> L[Maps, Charts, Tables]
K --> M[Excel, HTML]

Usage Instructions

Basic Usage

  1. Prepare an Excel file with the subject and sale locations, including at minimum these columns (a sketch for building a template file follows this list):
    • Sale: Identifies the sale number or the subject property
    • Address: Street address of the property
    • City: City in which the property is located
    • State: Two-letter state abbreviation
    • Zip Code: ZIP Code for the property location
    • Price/SF: Price per square foot of the property
    • Traffic Count: For now, enter this figure manually from any available source
    • Optional: lat or latitude column; if available, enter it to avoid geocoding
    • Optional: lon or longitude column; if available, enter it to avoid geocoding
    • Optional: other identifying columns
  2. Set parameters at the top of this document:
    • data_file: Path to your Excel file
    • shopping_center_type: Type of retail center
    • weights: Custom factor weights (check params for the exact labels)
  3. Render the document to generate the analysis
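
A minimal sketch of building a template input file is shown below; the file name and example values are hypothetical, and janitor::clean_names() later normalizes the headers to snake_case.

# Hypothetical input template; replace the example values with real data
library(writexl)

template <- data.frame(
  Sale            = c("Subject", "1"),
  Address         = c("100 Main St", "250 Oak Ave"),
  City            = c("Austin", "Austin"),
  State           = c("TX", "TX"),
  `Zip Code`      = c("78701", "78702"),
  `Price/SF`      = c(NA, 451),
  `Traffic Count` = c(18500, 32000),
  check.names     = FALSE   # keep the human-readable headers
)

writexl::write_xlsx(template, "property_input_template.xlsx")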

Setup and Installation

Load Libraries

Load the required libraries (installing any that are missing).

Show Code Block
library(sf)               # Spatial data handling
library(tidyverse)        # Data manipulation and visualization
library(httr)             # HTTP requests
library(jsonlite)         # JSON parsing
library(tigris)           # Census geographic data
library(tidycensus)       # Census demographic data
library(osmdata)          # OpenStreetMap data
library(leaflet)          # Interactive maps
library(plotly)           # Interactive charts
library(knitr)            # Document generation
library(kableExtra)       # Table formatting
library(writexl)          # Excel export
library(janitor)          # Cleans and formats data frames
library(readxl)           # Read Excel files
library(purrr)            # Functional iteration (map functions) over lists and rows
library(tibble)           # Create, view and manipulate data frames
library(memoise)          # Adds caching to avoid recomputing
library(dplyr)            # Filter, arrange, summarize, and transform data
library(scales)           # For transforming, formatting, and rescaling numeric data
library(ggdist)           # For Monte Carlo visualizations
library(tmap)             # Static maps for PDF output
library(tidyr)            # Reshape data (pivot_longer, pivot_wider)
library(stringr)          # String manipulation
library(ggplot2)          # Static plotting

options(tigris_use_cache = TRUE)

Memoised Results

This report uses two layers of caching to keep the analysis efficient and reproducible:

  1. File-Based Cache

    The first layer stores previously downloaded data—such as Census demographics or OpenStreetMap amenities—inside the property_analysis_cache/ folder on disk.

    • These files are reused on future runs, so the script doesn’t need to redownload identical data.

    • Cached data automatically expires after a set number of days (defaults in the functions below: 30 for demographics, 7 for amenities).

    • You can manually clear this cache by running clear_cache() or deleting the folder.

  2. In-Memory Memoisation

    The second layer (handled by the memoise package) remembers the results of function calls within the current R session.

    • If a function like get_census_demographics() is called again with the same coordinates, it retrieves the stored result from memory instantly rather than reading from disk.

    • This makes iterative runs during development much faster, especially when re-rendering the document multiple times.

To ensure a clean start each time the Quarto document is rendered, the memoised (in-memory) layer is flushed at the beginning of each run. This reset does not delete the underlying .rds files in the file-based cache—it simply forgets any short-term memory from previous interactive sessions.
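
A minimal sketch of the two-layer pattern, using the get_census_demographics() function defined later in this document:

library(memoise)

# Layer 2: wrap the file-cached function with in-memory memoisation
get_census_demographics_memo <- memoise::memoise(get_census_demographics)

# Flush only the in-memory layer at the start of each render; the .rds
# files in property_analysis_cache/ (layer 1) are left untouched
memoise::forget(get_census_demographics_memo)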

Parameter Validation

Ensures the selected shopping_center_type parameter matches one of the recognized shopping center types.

Show Code Block
valid_centers <- c(
  "Convenience Store", "Neighborhood Shopping Center",
  "Community Shopping Center", "Regional Shopping Center",
  "Super-Regional Shopping Center"
)

if (!(params$shopping_center_type %in% valid_centers)) {
  stop("Invalid shopping center type. Please use one of the following: ", paste(valid_centers, collapse = ", "))
}

Census Key

Users will need to get a free Census API key at: https://api.census.gov/data/key_signup.html

Note: My Census key is saved in the .Renviron file so I don’t have to enter it each time and it is safe from prying eyes. The file can be edited with usethis::edit_r_environ(). To retrieve the key, put this in a code block: census_api_key <- Sys.getenv("CENSUS_API_KEY"). The chunk below sets the Census API key from the .Renviron file.

Show Code Block
# Set your Census API key
census_api_key <- Sys.getenv("CENSUS_API_KEY")
if (census_api_key == "") stop("Missing Census API key. Set it in your .Renviron file.")

Trade Area Radius Helper

Uses preset radii depending on the shopping center type selected in the parameters.

Show Code Block
# Define radius based on shopping center type
get_trade_area_radius <- function(center_type) {
  switch(center_type,
         "Convenience Store" = 1,
         "Neighborhood Shopping Center" = 1.5,
         "Community Shopping Center" = 6,
         "Regional Shopping Center" = 10,
         "Super-Regional Shopping Center" = 35,
         1.5)  # default fallback
}

trade_area_miles <- get_trade_area_radius(params$shopping_center_type)

Functions

Caching Function

The tool includes a smart caching system that avoids redundant API calls by reusing previously fetched data, subject to configurable expiration windows:

Tip: How Caching Works
  1. First Query: Data is fetched from APIs and saved to property_analysis_cache/
  2. Subsequent Queries: Data is loaded from cache (much faster!)
  3. Expiration: The demographics cache expires after 30 days and the amenities cache after 7 (set by each function’s cache_max_age argument)
  4. Cache Key: Based on rounded coordinates (4 decimal places ≈ 11 meters)
  5. Purge Cache?: Call clear_cache() in console
Show Code Block
# ============================================
# CACHING SYSTEM
# ============================================

cache_dir <- "property_analysis_cache"
if (!dir.exists(cache_dir)) {
  dir.create(cache_dir)
}

save_to_cache <- function(data, cache_key, cache_type = "demographics") {
  cache_file <- file.path(cache_dir, paste0(cache_type, "_", cache_key, ".rds"))
  saveRDS(data, cache_file)
  message(paste("Cached to:", cache_file))
}

load_from_cache <- function(cache_key, cache_type = "demographics", max_age_days = 60) {
  cache_file <- file.path(cache_dir, paste0(cache_type, "_", cache_key, ".rds"))
  
  if (file.exists(cache_file)) {
    file_age <- difftime(Sys.time(), file.info(cache_file)$mtime, units = "days")
    
    if (as.numeric(file_age) <= max_age_days) {
      message(paste("Loading from cache:", cache_file))
      return(readRDS(cache_file))
    } else {
      message(paste("Cache expired (", round(file_age, 1), "days old)"))
    }
  }
  return(NULL)
}

clear_cache <- function(cache_type = NULL) {
  if (is.null(cache_type)) {
    files <- list.files(cache_dir, full.names = TRUE)
  } else {
    files <- list.files(cache_dir, pattern = paste0("^", cache_type), full.names = TRUE)
  }
  
  if (length(files) > 0) {
    file.remove(files)
    message(paste("Removed", length(files), "cached files"))
  } else {
    message("No cached files to remove")
  }
}

Geocoding Function

Geocodes properties in the base .xlsx file if the file does not already include longitude and latitude coordinates.

Show Code Block
# ============================================
# GEOCODING FUNCTIONS AND EXECUTION
# ============================================

# --- 1. Helper functions -----------------------------------------------------

geocode_address <- function(address) {
  base_url <- "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"

  response <- httr::GET(
    base_url,
    query = list(
      address = address,
      benchmark = "Public_AR_Current",
      format = "json"
    )
  )

  data <- httr::content(response, as = "parsed")

  if (length(data$result$addressMatches) > 0) {
    coords <- data$result$addressMatches[[1]]$coordinates
    return(c(lat = coords$y, lon = coords$x))
  } else {
    warning(paste("Address not found:", address))
    return(c(lat = NA, lon = NA))
  }
}

reverse_geocode <- function(lat, lon) {
  base_url <- "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"

  response <- httr::GET(
    base_url,
    query = list(
      x = lon,
      y = lat,
      benchmark = "Public_AR_Current",
      vintage = "Current_Current",
      format = "json"
    )
  )

  data <- httr::content(response, as = "parsed")
  return(data$result)
}

# --- 2. Read the Excel file dynamically -------------------------------------

input_data <- read_excel(params$data_file) %>%
  janitor::clean_names()

# --- 3. Detect whether lat/lon already exist --------------------------------

# Normalize column names for flexibility
lat_col <- intersect(names(input_data), c("lat", "latitude"))
lon_col <- intersect(names(input_data), c("lon", "longitude"))

if (length(lat_col) == 1 && length(lon_col) == 1 &&
    all(!is.na(input_data[[lat_col]])) &&
    all(!is.na(input_data[[lon_col]]))) {

  message("Latitude and longitude found in dataset — skipping geocoding.")
  geocoded_results <- input_data %>%
    rename(lat = all_of(lat_col), lon = all_of(lon_col))

} else {
  message("Latitude and longitude not found — performing geocoding...")

  # --- 4. Prepare address strings for geocoding -----------------------------
  if (all(c("address", "city", "state") %in% names(input_data))) {
    input_data <- input_data %>%
      mutate(full_address = paste(address, city, state, sep = ", "))
  } else if ("address" %in% names(input_data)) {
    input_data <- input_data %>%
      mutate(full_address = address)
  } else {
    stop("Excel file must include at least an 'address' column, or 'address', 'city', and 'state'.")
  }

  # --- 5. Apply the geocoder safely -----------------------------------------
  geocode_safely <- purrr::safely(geocode_address)

  geocoded_results <- input_data %>%
    mutate(geo = map(full_address, geocode_safely)) %>%
    mutate(
      lat = map_dbl(geo, ~ .x$result["lat"] %||% NA_real_),
      lon = map_dbl(geo, ~ .x$result["lon"] %||% NA_real_)
    ) %>%
    select(-geo)
}
Table 2: Geocoded Property Results

Demographics Retrieval Function

Fetches demographic data from US Census API:

| Variable | Description | Census Code |
|---|---|---|
| Population | Total population in census tract | B01003_001 |
| Median Income | Median household income | B19013_001 |
| Median Home Value | Median value of owner-occupied homes | B25077_001 |
| Median Gross Rent | Median gross rent | B25064_001 |
Show Code Block
# ============================================
# DATA COLLECTION FUNCTIONS (Radius-Based, Safe Renaming, No Unemployment)
# ============================================

get_census_demographics <- function(lat, lon,
                                    census_api_key = NULL,
                                    radius_miles = trade_area_miles,
                                    use_cache = TRUE,
                                    cache_max_age = 30) {
  
  cache_key <- paste0(round(lat, 4), "_", round(lon, 4), "_r", radius_miles)

  if (use_cache) {
    cached_data <- load_from_cache(cache_key, "demographics", cache_max_age)
    if (!is.null(cached_data)) {
      message("Loaded demographics from cache for ", cache_key)
      return(cached_data)
    }
  }

  message("Cache not found or expired — fetching new Census data...")

  if (!is.null(census_api_key)) {
    tidycensus::census_api_key(census_api_key, install = FALSE)
  }

  # Build buffer geometry
  point <- sf::st_sfc(sf::st_point(c(lon, lat)), crs = 4326)
  buffer_m <- radius_miles * 1609.34
  buffer_area <- sf::st_transform(point, 3857) |> 
    sf::st_buffer(buffer_m) |> 
    sf::st_transform(4326)

  # ACS variables of interest
  vars <- c(
    population          = "B01003_001",
    median_income       = "B19013_001",
    median_home_value   = "B25077_001",
    median_gross_rent   = "B25064_001"
  )

  tracts_data <- tigris::tracts(cb = TRUE, year = 2023, progress_bar = FALSE) |>
    sf::st_transform(4326)

  tracts_in_area <- suppressWarnings(sf::st_intersection(tracts_data, buffer_area))
  if (nrow(tracts_in_area) == 0) {
    warning("No census tracts found within trade area.")
    return(tibble())
  }

  tract_fips <- unique(paste0(tracts_in_area$STATEFP, tracts_in_area$COUNTYFP))

  message("Downloading ACS data...")
  demo_data_sf <- purrr::map_dfr(
    tract_fips,
    function(fips) {
      tryCatch({
        tidycensus::get_acs(
          geography = "tract",
          variables = vars,
          state = substr(fips, 1, 2),
          county = substr(fips, 3, 5),
          year = 2022,
          survey = "acs5",
          geometry = TRUE
        )
      }, error = function(e) NULL)
    }
  ) |> sf::st_transform(4326)

  if (nrow(demo_data_sf) == 0) {
    warning("No ACS data returned for this area.")
    return(tibble())
  }

  demo_in_area <- suppressWarnings(sf::st_intersection(demo_data_sf, buffer_area))
  if (nrow(demo_in_area) == 0) {
    warning("No demographic data found within buffer.")
    return(tibble())
  }

  # Aggregate results
  demo_summary <- demo_in_area |>
    dplyr::group_by(variable) |>
    dplyr::summarise(estimate = mean(estimate, na.rm = TRUE), .groups = "drop") |>
    tidyr::pivot_wider(names_from = variable, values_from = estimate)

  # --- Safe renaming: only needed if raw ACS codes come back as columns ------
  # dplyr::rename() expects new_name = old_name, so the friendly labels are
  # the names and the ACS variable codes are the values
  rename_map <- c(
    population        = "B01003_001",
    median_income     = "B19013_001",
    median_home_value = "B25077_001",
    median_gross_rent = "B25064_001"
  )

  rename_map <- rename_map[rename_map %in% names(demo_summary)]
  demo_summary <- dplyr::rename(demo_summary, !!!rename_map)

  demo_summary <- demo_summary |>
    dplyr::mutate(radius_miles = radius_miles)

  if (use_cache) save_to_cache(demo_summary, cache_key, "demographics")

  return(demo_summary)
}

# Memoised version for fast in-session reuse
get_census_demographics_memo <- memoise(get_census_demographics)

Amenity Function

Queries OpenStreetMap (https://wiki.openstreetmap.org/wiki/Key:amenity) for nearby amenities within the specified radius (1.5 miles for this analysis). OSM’s amenity key is extremely broad; it includes everything from airports to post boxes.

In the context of retail demand and site selection, the relevant amenities are those that:

  1. Attract regular consumer visits (e.g., restaurants, cafes, banks, pharmacies).

  2. Represent daily or weekly necessities (e.g., supermarkets, convenience stores, schools).

  3. Encourage dwell time or foot traffic clustering (e.g., parks, places of worship, leisure).

  4. Support workforce presence (e.g., offices, post offices, government buildings).

These amenities serve as proxies for consumer presence, disposable income circulation, and accessibility to routine goods and services. For a retail valuation context, they also mirror the “market support” and “trade area vitality” concepts used in market analysis texts like Fanning, Market Analysis for Real Estate, 2nd ed.

The general amenity categories that are used in this analysis include the following:

  • Supermarkets - A proxy for the presence of food retail and daily household shopping activity.
  • Restaurants - Correlate with consumer engagement, social clustering, and evening/weekend traffic.
  • Cafes - Similarly correlate with consumer engagement, social clustering, and evening/weekend traffic.
  • Banks - Generate daytime traffic and draw customers to other nearby retail establishments.
  • Hospitals - Represent an essential service and a consistent daytime population.
  • Pharmacies - A regular essential service that often reflects high traffic.
  • Schools/Colleges/Universities - Bring regular population flows, including students, parents, and staff.
  • Parks - Encourage recreational foot traffic and neighborhood appeal.
Show Code Block
# ============================================
# IMPROVED AMENITY COLLECTION FUNCTION
# ============================================

find_nearby_amenities <- function(lat, lon,
                                  amenity_types = params$amenity_types,
                                  radius_miles = trade_area_miles,
                                  use_cache = TRUE,
                                  cache_max_age = 7) {

  # Fallback list
  if (is.null(amenity_types) || length(amenity_types) == 0) {
    amenity_types <- c("school", "college", "university", "restaurant", "cafe",
                       "bank", "hospital", "pharmacy", "supermarket", "park")
  }

  cache_key <- paste0(round(lat, 4), "_", round(lon, 4), "_r", radius_miles)
  if (use_cache) {
    cached_data <- load_from_cache(cache_key, "amenities", cache_max_age)
    if (!is.null(cached_data)) return(cached_data)
  }

  radius_m <- radius_miles * 1609.34
  bbox_buffer <- radius_m / 111320  # convert meters to degrees (approx)
  bbox <- c(lon - bbox_buffer, lat - bbox_buffer, lon + bbox_buffer, lat + bbox_buffer)

  results <- tibble(amenity = character(), count = numeric())
  target <- sf::st_sfc(sf::st_point(c(lon, lat)), crs = 4326) |> sf::st_transform(3857)

  for (a in amenity_types) {
    tryCatch({
      # Choose appropriate key depending on amenity
      key <- dplyr::case_when(
        a == "supermarket" ~ "shop",
        a == "park" ~ "leisure",
        TRUE ~ "amenity"
      )

      query <- osmdata::opq(bbox = bbox) %>%
        osmdata::add_osm_feature(key = key, value = a)

      osm_data <- osmdata::osmdata_sf(query)
      count <- 0

      # Combine points and polygons
      all_geoms <- list(osm_data$osm_points, osm_data$osm_polygons)
      all_geoms <- all_geoms[!vapply(all_geoms, is.null, logical(1))]

      if (length(all_geoms) > 0) {
        all_sf <- dplyr::bind_rows(lapply(all_geoms, function(g) {
          sf::st_centroid(sf::st_transform(g, 3857))
        }))
        dists <- sf::st_distance(all_sf, target)
        count <- sum(as.numeric(dists) <= radius_m)
      }

      results <- dplyr::add_row(results, amenity = a, count = count)
      Sys.sleep(1)

    }, error = function(e) {
      warning(paste("Error fetching", a, ":", e$message))
      results <<- dplyr::add_row(results, amenity = a, count = 0)
    })
  }

  if (use_cache) save_to_cache(results, cache_key, "amenities")
  return(results)
}

# Memoised version
find_nearby_amenities_memo <- memoise::memoise(find_nearby_amenities)
Table 3: Amenity Counts by Property
Sale Banks Cafes Colleges Hospitals Parks Pharmacies Restaurants Schools Supermarkets Universities Total
1 53 2 0 0 427 3 66 48 52 0 651
2 26 9 0 0 29 0 18 16 20 0 118
3 84 1 0 20 63 2 193 37 46 1 447
4 3 15 5 14 283 4 233 36 33 0 626
5 28 1 0 0 146 1 50 190 10 0 426
6 114 2 0 6 290 9 71 88 50 0 630
7 30 0 0 0 9 0 37 10 0 0 86
8 61 4 0 15 216 2 107 34 90 0 529
Subject 0 0 0 0 467 18 6 66 19 45 621

Traffic Count Information

A user would need to add the traffic counts to their base Excel file.

Note: Traffic Count Data Sources

Traffic count data measure the average number of vehicles passing specific roadway points, often expressed as Annual Average Daily Traffic (AADT). While this report uses data from the Texas Department of Transportation (TxDOT) Open Data Portal, similar information is available from most U.S. states and many other countries.

United States

  • State Departments of Transportation: Most states publish AADT or traffic count data through open data portals, as TxDOT does for Texas.

Canada

  • Provincial Ministries of Transportation: Each province maintains similar datasets.
  • Municipal and Regional Sources: Large municipalities (e.g., Toronto, Vancouver, Calgary) often publish local traffic counts through their open data portals.

International

  • National and regional road agencies in many other countries publish comparable traffic count datasets through their open data programs.

Future Implementation Notes

To integrate local traffic data into this analysis (a hedged sketch follows this list):

  1. Download the shapefile or GeoJSON version of your region’s AADT or traffic count dataset.
  2. Load it into R using sf::st_read("path/to/file.shp").
  3. Perform a spatial join between your property points and the nearest road segment (sf::st_join() or sf::st_nearest_feature()).
  4. Extract or average the relevant traffic count field for scoring.
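
A sketch of those steps, assuming a hypothetical shapefile path and an AADT attribute field (adjust both to your region’s dataset):

# Hypothetical sketch: attach AADT values from a DOT roadway layer to the
# property points. The file path and the "AADT" field name are assumptions.
library(sf)
library(dplyr)

aadt_roads <- sf::st_read("data/txdot_aadt.shp") |>   # hypothetical path
  sf::st_transform(3857)                              # project for distance work

property_pts <- geocoded_results |>
  sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
  sf::st_transform(3857)

# For each property, find the index of the nearest road segment...
nearest_idx <- sf::st_nearest_feature(property_pts, aadt_roads)

# ...and pull that segment's traffic count for scoring
property_pts <- dplyr::mutate(property_pts,
                              traffic_count = aadt_roads$AADT[nearest_idx])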

Weighting Function/Inputs

Required inputs apply the appropriate weight to income (consumer spending power), population density (a reflection of foot-traffic potential), amenities (neighborhood quality and drawing power), property values (a function of the cost to live in the area), and traffic counts (a measure of visibility, accessibility, and potential customers beyond the resident trade-area population).
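
For illustration, one plausible configuration is sketched below; the labels and values are assumptions and must match the weights entry in the document’s YAML parameters.

# Illustrative weights only; the authoritative values live in params$weights
weights <- c(
  income         = 0.25,  # consumer spending power
  population     = 0.20,  # foot-traffic potential
  amenities      = 0.20,  # neighborhood quality and drawing power
  property_value = 0.15,  # cost to live in the area
  traffic_count  = 0.20   # visibility, accessibility, pass-by exposure
)

stopifnot(abs(sum(weights) - 1) < 1e-8)  # weights must sum to 1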

Scoring Function

Calculates a comprehensive location score (0-100) based on five factors:

Income Score

  • Formula: rescale(median_income, to = c(0, 100), from = range(all_incomes))
  • Higher income = better location (greater purchasing power)
  • Dynamically normalized to 0–100 scale

Interpretation: The Income Score measures the relative purchasing power of households within each trade area. Median household income remains a reliable proxy for local spending capacity and the ability of residents to support a broad range of retail goods and services. Higher median incomes typically correspond with greater discretionary spending, stronger support for mid- and upper-tier retailers, and enhanced resilience during economic downturns.

This uses adaptive scaling to account for any range of income levels. The score is calculated by rescaling each trade area’s median income within the range of all incomes observed across the study sample. This approach preserves meaningful differences among higher-income areas, where a capped formula would otherwise treat them as identical, and allows for balanced comparison between affluent and moderate-income markets.

Notes on the Scale: The scaling is anchored to the lowest and highest median incomes within the comparison group. The location with the highest income receives a score of 100, while lowest income area scores a 0. Other income levels are scored proportionally.

Population Density Score

  • Formula: rescale(population, to = c(0, 100), from = range(all_populations))
  • Higher population = more potential customers/activity
  • Normalized to 0–100 scale

Interpretation: The Population Density Score represents the concentration of potential customers within the trade area. Retail activity depends not only on affluence but also on proximity and repetition of visits. Areas with greater population density tend to generate stronger pass-by traffic, higher visibility, and more consistent sales volumes.

This metric applies adaptive scaling rather than a fixed divisor. Each trade area’s population is rescaled within the observed range of populations across the study sample. This ensures that every location receives a proportionate score, maintaining meaningful separation even among densely populated areas where a capped formula might otherwise flatten results.

Notes on the Scale: The lowest population in the comparison group anchors the bottom of the scale at 0, while the highest defines 100. All other sites are positioned proportionally between them.

Amenities Score

  • Formula: 0.4 × (Diversity %) + 0.6 × (Density %, adaptively scaled 0–100)
  • More diverse and concentrated amenities = stronger location
  • Balances variety and overall intensity of surrounding uses

Interpretation: The Amenities Score reflects both the diversity and density of nearby amenities that contribute to a site’s retail vitality. Diversity measures the proportion of tracked amenity categories—such as restaurants, schools, parks, banks, and pharmacies—that are represented within the trade area. Density measures the total number of amenities, scaled relative to the range observed across all study locations.

Together, these components create a balanced indicator of neighborhood convenience, consumer draw, and the underlying strength of the retail ecosystem. Locations that contain all tracked amenity types receive full credit for diversity, while those with fewer categories present earn proportionally less. The density component differentiates sites based on how intensively built and commercially active they are within the same amenity mix.

Notes on the Scale: The diversity index reaches 100 when all tracked amenity categories are represented within the trade area. The density index is adaptively normalized between the lowest and highest total amenity counts observed among all properties being compared, ensuring that each site’s score reflects its position within the actual range of market activity. The final score combines both effects, assigning 40% weight to diversity and 60% to density. This method ensures fair comparison across urban, suburban, and rural contexts without relying on arbitrary saturation thresholds.

Property Value Score

  • Formula: 0.5 × min(median_home_value / 5000, 100) + 0.5 × min(median_gross_rent / 30, 100)
  • Higher values = more desirable neighborhood
  • Normalized to 0–100 scale

Interpretation: The Property Value Score represents a composite measure of market vitality that combines indicators of long-term stability and short-term spending capacity within a trade area. Median home value reflects neighborhood affluence, ownership stability, and the general desirability of the surrounding residential market. Median gross rent complements this by capturing the relative cost of occupancy and the local balance between household income and housing expense—an important proxy for retail purchasing power and economic elasticity.

By weighting home value and rent equally (50/50), the score avoids bias toward either high-income owner-occupied areas or transient high-rent districts. A higher Property Value Score therefore signals a trade area that is both financially resilient and economically active, typically corresponding to locations that can support sustainable retail rents and consistent consumer demand.

  • High home value and high rent → strong, affluent market with high retail viability.
  • High rent but low home value → possibly transient or overburdened market.
  • High home value but low rent → stable ownership but low circulation spending.

Notes on the Scale: The divisor for home value (/5000) is a normalization factor that caps high-value markets near 100. You can tune it; if your markets are typically high-end, you might raise it to /6000 or /8000. The divisor for gross rent (/30) is similar: $30 × 100 = $3,000, roughly the upper bound for typical monthly rent, producing comparable scaling.

Traffic Count Score

  • Formula: rescale(traffic_count, to = c(0, 100), from = range(all_traffic_counts))
  • Higher traffic = greater retail visibility and consumer flow
  • Dynamically normalized to 0–100 scale

Interpretation: The Traffic Count Score measures the retail exposure and accessibility of a location based on roadway volume. Higher traffic counts typically translate to greater visibility, impulse visits, and daily customer flow.

This score is dynamically scaled relative to the comparison group, ensuring that properties with exceptionally high or low counts remain distinguishable within each analysis set.

Notes on the Scale: The lowest observed traffic count anchors the scale at 0, and the highest at 100. When comparing sites across multiple markets, a shared range can be applied to maintain consistent interpretation across studies.

Total Score Calculation

\[ \text{Total Score} = \sum_{i=1}^{5} (\text{Component Score}_i \times \text{Weight}_i) \]

Interpretation: The Total Score is the weighted sum of five core indicators—Income, Population, Amenities, Property Value, and Traffic Count—each scaled from 0 to 100. The weights (Weightᵢ) control the relative importance of each factor and are defined in the document parameters. This additive framework lets the model reflect market potential (income, population), site quality (amenities, property value), and exposure/accessibility (traffic count).

Notes on the Scale: Each component score is dynamically normalized within its observed range to preserve proportional differences among locations. The composite score therefore expresses each site’s overall retail viability relative to all others in the analysis.
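
The batch comparison function in the next section calls calculate_location_score(), whose definition is not shown here. The following is a minimal sketch consistent with the formulas above and the arguments passed later; the actual implementation may differ. Error handling is omitted, the weights are assumed to be ordered income, population, amenities, property value, traffic, and the diversity term follows the proportional definition in the text (so amenity_div_range is accepted but unused).

# Minimal sketch, not the actual implementation
calculate_location_score <- function(lat, lon, traffic_count,
                                     census_api_key = NULL,
                                     radius_miles = trade_area_miles,
                                     income_range, population_range,
                                     traffic_range, home_value_range,
                                     rent_range, amenity_div_range,
                                     amenity_den_range,
                                     weights = unlist(params$weights$value)) {
  demo <- get_census_demographics_memo(lat, lon, census_api_key,
                                       radius_miles = radius_miles)
  amen <- find_nearby_amenities_memo(lat, lon, radius_miles = radius_miles)

  # Adaptive 0-100 scaling within the comparison group
  income_s     <- scales::rescale(demo$median_income, to = c(0, 100), from = income_range)
  population_s <- scales::rescale(demo$population,    to = c(0, 100), from = population_range)
  traffic_s    <- scales::rescale(traffic_count,      to = c(0, 100), from = traffic_range)

  # Amenities: 40% diversity (share of categories present) + 60% density
  diversity_s <- 100 * sum(amen$count > 0) / nrow(amen)
  density_s   <- scales::rescale(sum(amen$count), to = c(0, 100), from = amenity_den_range)
  amenities_s <- 0.4 * diversity_s + 0.6 * density_s

  # Property value: equal parts home value and gross rent, adaptively rescaled
  value_s <- 0.5 * scales::rescale(demo$median_home_value, to = c(0, 100), from = home_value_range) +
             0.5 * scales::rescale(demo$median_gross_rent, to = c(0, 100), from = rent_range)

  scores <- list(income = income_s, population_density = population_s,
                 amenities = amenities_s, property_value = value_s,
                 traffic_count = traffic_s)

  list(total_score = sum(unlist(scores) * weights),
       individual_scores = scores)
}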

Analysis Function

Main function that ties everything together:

  1. Geocodes address (if not provided in Excel file)
  2. Fetches demographics
  3. Queries amenities
  4. Calculates scores
  5. Returns comprehensive results object
Show Code Block
# ============================================
# BATCH COMPARISON — FULLY NORMALIZED
# ============================================

compare_locations_from_excel <- function(
  data_file = params$data_file,
  census_api_key = Sys.getenv("CENSUS_API_KEY"),
  center_type = params$shopping_center_type,
  radius_miles = trade_area_miles
) {
  # Always pull weights from YAML
  weights <- unlist(params$weights$value)

  
  message("-----------------------------------------------------")
  message("Running Batch Location Comparison")
  message("Center Type: ", center_type)
  message("Trade Area Radius: ", radius_miles, " miles")
  message("Source File: ", data_file)
  message("-----------------------------------------------------")
  
  # --- Load data ---
  df <- readxl::read_excel(data_file) |>
    janitor::clean_names()
  
  if (!"traffic_count" %in% names(df)) {
    df$traffic_count <- NA_real_
  }
  
  lat_col <- intersect(names(df), c("lat", "latitude"))
  lon_col <- intersect(names(df), c("lon", "longitude"))
  
  if (length(lat_col) != 1 || length(lon_col) != 1) {
    stop("Excel file must include latitude and longitude columns.")
  }
  
  df <- df |>
    dplyr::rename(lat = all_of(lat_col), lon = all_of(lon_col)) |>
    dplyr::filter(!is.na(lat) & !is.na(lon))
  
  # --- Compute traffic range globally ---
  traffic_range <- range(df$traffic_count, na.rm = TRUE)
  
  # --- Gather demographic + amenity data for global scaling ---
  demo_list <- list()
  amenity_list <- list()
  
  for (i in seq_len(nrow(df))) {
    lat <- df$lat[i]
    lon <- df$lon[i]
    
    demo <- tryCatch(
      get_census_demographics(lat, lon, census_api_key, radius_miles = radius_miles),
      error = function(e) tibble()
    )
    if (nrow(demo) > 0) demo_list[[i]] <- demo
    
    amen <- tryCatch(
      find_nearby_amenities(lat, lon, radius_miles = radius_miles),
      error = function(e) tibble()
    )
    if (nrow(amen) > 0) amenity_list[[i]] <- amen
  }
  
  all_demo <- bind_rows(demo_list)
  income_range <- range(all_demo$median_income, na.rm = TRUE)
  population_range <- range(all_demo$population, na.rm = TRUE)
  home_value_range <- range(all_demo$median_home_value, na.rm = TRUE)
  rent_range <- range(all_demo$median_gross_rent, na.rm = TRUE)
  
  amenity_summary <- purrr::map_dfr(amenity_list, function(a) {
    tibble(diversity = length(unique(a$amenity)),
           density = sum(a$count, na.rm = TRUE))
  })
  div_range <- range(amenity_summary$diversity, na.rm = TRUE)
  den_range <- range(amenity_summary$density, na.rm = TRUE)
  
  # --- Run analysis for each property ---
  results <- purrr::pmap_dfr(
    df,
    function(...) {
      row <- tibble::tibble(...)
      message("Analyzing location: ", row$lat, ", ", row$lon)
      
      analysis <- calculate_location_score(
        lat = row$lat,
        lon = row$lon,
        traffic_count = row$traffic_count,
        census_api_key = census_api_key,
        radius_miles = radius_miles,
        income_range = income_range,
        population_range = population_range,
        traffic_range = traffic_range,
        home_value_range = home_value_range,
        rent_range = rent_range,
        amenity_div_range = div_range,
        amenity_den_range = den_range
      )
      
      tibble::tibble(
        property_id = if ("id" %in% names(row)) row$id else NA_character_,
        address = if ("address" %in% names(row)) row$address
                  else paste0("(", round(row$lat, 4), ", ", round(row$lon, 4), ")"),
        lat = row$lat,
        lon = row$lon,
        total_score = analysis$total_score,
        income_score = analysis$individual_scores$income,
        population_score = analysis$individual_scores$population_density,
        amenities_score = analysis$individual_scores$amenities,
        property_value_score = analysis$individual_scores$property_value,
        traffic_score = analysis$individual_scores$traffic_count
      )

    }
  )
  
  # --- Rank and export ---
  ranked_results <- results |>
    dplyr::arrange(desc(total_score)) |>
    dplyr::mutate(rank = dplyr::row_number())
  
  output_file <- paste0("Location_Scoring_", gsub(" ", "_", center_type), "_", radius_miles, "mi.xlsx")
  writexl::write_xlsx(ranked_results, output_file)
  
  message("Results saved to: ", output_file)
  message("-----------------------------------------------------")
  
  return(ranked_results)
}
Table 4: Ranked Retail Locations
Rank Sale Total Score Income Score Population Score Amenities Score Property Value Score Traffic Score
1 1 74.3 25.8 100.0 100.0 65.2 100.0
2 Subject 67.5 100.0 61.5 96.8 73.6 10.0
3 7 54.5 56.3 57.2 40.0 91.1 36.7
4 2 46.0 25.8 99.8 43.4 39.3 0.0
5 6 42.0 34.6 43.6 97.8 56.2 15.5
6 8 41.3 27.2 51.6 87.0 43.1 23.2
7 4 32.0 35.5 0.0 97.3 59.6 28.5
8 5 28.8 0.0 9.0 76.1 14.7 85.1
9 3 21.6 12.6 19.8 78.3 13.5 13.5

Create Combined Results Export

This will create a new Excel file with all collected information.

Show Code Block
# =====================================================
# EXPORT COMPREHENSIVE RESULTS (Failsafe Version)
# =====================================================

safely_get_col <- function(df, patterns) {
  cols <- names(df)
  match <- cols[grepl(paste(patterns, collapse = "|"), cols, ignore.case = TRUE)]
  if (length(match) > 0) return(match[1])
  return(NULL)
}

# --- 1. Read and clean the base Excel file ---
original_data <- readxl::read_excel(params$data_file) %>%
  janitor::clean_names()

# --- 2. Detect or create address column ---
address_col <- safely_get_col(original_data, c("^address$", "addr", "site", "property"))
if (!is.null(address_col)) {
  original_data <- original_data %>%
    rename(address = !!sym(address_col))
  message("Detected address column: ", address_col)
} else {
  message("⚠️  No address column detected. Creating synthetic labels.")
  original_data <- original_data %>%
    mutate(address = paste0("Site_", row_number()))
}

# --- 3. Ensure lat/lon columns exist ---
lat_col <- safely_get_col(original_data, c("^lat$", "latitude"))
lon_col <- safely_get_col(original_data, c("^lon$", "longitude"))
if (!is.null(lat_col)) original_data <- rename(original_data, lat = !!sym(lat_col)) else original_data$lat <- NA_real_
if (!is.null(lon_col)) original_data <- rename(original_data, lon = !!sym(lon_col)) else original_data$lon <- NA_real_

# --- 4. Add geocoded coordinates if missing ---
if (exists("geocoded_results")) {
  original_data <- original_data %>%
    mutate(addr_key = str_trim(tolower(address))) %>%
    left_join(
      geocoded_results %>%
        mutate(addr_key = str_trim(tolower(address))) %>%
        select(addr_key, lat_geo = lat, lon_geo = lon),
      by = "addr_key"
    ) %>%
    mutate(
      lat = coalesce(lat, lat_geo),
      lon = coalesce(lon, lon_geo)
    ) %>%
    select(-addr_key, -lat_geo, -lon_geo)
}

# --- 5. Build amenity summary safely ---
if (exists("amenity_results")) {
  amenity_summary_join <- amenity_results %>%
    mutate(addr_key = str_trim(tolower(address))) %>%
    group_by(addr_key, amenity) %>%
    summarise(total = sum(count, na.rm = TRUE), .groups = "drop") %>%
    tidyr::pivot_wider(
      names_from = amenity,
      values_from = total,
      values_fill = list(total = 0)
    )
} else {
  amenity_summary_join <- tibble(addr_key = character())
}

# --- 6. Gather raw demographics ---
demo_list <- purrr::map2_dfr(
  original_data$lat, original_data$lon,
  function(lat, lon) {
    if (is.na(lat) || is.na(lon)) return(tibble())
    d <- get_census_demographics(lat, lon, census_api_key = Sys.getenv("CENSUS_API_KEY"))
    if (nrow(d) == 0) return(tibble())
    d$lat <- lat; d$lon <- lon
    d
  }
)

demo_join <- demo_list %>%
  mutate(lat_r = round(lat, 4), lon_r = round(lon, 4)) %>%
  select(lat_r, lon_r, population, median_income, median_home_value, median_gross_rent)

# --- 7. Merge everything together ---
final_export <- location_scores %>%
  mutate(addr_key = str_trim(tolower(address))) %>%
  left_join(amenity_summary_join, by = "addr_key") %>%
  mutate(lat_r = round(lat, 4), lon_r = round(lon, 4)) %>%
  left_join(demo_join, by = c("lat_r", "lon_r")) %>%
  left_join(
    original_data %>%
      mutate(addr_key = str_trim(tolower(address))) %>%
      select(-lat, -lon),
    by = "addr_key"
  )

# --- 8. Ensure essential columns exist ---
for (col in c("address", "lat", "lon", "population", "median_income",
              "median_home_value", "median_gross_rent")) {
  if (!col %in% names(final_export)) final_export[[col]] <- NA
}

# --- 9. Reorder logical columns ---
final_export <- final_export %>%
  select(
    address,
    lat, lon,
    population, median_income, median_home_value, median_gross_rent,
    total_score, income_score, population_score,
    amenities_score, property_value_score, traffic_score,
    everything()
  )

# --- 10. Write to Excel ---
output_file <- paste0(
  "Comprehensive_Location_Results_",
  gsub(" ", "_", params$shopping_center_type),
  "_", trade_area_miles, "mi.xlsx"
)

writexl::write_xlsx(final_export, output_file)

Sensitivity Analysis

The sensitivity analysis tests how stable each property’s ranking remains when the weighting of input factors such as income, population density, amenities, property values, and traffic counts is allowed to vary randomly within reasonable limits. The scenario analysis simulates 5,000 alternative weighting configurations, and each configuration produces a new set of location rankings. The resulting distribution of simulated ranks for each property reflects how dependent its position is on the specific weighting assumptions used in the model.

The median simulated rank represents the property’s typical performance across all simulations, while the 5th and 95th percentiles mark its best-case and worst-case outcomes, respectively. A property with a low median rank and a narrow percentile range demonstrates consistent strength and resilience to changing assumptions. In contrast, a wide percentile range indicates that the property’s relative standing shifts considerably under different weighting scenarios, suggesting greater sensitivity to how the model prioritizes its components. Together, these measures reveal not only which locations perform best on average but also which are most robust to uncertainty in the underlying valuation logic.

Note: About the Sensitivity Simulation

The “reasonable limits” for weighting variation are defined by a Dirichlet-style distribution: independent Gamma(1,1) draws normalized to sum to one, producing random, positive weights. This allows each factor (Income, Population, Amenities, Property Value, and Traffic) to fluctuate broadly between near zero and moderate dominance, representing plausible shifts in judgment without introducing unrealistic extremes. Each factor can vary between very small (near 0) and fairly large (up to ~0.7–0.8), but the average weight remains around 0.20 (1/5). The Gamma(1,1) shape parameter ensures broad variability without producing absurd or hyper-concentrated weights (such as one factor receiving 99% of the total influence).

Each Monte Carlo run represents one hypothetical weighting configuration within those bounds. Monte Carlo error decreases approximately in proportion to the inverse of the square root of the number of simulations:

\[ \text{Error} \propto \frac{1}{\sqrt{N}} \]

This means that as the number of scenarios (N) increases, the uncertainty in the simulated results shrinks rapidly at first and then levels off:

  • 100 runs → approximately 10 % uncertainty
  • 1,000 runs → approximately 3 % uncertainty
  • 5,000 runs → approximately 1.4 % uncertainty

In practice, 5,000 simulations represent a balanced choice—large enough to yield stable percentile and rank estimates, yet small enough to run efficiently on a standard laptop without noticeable slowdown.
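
A minimal sketch of the simulation, assuming the ranked results from compare_locations_from_excel() are stored as location_scores (as in the export step later in this document):

# Monte Carlo weight-sensitivity sketch; `location_scores` is assumed
set.seed(42)  # reproducibility

n_sims <- 5000
score_cols <- c("income_score", "population_score", "amenities_score",
                "property_value_score", "traffic_score")
score_mat <- as.matrix(location_scores[, score_cols])

sim_ranks <- replicate(n_sims, {
  w <- rgamma(length(score_cols), shape = 1)  # Gamma(1,1) draws...
  w <- w / sum(w)                             # ...normalized: Dirichlet(1,...,1)
  rank(-(score_mat %*% w))                    # rank 1 = best weighted total
})

# Stability summary: median and 5th/95th percentile rank per property
rank_summary <- data.frame(
  sale        = location_scores$property_id,
  median_rank = apply(sim_ranks, 1, median),
  p05         = apply(sim_ranks, 1, quantile, probs = 0.05),
  p95         = apply(sim_ranks, 1, quantile, probs = 0.95)
)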

(a) Dot = Median Simulated Rank, Line = 5th–95th Percentile
Figure 1: Sensitivity Analysis: Ranking Stability Under Weight Variation
Table 5: Simulated Ranking Stability Summary
Sale Median Rank 5th Percentile 95th Percentile
1 1.0 1.0 1.5
Subject 2.0 1.4 4.0
7 3.0 2.5 5.1
8 5.0 5.0 6.5
6 5.5 4.0 6.5
2 6.0 2.5 9.0
4 7.0 3.5 7.5
5 8.0 2.5 9.0
3 9.0 7.4 9.0

Ranked Location Map

Figure 2: Ranked Location Map by Total Score.

Radar Chart

Figure 3: Component Comparisons Radar Chart

Weighted Sensitivity Heatmap

Figure 4: Weighted Factor Sensitivity Heatmap

Amenity Density Map

Still needs work

Final Summary Table

Comparative analysis is defined as “the process by which a value indication is derived in the sales comparison approach. Comparative analysis may employ quantitative or qualitative techniques, either separately or in combination.”¹ The Appraisal of Real Estate outlines examples of the techniques used in quantitative adjustments and qualitative analyses, as shown in Table 6.²

Table 6: Techniques Used in Quantitative and Qualitative Analysis
| Quantitative Analysis | Qualitative Analysis |
|---|---|
| Paired data analysis (sales and resales of the same or similar properties) | Relative comparison analysis |
| Grouped data analysis | Ranking analysis |
| Secondary data analysis | Personal interviews |
| Statistical analysis, including graphic analysis and scenario analysis | |
| Cost-related adjustments (cost to cure, depreciated cost) | |
| Capitalization of income differences | |
| Trend analysis | |

When adequate data are available, quantitative analysis can be a helpful tool, but often the differences between properties make it challenging to determine appropriate adjustments with a level of certainty. In these cases, qualitative analysis or ranking can be helpful.³ Qualitative analysis recognizes the inefficiencies of real estate markets and the difficulty of expressing adjustments with mathematical precision.

The analysis to this point has described in detail how the location attributes of the analyzed properties contribute to retail property success. As shown in Table 7, the subject property is near the top of the ranking analysis, suggesting a value at or below Sale 1 and at or above the remaining sales.

Table 7: Executive Summary — Retail Location Comparison
Rank Sale Total Score Price/SF
1 1 74.3 $451
2 Subject 67.5
3 7 54.5 $308
4 2 46.0 $428
5 6 42.0 $392
6 8 41.3 $281
7 4 32.0 $391
8 5 28.8 $240
9 3 21.6 $115

Footnotes

  1. The Dictionary of Real Estate Appraisal, 7th ed. (Chicago: Appraisal Institute, 2022), s.v. “comparative analysis.”

  2. The Appraisal of Real Estate, 15th ed. (Chicago: Appraisal Institute, 2020), 362.

  3. The Appraisal of Real Estate, 15th ed., 376.