Retail Location Analysis
Introduction
This guide provides a comprehensive walkthrough of the Property Location Analysis Tool, an R-based system for evaluating real estate locations using geospatial data, demographics, and amenity analysis.
Location Adjustments in Retail Property Appraisal
In the valuation of retail properties, location remains one of the most influential determinants of value. Location adjustments account for the qualitative and quantitative differences in the site characteristics that influence a property’s utility, visibility, and access to a viable customer base. These adjustments are particularly critical when analyzing comparable sales in varying retail environments.
Retail success is closely tied to the property’s trade area, the geographic region from which a retail establishment draws the majority of its customers. The size and shape of a trade area are rarely uniform and are influenced by multiple factors including existing and proposed residential developments, natural or manmade barriers (e.g., rivers, highways), and psychological boundaries such as perceptions of safety. Transportation infrastructure and commuter patterns further shape trade area dynamics by affecting ease of access for potential shoppers.
Demographic variables within a trade area, such as population density, average household income, age distribution, and consumer spending patterns, directly impact retail viability. For instance, household income levels and consumer expenditure habits (e.g., on groceries, apparel, or electronics) can indicate the buying power and preferences of the local market. Retailers often rely on these data points to determine store format, product mix, and even pricing strategies.
Retail property buyers and appraisers must also evaluate the visibility and accessibility of a site. Elements such as traffic volume, street frontage, line-of-sight from major roads, and access to public transportation all contribute to a location’s competitive advantage. A site that is difficult to locate or reach, whether due to excessive traffic speeds, poor signage, or other accessibility issues, can significantly underperform compared with more accessible alternatives.
Ultimately, effective location adjustments require an understanding not only of physical proximity but of the socio-economic context within which a retail property operates. By carefully analyzing trade area boundaries, demographic trends, and site-specific factors, appraisers can make more precise adjustments, thereby producing credible and supportable value conclusions.
Table 1 below outlines some of the key characteristics of various shopping center types.
| Type | Tenantry | Size | Primary Trade Area |
|---|---|---|---|
| Convenience Center | Stores that sell convenience goods (e.g., groceries, pharmaceuticals); not anchored by a supermarket. | Less than 30,000 sq ft | Less than 5-minute driving time |
| Neighborhood Shopping Center | Stores that sell convenience goods and personal services (e.g., dry cleaning, shoe repair); often anchored by a supermarket. | 30,000 – 150,000 sq ft of gross leasable area; 4 – 10 acres | Less than 5-minute driving time; 1 – 1½-mile range; 5,000 – 40,000 potential customers |
| Community Shopping Center | Stores that sell convenience goods, personal services, and shoppers’ goods (e.g., apparel, appliances); may include a junior department store or off-price/discount store. | 100,000 – 300,000 sq ft of gross leasable area; 10 – 30 acres | 5 – 20-minute driving time; 3 – 6-mile range; 40,000 – 150,000 potential customers |
| Regional Shopping Center | Stores that sell general merchandise, shoppers’ goods, and convenience goods; typically includes one or more department stores. | 300,000 – 1,000,000 sq ft of gross leasable area; 30 – 100 acres | 20 – 40-minute driving time; 5 – 10-mile range; 150,000 – 400,000 potential customers |
| Super-Regional Shopping Center | Stores that sell general merchandise, apparel, furniture, home furnishings, services, and recreation; contains at least three major department stores. | Over 800,000 sq ft of gross leasable area | In excess of 30-minute driving time; typically 10 – 35-mile range; over 500,000 potential customers |
What This Tool Does
The Property Location Analysis Tool ranks the locations of a subject property and comparable sales on a qualitative basis by scoring key elements:
- Provides an input section to set parameters for the property type, weightings, etc.
- Scores locations based on multiple factors (income, population, amenities, property values, and traffic counts)
- Compares properties side-by-side to identify the best opportunities
- Visualizes results with interactive maps, charts, and tables
- Generates reports in multiple formats (Excel, HTML, PNG)
- Caches data to avoid repeated API calls and speed up analysis
Key Features
- Geocoding addresses to coordinates (and vice versa)
- Fetching US Census demographic data by location
- Querying OpenStreetMap for nearby amenities
- Calculating weighted location scores
- Creating interactive visualizations
- Exporting comprehensive reports
Understanding the Code Structure
Architecture Overview
The tool is organized into several functional modules, as shown in the following diagram:
graph TD
A[Input: Property Details] --> B{Lat/Lon Available?}
B -- Yes --> C[Use Existing Coordinates]
B -- No --> D[Geocode Addresses]
C --> E[Data Collection]
D --> E
E --> F[Census Demographics]
E --> G[OSM Amenities]
F --> H[Scoring Engine]
G --> H
H --> I[Analysis Results]
I --> J[Visualizations]
I --> K[Reports]
J --> L[Maps, Charts, Tables]
K --> M[Excel, HTML]
Usage Instructions
Basic Usage
- Prepare an Excel file with the subject and sale locations, including at minimum these columns:
  - Sale: identifies the sale number or the subject property
  - Address: street address of the property
  - City: city the property lies in
  - State: two-letter state abbreviation
  - Zip Code: ZIP code for the property location
  - Price/SF: price per square foot of the property
  - Traffic Count: for now, you need to enter this figure from any available source
  - Optional: a lat or latitude column; if you have it, enter it to avoid geocoding
  - Optional: a lon or longitude column; if you have it, enter it to avoid geocoding
  - Optional: other identifying columns
- Set parameters at the top of this document (see the sketch after this list):
  - data_file: path to your Excel file
  - shopping_center_type: type of retail center
  - weights: custom weights (check params for the right labels)
- Render the document to generate analysis
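For orientation, here is a minimal sketch of what the Quarto YAML header with these params might look like. The file name, weight labels, and amenity list are assumptions; check them against the actual params block in your copy of the document:

---
title: "Retail Location Analysis"
format: html
params:
  data_file: "comparable_sales.xlsx"   # illustrative path
  shopping_center_type: "Neighborhood Shopping Center"
  amenity_types: ["school", "college", "university", "restaurant", "cafe",
                  "bank", "hospital", "pharmacy", "supermarket", "park"]
  weights:
    value:
      income: 0.25
      population: 0.20
      amenities: 0.20
      property_value: 0.15
      traffic_count: 0.20
---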
Setup and Installation
Load Libraries
Load the required libraries (installing them first if necessary).
Show Code Block
library(sf) # Spatial data handling
library(tidyverse) # Data manipulation and visualization
library(httr) # HTTP requests
library(jsonlite) # JSON parsing
library(tigris) # Census geographic data
library(tidycensus) # Census demographic data
library(osmdata) # OpenStreetMap data
library(leaflet) # Interactive maps
library(plotly) # Interactive charts
library(knitr) # Document generation
library(kableExtra) # Table formatting
library(writexl) # Excel export
library(janitor) # Cleans and formats data frames
library(readxl) # Read Excel files
library(purrr) # Functional iteration (map, pmap, safely)
library(tibble) # Create, view and manipulate data frames
library(memoise) # Adds caching to avoid recomputing
library(dplyr) # Filter, arrange, summarize, and transform data
library(scales) # For transforming, formatting, and rescaling numeric data
library(ggdist) # For Monte Carlo visualizations
library(tmap) # For pdf maps
library(tidyr) # Data reshaping (pivot_wider, etc.)
library(stringr) # String manipulation
library(ggplot2) # Static plotting
options(tigris_use_cache = TRUE)
Memoised Results
This report uses two layers of caching to keep the analysis efficient and reproducible:
File-Based Cache
The first layer stores previously downloaded data, such as Census demographics or OpenStreetMap amenities, inside the property_analysis_cache/ folder on disk. These files are reused on future runs, so the script doesn’t need to redownload identical data.
Cached data automatically expires after a set number of days (default: 30 for demographics, 7 for amenities).
You can manually clear this cache by running clear_cache() or by deleting the folder.
In-Memory Memoisation
The second layer (handled by the memoise package) remembers the results of function calls within the current R session. If a function like get_census_demographics() is called again with the same coordinates, it retrieves the stored result from memory instantly rather than reading from disk. This makes iterative runs during development much faster, especially when re-rendering the document multiple times.
To ensure a clean start each time the Quarto document is rendered, the memoised (in-memory) layer is flushed at the beginning of each run. This reset does not delete the underlying .rds files in the file-based cache—it simply forgets any short-term memory from previous interactive sessions.
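A minimal sketch of how that per-render flush might be done, assuming the memoised wrappers defined later in this document (get_census_demographics_memo and find_nearby_amenities_memo):

library(memoise)
# Forget only the in-memory memoised results; the .rds files in
# property_analysis_cache/ on disk are untouched
if (exists("get_census_demographics_memo")) memoise::forget(get_census_demographics_memo)
if (exists("find_nearby_amenities_memo")) memoise::forget(find_nearby_amenities_memo)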
Parameter Validation
Validates that the selected shopping_center_type parameter is one of the supported center types.
Show Code Block
valid_centers <- c(
"Convenience Store", "Neighborhood Shopping Center",
"Community Shopping Center", "Regional Shopping Center",
"Super-Regional Shopping Center"
)
if (!(params$shopping_center_type %in% valid_centers)) {
stop("Invalid shopping center type. Please use one of the following: ", paste(valid_centers, collapse = ", "))
}
Census Key
Users will need to get a free Census API key at: https://api.census.gov/data/key_signup.html
Note: My Census key has been saved in the .Renviron file so I don’t have to enter it each time, and it stays out of the source code. It can be edited with usethis::edit_r_environ(). To retrieve the key, put this in a code block: census_api_key <- Sys.getenv("CENSUS_API_KEY").
Sets the Census API key from the .Renviron file
Show Code Block
# Set your Census API key
census_api_key <- Sys.getenv("CENSUS_API_KEY")
if (census_api_key == "") stop("Missing Census API key. Set it in your .Renviron file.")
Trade Area Radius Helper
Uses preset radii depending on the shopping center type selected in the parameters.
Show Code Block
# Define radius based on shopping center type
get_trade_area_radius <- function(center_type) {
switch(center_type,
"Convenience Store" = 1,
"Neighborhood Shopping Center" = 1.5,
"Community Shopping Center" = 6,
"Regional Shopping Center" = 10,
"Super-Regional Shopping Center" = 35,
1.5) # default fallback
}
trade_area_miles <- get_trade_area_radius(params$shopping_center_type)
Functions
Caching Function
The tool includes a caching system that avoids redundant API calls by reusing previously pulled data, subject to time-based expiration:
- First query: data is fetched from the APIs and saved to property_analysis_cache/
- Subsequent queries: data is loaded from the cache (much faster!)
- Expiration: the demographics cache expires after 30 days by default; amenities after 7 days
- Cache key: based on rounded coordinates (4 decimal places ≈ 11 meters)
- Purge the cache: call clear_cache() in the console
Show Code Block
# ============================================
# CACHING SYSTEM
# ============================================
cache_dir <- "property_analysis_cache"
if (!dir.exists(cache_dir)) {
dir.create(cache_dir)
}
save_to_cache <- function(data, cache_key, cache_type = "demographics") {
cache_file <- file.path(cache_dir, paste0(cache_type, "_", cache_key, ".rds"))
saveRDS(data, cache_file)
message(paste("Cached to:", cache_file))
}
load_from_cache <- function(cache_key, cache_type = "demographics", max_age_days = 60) {
cache_file <- file.path(cache_dir, paste0(cache_type, "_", cache_key, ".rds"))
if (file.exists(cache_file)) {
file_age <- difftime(Sys.time(), file.info(cache_file)$mtime, units = "days")
if (as.numeric(file_age) <= max_age_days) {
message(paste("Loading from cache:", cache_file))
return(readRDS(cache_file))
} else {
message(paste("Cache expired (", round(file_age, 1), "days old)"))
}
}
return(NULL)
}
clear_cache <- function(cache_type = NULL) {
if (is.null(cache_type)) {
files <- list.files(cache_dir, full.names = TRUE)
} else {
files <- list.files(cache_dir, pattern = paste0("^", cache_type), full.names = TRUE)
}
if (length(files) > 0) {
file.remove(files)
message(paste("Removed", length(files), "cached files"))
} else {
message("No cached files to remove")
}
}
Geocoding Function
Geocodes properties in the base .xlsx file if the file does not already include latitude and longitude coordinates.
Show Code Block
# ============================================
# GEOCODING FUNCTIONS AND EXECUTION
# ============================================
# --- 1. Helper functions -----------------------------------------------------
geocode_address <- function(address) {
base_url <- "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
response <- httr::GET(
base_url,
query = list(
address = address,
benchmark = "Public_AR_Current",
format = "json"
)
)
data <- httr::content(response, as = "parsed")
if (length(data$result$addressMatches) > 0) {
coords <- data$result$addressMatches[[1]]$coordinates
return(c(lat = coords$y, lon = coords$x))
} else {
warning(paste("Address not found:", address))
return(c(lat = NA, lon = NA))
}
}
reverse_geocode <- function(lat, lon) {
base_url <- "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"
response <- httr::GET(
base_url,
query = list(
x = lon,
y = lat,
benchmark = "Public_AR_Current",
vintage = "Current_Current",
format = "json"
)
)
data <- httr::content(response, as = "parsed")
return(data$result)
}
# --- 2. Read the Excel file dynamically -------------------------------------
input_data <- read_excel(params$data_file) %>%
janitor::clean_names()
# --- 3. Detect whether lat/lon already exist --------------------------------
# Normalize column names for flexibility
lat_col <- intersect(names(input_data), c("lat", "latitude"))
lon_col <- intersect(names(input_data), c("lon", "longitude"))
if (length(lat_col) == 1 && length(lon_col) == 1 &&
all(!is.na(input_data[[lat_col]])) &&
all(!is.na(input_data[[lon_col]]))) {
message("Latitude and longitude found in dataset — skipping geocoding.")
geocoded_results <- input_data %>%
rename(lat = all_of(lat_col), lon = all_of(lon_col))
} else {
message("Latitude and longitude not found — performing geocoding...")
# --- 4. Prepare address strings for geocoding -----------------------------
if (all(c("address", "city", "state") %in% names(input_data))) {
input_data <- input_data %>%
mutate(full_address = paste(address, city, state, sep = ", "))
} else if ("address" %in% names(input_data)) {
input_data <- input_data %>%
mutate(full_address = address)
} else {
stop("Excel file must include at least an 'address' column, or 'address', 'city', and 'state'.")
}
# --- 5. Apply the geocoder safely -----------------------------------------
geocode_safely <- purrr::safely(geocode_address)
geocoded_results <- input_data %>%
mutate(geo = map(full_address, geocode_safely)) %>%
mutate(
lat = map_dbl(geo, ~ .x$result["lat"] %||% NA_real_),
lon = map_dbl(geo, ~ .x$result["lon"] %||% NA_real_)
) %>%
select(-geo)
}
Demographics Retrieval Function
Fetches demographic data from US Census API:
| Variable | Description | Census Code |
|---|---|---|
| Population | Total population in census tract | B01003_001 |
| Median Income | Median household income | B19013_001 |
| Median Home Value | Median value of owner-occupied homes | B25077_001 |
| Median Gross Rent | Median gross rent | B25064_001 |
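For orientation, a minimal standalone sketch (assuming a valid CENSUS_API_KEY is set) of pulling these ACS variables with tidycensus for a single illustrative county; the full function below generalizes this to a radius-based trade area:

library(tidycensus)
vars <- c(
  population        = "B01003_001",
  median_income     = "B19013_001",
  median_home_value = "B25077_001",
  median_gross_rent = "B25064_001"
)
# Tract-level ACS 5-year estimates for one illustrative county
acs <- get_acs(geography = "tract", variables = vars,
               state = "48", county = "201",  # Harris County, TX (illustrative)
               year = 2022, survey = "acs5")
head(acs)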
Show Code Block
# ============================================
# DATA COLLECTION FUNCTIONS (Radius-Based, Safe Renaming, No Unemployment)
# ============================================
get_census_demographics <- function(lat, lon,
census_api_key = NULL,
radius_miles = trade_area_miles,
use_cache = TRUE,
cache_max_age = 30) {
cache_key <- paste0(round(lat, 4), "_", round(lon, 4), "_r", radius_miles)
if (use_cache) {
cached_data <- load_from_cache(cache_key, "demographics", cache_max_age)
if (!is.null(cached_data)) {
message("Loaded demographics from cache for ", cache_key)
return(cached_data)
}
}
message("Cache not found or expired — fetching new Census data...")
if (!is.null(census_api_key)) {
tidycensus::census_api_key(census_api_key, install = FALSE)
}
# Build buffer geometry
point <- sf::st_sfc(sf::st_point(c(lon, lat)), crs = 4326)
buffer_m <- radius_miles * 1609.34
buffer_area <- sf::st_transform(point, 3857) |>
sf::st_buffer(buffer_m) |>
sf::st_transform(4326)
# ACS variables of interest
vars <- c(
population = "B01003_001",
median_income = "B19013_001",
median_home_value = "B25077_001",
median_gross_rent = "B25064_001"  # B25064 is median gross rent (B25046 is vehicle counts)
)
tracts_data <- tigris::tracts(cb = TRUE, year = 2023, progress_bar = FALSE) |>
sf::st_transform(4326)
tracts_in_area <- suppressWarnings(sf::st_intersection(tracts_data, buffer_area))
if (nrow(tracts_in_area) == 0) {
warning("No census tracts found within trade area.")
return(tibble())
}
tract_fips <- unique(paste0(tracts_in_area$STATEFP, tracts_in_area$COUNTYFP))
message("Downloading ACS data...")
demo_data_sf <- purrr::map_dfr(
tract_fips,
function(fips) {
tryCatch({
tidycensus::get_acs(
geography = "tract",
variables = vars,
state = substr(fips, 1, 2),
county = substr(fips, 3, 5),
year = 2022,
survey = "acs5",
geometry = TRUE
)
}, error = function(e) NULL)
}
) |> sf::st_transform(4326)
if (nrow(demo_data_sf) == 0) {
warning("No ACS data returned for this area.")
return(tibble())
}
demo_in_area <- suppressWarnings(sf::st_intersection(demo_data_sf, buffer_area))
if (nrow(demo_in_area) == 0) {
warning("No demographic data found within buffer.")
return(tibble())
}
# Aggregate results
demo_summary <- demo_in_area |>
dplyr::group_by(variable) |>
dplyr::summarise(estimate = mean(estimate, na.rm = TRUE), .groups = "drop") |>
tidyr::pivot_wider(names_from = variable, values_from = estimate)
# --- Safe renaming: only rename if columns exist ---------------------------
rename_map <- c(
B01003_001 = "population",
B19013_001 = "median_income",
B25077_001 = "median_home_value",
B25064_001 = "median_gross_rent"
)
rename_map <- rename_map[names(rename_map) %in% names(demo_summary)]
demo_summary <- dplyr::rename(demo_summary, !!!rename_map)
demo_summary <- demo_summary |>
dplyr::mutate(radius_miles = radius_miles)
if (use_cache) save_to_cache(demo_summary, cache_key, "demographics")
return(demo_summary)
}
# Memoised version for fast in-session reuse
get_census_demographics_memo <- memoise(get_census_demographics)
Amenity Function
Queries OpenStreetMap (https://wiki.openstreetmap.org/wiki/Key:amenity) for nearby amenities within the specified radius (1.5 miles for this analysis). OSM’s amenity key is extremely broad, including everything from airports to post boxes.
In the context of retail demand and site selection, the relevant amenities are those that:
- Attract regular consumer visits (e.g., restaurants, cafes, banks, pharmacies).
- Represent daily or weekly necessities (e.g., supermarkets, convenience stores, schools).
- Encourage dwell time or foot-traffic clustering (e.g., parks, places of worship, leisure).
- Support workforce presence (e.g., offices, post offices, government buildings).
These amenities serve as proxies for consumer presence, disposable income circulation, and accessibility to routine goods and services. For a retail valuation context, they also mirror the “market support” and “trade area vitality” concepts used in market analysis texts like Fanning, Market Analysis for Real Estate, 2nd ed.
The general amenity categories that are used in this analysis include the following:
- Supermarkets - A proxy for the presence of food retail and daily household shopping activity.
- Restaurants - Correlate with consumer engagement, social clustering, and evening/weekend traffic.
- Cafes - Also correlate with consumer engagement, social clustering, and evening/weekend traffic.
- Banks - Usually generate daytime traffic and draw customers to other nearby retail establishments.
- Hospitals - Represent an essential service and a consistent daytime population.
- Pharmacies - A regular essential service that often reflects high traffic.
- School/College/University - Bring regular population flows, including students, parents, and staff.
- Parks - Encourage recreational foot traffic and neighborhood appeal.
Show Code Block
# ============================================
# IMPROVED AMENITY COLLECTION FUNCTION
# ============================================
find_nearby_amenities <- function(lat, lon,
amenity_types = params$amenity_types,
radius_miles = trade_area_miles,
use_cache = TRUE,
cache_max_age = 7) {
# Fallback list
if (is.null(amenity_types) || length(amenity_types) == 0) {
amenity_types <- c("school", "college", "university", "restaurant", "cafe",
"bank", "hospital", "pharmacy", "supermarket", "park")
}
cache_key <- paste0(round(lat, 4), "_", round(lon, 4), "_r", radius_miles)
if (use_cache) {
cached_data <- load_from_cache(cache_key, "amenities", cache_max_age)
if (!is.null(cached_data)) return(cached_data)
}
radius_m <- radius_miles * 1609.34
bbox_buffer <- radius_m / 111320 # convert meters to degrees (approx)
bbox <- c(lon - bbox_buffer, lat - bbox_buffer, lon + bbox_buffer, lat + bbox_buffer)
results <- tibble(amenity = character(), count = numeric())
target <- sf::st_sfc(sf::st_point(c(lon, lat)), crs = 4326) |> sf::st_transform(3857)
for (a in amenity_types) {
tryCatch({
# Choose appropriate key depending on amenity
key <- dplyr::case_when(
a == "supermarket" ~ "shop",
a == "park" ~ "leisure",
TRUE ~ "amenity"
)
query <- osmdata::opq(bbox = bbox) %>%
osmdata::add_osm_feature(key = key, value = a)
osm_data <- osmdata::osmdata_sf(query)
count <- 0
# Combine points and polygons
all_geoms <- list(osm_data$osm_points, osm_data$osm_polygons)
all_geoms <- all_geoms[!vapply(all_geoms, is.null, logical(1))]
if (length(all_geoms) > 0) {
all_sf <- dplyr::bind_rows(lapply(all_geoms, function(g) {
sf::st_centroid(sf::st_transform(g, 3857))
}))
dists <- sf::st_distance(all_sf, target)
count <- sum(as.numeric(dists) <= radius_m)
}
results <- dplyr::add_row(results, amenity = a, count = count)
Sys.sleep(1)
}, error = function(e) {
warning(paste("Error fetching", a, ":", e$message))
results <<- dplyr::add_row(results, amenity = a, count = 0)
})
}
if (use_cache) save_to_cache(results, cache_key, "amenities")
return(results)
}
# Memoised version
find_nearby_amenities_memo <- memoise::memoise(find_nearby_amenities)
| Sale | Banks | Cafes | Colleges | Hospitals | Parks | Pharmacies | Restaurants | Schools | Supermarkets | Universities | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 53 | 2 | 0 | 0 | 427 | 3 | 66 | 48 | 52 | 0 | 651 |
| 2 | 26 | 9 | 0 | 0 | 29 | 0 | 18 | 16 | 20 | 0 | 118 |
| 3 | 84 | 1 | 0 | 20 | 63 | 2 | 193 | 37 | 46 | 1 | 447 |
| 4 | 3 | 15 | 5 | 14 | 283 | 4 | 233 | 36 | 33 | 0 | 626 |
| 5 | 28 | 1 | 0 | 0 | 146 | 1 | 50 | 190 | 10 | 0 | 426 |
| 6 | 114 | 2 | 0 | 6 | 290 | 9 | 71 | 88 | 50 | 0 | 630 |
| 7 | 30 | 0 | 0 | 0 | 9 | 0 | 37 | 10 | 0 | 0 | 86 |
| 8 | 61 | 4 | 0 | 15 | 216 | 2 | 107 | 34 | 90 | 0 | 529 |
| Subject | 0 | 0 | 0 | 0 | 467 | 18 | 6 | 66 | 19 | 45 | 621 |
Traffic Count Information
A user would need to add the traffic counts to their base Excel file.
Traffic count data measure the average number of vehicles passing specific roadway points, often expressed as Annual Average Daily Traffic (AADT). While this report uses data from the Texas Department of Transportation (TxDOT) Open Data Portal, similar information is available from most U.S. states and many other countries.
United States
- State Departments of Transportation (DOTs): most state DOTs publish AADT datasets through their GIS or open data portals. Examples include:
  - Caltrans (California) → Traffic Volumes
  - Florida Department of Transportation (FDOT) → Traffic Counts
  - North Carolina DOT (NCDOT) → Traffic Survey Data
- Federal Highway Administration (FHWA): the FHWA provides nationwide roadway and traffic data through the Highway Performance Monitoring System (HPMS).
Canada
- Provincial Ministries of Transportation: each province maintains similar datasets:
  - Ontario Open Data Catalogue → Annual Average Daily Traffic
  - DataBC Catalogue (British Columbia) → Traffic Counts
  - GeoDiscover Alberta → Transportation / Traffic Volume Sites
- Municipal and Regional Sources: large municipalities (e.g., Toronto, Vancouver, Calgary) often publish local traffic counts through their open data portals.
International
- United Kingdom: Department for Transport Open Data, which includes continuous and manual count sites across England, Scotland, and Wales.
- European Union: many member nations release traffic and transport statistics under the INSPIRE Directive or through Eurostat Transport Statistics.
- Global Alternatives: in regions without government data, commercial and open platforms such as TomTom Traffic, HERE Traffic, or OpenStreetMap traffic layers can provide approximate roadway congestion or flow estimates.
Future Implementation Notes
To integrate local traffic data into this analysis:
1. Download the shapefile or GeoJSON version of your region’s AADT or traffic count dataset.
2. Load it into R using sf::st_read("path/to/file.shp").
3. Perform a spatial join between your property points and the nearest road segment (sf::st_join() or sf::st_nearest_feature()).
4. Extract or average the relevant traffic count field for scoring.
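A minimal sketch of steps 2 through 4, assuming a downloaded AADT shapefile; the file path and the traffic field name (aadt) are placeholders for whatever your region’s dataset actually uses:

library(sf)
library(dplyr)

aadt_segments <- sf::st_read("path/to/aadt_segments.shp") |>
  sf::st_transform(3857)                          # project to meters

property_pts <- geocoded_results |>               # assumes lat/lon columns exist
  dplyr::filter(!is.na(lat), !is.na(lon)) |>
  sf::st_as_sf(coords = c("lon", "lat"), crs = 4326) |>
  sf::st_transform(3857)

# Index of the nearest road segment for each property point
nearest <- sf::st_nearest_feature(property_pts, aadt_segments)
property_pts$traffic_count <- aadt_segments$aadt[nearest]  # placeholder field name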
Weighting Function/Inputs
Required inputs apply appropriate weights to income (consumer spending power), population density (a reflection of foot-traffic potential), amenities (neighborhood quality and drawing power), property values (a function of the cost to live in the area), and traffic counts (a measure of visibility, accessibility, and potential customers beyond the trade-area population).
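A minimal sketch (an assumption, not the author’s exact code) of reading those weights from the document parameters and normalizing them so they sum to one:

# Pull the five factor weights from the YAML params and normalize
weights <- unlist(params$weights$value)
weights <- weights / sum(weights)
stopifnot(length(weights) == 5, all(weights > 0))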
Scoring Function
Calculates a comprehensive location score (0-100) based on five factors:
Income Score
- Formula: rescale(median_income, to = c(0, 100), from = range(all_incomes))
- Higher income = better location (greater purchasing power)
- Dynamically normalized to a 0-100 scale
Interpretation: The Income Score measures the relative purchasing power of households within each trade area. Median household income remains a reliable proxy for local spending capacity and the ability of residents to support a broad range of retail goods and services. Higher median incomes typically correspond with greater discretionary spending, stronger support for mid- and upper-tier retailers, and enhanced resilience during economic downturns.
This uses adaptive scaling to account for any range of income levels. The score is calculated by rescaling each trade area’s median income within the range of all incomes observed across the study sample. This approach preserves meaningful differences among higher-income areas, where a capped formula would otherwise treat them as identical, and allows for balanced comparison between affluent and moderate-income markets.
Notes on the Scale: The scaling is anchored to the lowest and highest median incomes within the comparison group. The location with the highest income receives a score of 100, while the lowest-income location scores 0. Other income levels are scored proportionally.
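A worked sketch of this rescaling with scales::rescale(), using illustrative median incomes:

library(scales)
all_incomes <- c(48000, 61000, 75000, 92000)  # illustrative medians across the set
rescale(61000, to = c(0, 100), from = range(all_incomes))
#> [1] 29.54545   (about 30% of the way from lowest to highest income)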
Population Density Score
- Formula: rescale(population, to = c(0, 100), from = range(all_populations))
- Higher population = more potential customers/activity
- Normalized to a 0-100 scale
Interpretation: The Population Density Score represents the concentration of potential customers within the trade area. Retail activity depends not only on affluence but also on proximity and repetition of visits. Areas with greater population density tend to generate stronger pass-by traffic, higher visibility, and more consistent sales volumes.
This metric applies adaptive scaling rather than a fixed divisor. Each trade area’s population is rescaled within the observed range of populations across the study sample. This ensures that every location receives a proportionate score, maintaining meaningful separation even among densely populated areas where a capped formula might otherwise flatten results.
Notes on the Scale: The lowest population in the comparison group anchors the bottom of the scale at 0, while the highest defines 100. All other sites are positioned proportionally between them.
Amenities Score
- Formula: 0.4 × (diversity %) + 0.6 × (density %, adaptively scaled 0-100)
- More diverse and concentrated amenities = stronger location
- Balances variety and overall intensity of surrounding uses
Interpretation: The Amenities Score reflects both the diversity and density of nearby amenities that contribute to a site’s retail vitality. Diversity measures the proportion of tracked amenity categories—such as restaurants, schools, parks, banks, and pharmacies—that are represented within the trade area. Density measures the total number of amenities, scaled relative to the range observed across all study locations.
Together, these components create a balanced indicator of neighborhood convenience, consumer draw, and the underlying strength of the retail ecosystem. Locations that contain all tracked amenity types receive full credit for diversity, while those with fewer categories present earn proportionally less. The density component differentiates sites based on how intensively built and commercially active they are within the same amenity mix.
Notes on the Scale: The diversity index reaches 100 when all tracked amenity categories are represented within the trade area. The density index is adaptively normalized between the lowest and highest total amenity counts observed among all properties being compared, ensuring that each site’s score reflects its position within the actual range of market activity. The final score combines both effects, assigning 40% weight to diversity and 60% to density. This method ensures fair comparison across urban, suburban, and rural contexts without relying on arbitrary saturation thresholds.
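A sketch of the 40/60 diversity-density blend under illustrative inputs; the category counts and the observed density range are assumptions, not values from this analysis:

library(scales)
counts <- c(restaurant = 12, cafe = 3, bank = 4, pharmacy = 0,
            school = 6, park = 9, supermarket = 2, hospital = 0)
diversity_pct <- mean(counts > 0) * 100             # 6 of 8 categories present = 75
density_pct <- rescale(sum(counts), to = c(0, 100),
                       from = c(20, 650))           # min/max totals across sites
amenities_score <- 0.4 * diversity_pct + 0.6 * density_pct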
Property Value Score
- Formula: 0.5 × min(median_home_value / 5000, 100) + 0.5 × min(median_gross_rent / 30, 100)
- Higher values = more desirable neighborhood
- Normalized to a 0-100 scale
Interpretation: The Property Value Score represents a composite measure of market vitality that combines indicators of long-term stability and short-term spending capacity within a trade area. Median home value reflects neighborhood affluence, ownership stability, and the general desirability of the surrounding residential market. Median gross rent complements this by capturing the relative cost of occupancy and the local balance between household income and housing expense—an important proxy for retail purchasing power and economic elasticity.
By weighting home value and rent equally (50/50), the score avoids bias toward either high-income owner-occupied areas or transient high-rent districts. A higher Property Value Score therefore signals a trade area that is both financially resilient and economically active, typically corresponding to locations that can support sustainable retail rents and consistent consumer demand.
- High home value and high rent → strong, affluent market with high retail viability.
- High rent but low home value → possibly a transient or cost-burdened market.
- High home value but low rent → stable ownership but lower circulating spending.
Notes on the Scale: The divisor for home value (/5000) is a normalization factor that caps high-value markets near 100. You can tune it; if your markets are typically high-end, you might raise it to /6000 or /8000. The divisor for gross rent (/30) is similar: $30 × 100 = $3,000, roughly the upper bound for typical monthly rent, producing comparable scaling.
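A worked example of the capped blend with illustrative inputs:

home_value <- 420000  # illustrative median home value
gross_rent <- 1800    # illustrative median gross rent
0.5 * min(home_value / 5000, 100) + 0.5 * min(gross_rent / 30, 100)
#> [1] 72   (home-value component caps at 84; rent component at 60)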
Traffic Count Score
- Formula: rescale(traffic_count, to = c(0, 100), from = range(all_traffic_counts))
- Higher traffic = greater retail visibility and consumer flow
- Dynamically normalized to a 0-100 scale
Interpretation: The Traffic Count Score measures the retail exposure and accessibility of a location based on roadway volume. Higher traffic counts typically translate to greater visibility, impulse visits, and daily customer flow.
This score is dynamically scaled relative to the comparison group, ensuring that properties with exceptionally high or low counts remain distinguishable within each analysis set.
Notes on the Scale: The lowest observed traffic count anchors the scale at 0, and the highest at 100. When comparing sites across multiple markets, a shared range can be applied to maintain consistent interpretation across studies.
Total Score Calculation
\[ \text{Total Score} = \sum_{i=1}^{5} \text{Score}_i \times w_i \]
Interpretation: The Total Score is the weighted sum of five core indicators (Income, Population, Amenities, Property Value, and Traffic Count), each scaled from 0 to 100. The weights \(w_i\) control the relative importance of each factor and are defined in the document parameters. This additive framework lets the model reflect market potential (income, population), site quality (amenities, property value), and exposure/accessibility (traffic count).
Notes on the Scale: Each component score is dynamically normalized within its observed range to preserve proportional differences among locations. The composite score therefore expresses each site’s overall retail viability relative to all others in the analysis.
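A worked sketch of the weighted sum, with illustrative component scores and weights that sum to one:

scores <- c(income = 72, population = 55, amenities = 80,
            property_value = 61, traffic = 40)
weights <- c(income = 0.25, population = 0.20, amenities = 0.20,
             property_value = 0.15, traffic = 0.20)
sum(scores * weights)
#> [1] 62.15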
Analysis Function
Main function that ties everything together:
- Geocodes address (if not provided in Excel file)
- Fetches demographics
- Queries amenities
- Calculates scores
- Returns comprehensive results object
Show Code Block
# ============================================
# BATCH COMPARISON — FULLY NORMALIZED
# ============================================
compare_locations_from_excel <- function(
data_file = params$data_file,
census_api_key = Sys.getenv("CENSUS_API_KEY"),
center_type = params$shopping_center_type,
radius_miles = trade_area_miles
) {
# Always pull weights from YAML
weights <- unlist(params$weights$value)
message("-----------------------------------------------------")
message("Running Batch Location Comparison")
message("Center Type: ", center_type)
message("Trade Area Radius: ", radius_miles, " miles")
message("Source File: ", data_file)
message("-----------------------------------------------------")
# --- Load data ---
df <- readxl::read_excel(data_file) |>
janitor::clean_names()
if (!"traffic_count" %in% names(df)) {
df$traffic_count <- NA_real_
}
lat_col <- intersect(names(df), c("lat", "latitude"))
lon_col <- intersect(names(df), c("lon", "longitude"))
if (length(lat_col) != 1 || length(lon_col) != 1) {
stop("Excel file must include latitude and longitude columns.")
}
df <- df |>
dplyr::rename(lat = all_of(lat_col), lon = all_of(lon_col)) |>
dplyr::filter(!is.na(lat) & !is.na(lon))
# --- Compute traffic range globally ---
traffic_range <- range(df$traffic_count, na.rm = TRUE)
# --- Gather demographic + amenity data for global scaling ---
demo_list <- list()
amenity_list <- list()
for (i in seq_len(nrow(df))) {
lat <- df$lat[i]
lon <- df$lon[i]
demo <- tryCatch(
get_census_demographics(lat, lon, census_api_key, radius_miles = radius_miles),
error = function(e) tibble()
)
if (nrow(demo) > 0) demo_list[[i]] <- demo
amen <- tryCatch(
find_nearby_amenities(lat, lon, radius_miles = radius_miles),
error = function(e) tibble()
)
if (nrow(amen) > 0) amenity_list[[i]] <- amen
}
all_demo <- bind_rows(demo_list)
income_range <- range(all_demo$median_income, na.rm = TRUE)
population_range <- range(all_demo$population, na.rm = TRUE)
home_value_range <- range(all_demo$median_home_value, na.rm = TRUE)
rent_range <- range(all_demo$median_gross_rent, na.rm = TRUE)
amenity_summary <- purrr::map_dfr(amenity_list, function(a) {
tibble(diversity = length(unique(a$amenity)),
density = sum(a$count, na.rm = TRUE))
})
div_range <- range(amenity_summary$diversity, na.rm = TRUE)
den_range <- range(amenity_summary$density, na.rm = TRUE)
# --- Run analysis for each property ---
results <- purrr::pmap_dfr(
df,
function(...) {
row <- tibble::tibble(...)
message("Analyzing location: ", row$lat, ", ", row$lon)
analysis <- calculate_location_score(
lat = row$lat,
lon = row$lon,
traffic_count = row$traffic_count,
census_api_key = census_api_key,
radius_miles = radius_miles,
income_range = income_range,
population_range = population_range,
traffic_range = traffic_range,
home_value_range = home_value_range,
rent_range = rent_range,
amenity_div_range = div_range,
amenity_den_range = den_range
)
tibble::tibble(
property_id = if ("id" %in% names(row)) row$id else NA_character_,
address = if ("address" %in% names(row)) row$address else paste0("(", round(row$lat, 4), ", ", round(row$lon, 4), ")"),
lat = row$lat,
lon = row$lon,
total_score = analysis$total_score,
income_score = analysis$individual_scores$income,
population_score = analysis$individual_scores$population_density,
amenities_score = analysis$individual_scores$amenities,
property_value_score = analysis$individual_scores$property_value,
traffic_score = analysis$individual_scores$traffic_count
)
}
)
# --- Rank and export ---
ranked_results <- results |>
dplyr::arrange(desc(total_score)) |>
dplyr::mutate(rank = dplyr::row_number())
output_file <- paste0("Location_Scoring_", gsub(" ", "_", center_type), "_", radius_miles, "mi.xlsx")
writexl::write_xlsx(ranked_results, output_file)
message("Results saved to: ", output_file)
message("-----------------------------------------------------")
return(ranked_results)
}
| Rank | Sale | Total Score | Income Score | Population Score | Amenities Score | Property Value Score | Traffic Score |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 74.3 | 25.8 | 100.0 | 100.0 | 65.2 | 100.0 |
| 2 | Subject | 67.5 | 100.0 | 61.5 | 96.8 | 73.6 | 10.0 |
| 3 | 7 | 54.5 | 56.3 | 57.2 | 40.0 | 91.1 | 36.7 |
| 4 | 2 | 46.0 | 25.8 | 99.8 | 43.4 | 39.3 | 0.0 |
| 5 | 6 | 42.0 | 34.6 | 43.6 | 97.8 | 56.2 | 15.5 |
| 6 | 8 | 41.3 | 27.2 | 51.6 | 87.0 | 43.1 | 23.2 |
| 7 | 4 | 32.0 | 35.5 | 0.0 | 97.3 | 59.6 | 28.5 |
| 8 | 5 | 28.8 | 0.0 | 9.0 | 76.1 | 14.7 | 85.1 |
| 9 | 3 | 21.6 | 12.6 | 19.8 | 78.3 | 13.5 | 13.5 |
Create Combined Results Export
This will create a new Excel file with all collected information.
Show Code Block
# =====================================================
# EXPORT COMPREHENSIVE RESULTS (Failsafe Version)
# =====================================================
safely_get_col <- function(df, patterns) {
cols <- names(df)
match <- cols[grepl(paste(patterns, collapse = "|"), cols, ignore.case = TRUE)]
if (length(match) > 0) return(match[1])
return(NULL)
}
# --- 1. Read and clean the base Excel file ---
original_data <- readxl::read_excel(params$data_file) %>%
janitor::clean_names()
# --- 2. Detect or create address column ---
address_col <- safely_get_col(original_data, c("^address$", "addr", "site", "property"))
if (!is.null(address_col)) {
original_data <- original_data %>%
rename(address = !!sym(address_col))
message("Detected address column: ", address_col)
} else {
message("⚠️ No address column detected. Creating synthetic labels.")
original_data <- original_data %>%
mutate(address = paste0("Site_", row_number()))
}
# --- 3. Ensure lat/lon columns exist ---
lat_col <- safely_get_col(original_data, c("^lat$", "latitude"))
lon_col <- safely_get_col(original_data, c("^lon$", "longitude"))
if (!is.null(lat_col)) original_data <- rename(original_data, lat = !!sym(lat_col)) else original_data$lat <- NA_real_
if (!is.null(lon_col)) original_data <- rename(original_data, lon = !!sym(lon_col)) else original_data$lon <- NA_real_
# --- 4. Add geocoded coordinates if missing ---
if (exists("geocoded_results")) {
original_data <- original_data %>%
mutate(addr_key = str_trim(tolower(address))) %>%
left_join(
geocoded_results %>%
mutate(addr_key = str_trim(tolower(address))) %>%
select(addr_key, lat_geo = lat, lon_geo = lon),
by = "addr_key"
) %>%
mutate(
lat = coalesce(lat, lat_geo),
lon = coalesce(lon, lon_geo)
) %>%
select(-addr_key, -lat_geo, -lon_geo)
}
# --- 5. Build amenity summary safely ---
if (exists("amenity_results")) {
amenity_summary_join <- amenity_results %>%
mutate(addr_key = str_trim(tolower(address))) %>%
group_by(addr_key, amenity) %>%
summarise(total = sum(count, na.rm = TRUE), .groups = "drop") %>%
tidyr::pivot_wider(
names_from = amenity,
values_from = total,
values_fill = list(total = 0)
)
} else {
amenity_summary_join <- tibble(addr_key = character())
}
# --- 6. Gather raw demographics ---
demo_list <- purrr::map2_dfr(
original_data$lat, original_data$lon,
function(lat, lon) {
if (is.na(lat) || is.na(lon)) return(tibble())
d <- get_census_demographics(lat, lon, census_api_key = Sys.getenv("CENSUS_API_KEY"))
if (nrow(d) == 0) return(tibble())
d$lat <- lat; d$lon <- lon
d
}
)
demo_join <- demo_list %>%
mutate(lat_r = round(lat, 4), lon_r = round(lon, 4)) %>%
select(lat_r, lon_r, population, median_income, median_home_value, median_gross_rent)
# --- 7. Merge everything together ---
final_export <- location_scores %>%
mutate(addr_key = str_trim(tolower(address))) %>%
left_join(amenity_summary_join, by = "addr_key") %>%
mutate(lat_r = round(lat, 4), lon_r = round(lon, 4)) %>%
left_join(demo_join, by = c("lat_r", "lon_r")) %>%
left_join(
original_data %>%
mutate(addr_key = str_trim(tolower(address))) %>%
select(-lat, -lon),
by = "addr_key"
)
# --- 8. Ensure essential columns exist ---
for (col in c("address", "lat", "lon", "population", "median_income",
"median_home_value", "median_gross_rent")) {
if (!col %in% names(final_export)) final_export[[col]] <- NA
}
# --- 9. Reorder logical columns ---
final_export <- final_export %>%
select(
address,
lat, lon,
population, median_income, median_home_value, median_gross_rent,
total_score, income_score, population_score,
amenities_score, property_value_score, traffic_score,
everything()
)
# --- 10. Write to Excel ---
output_file <- paste0(
"Comprehensive_Location_Results_",
gsub(" ", "_", params$shopping_center_type),
"_", trade_area_miles, "mi.xlsx"
)
writexl::write_xlsx(final_export, output_file)
Sensitivity Analysis
The sensitivity analysis tests how stable each property’s ranking remains when the weighting of input factors such as income, population density, amenities, property values, and traffic counts is allowed to vary randomly within reasonable limits. The scenario analysis runs 5,000 times, each run simulating an alternative weighting configuration. Each configuration produces a new set of location rankings. The resulting distribution of simulated ranks for each property reflects how dependent its position is on the specific weighting assumptions used in the model.
The median simulated rank represents the property’s typical performance across all simulations, while the 5th and 95th percentiles mark its best-case and worst-case outcomes, respectively. A property with a low median rank and a narrow percentile range demonstrates consistent strength and resilience to changing assumptions. In contrast, a wide percentile range indicates that the property’s relative standing shifts considerably under different weighting scenarios, suggesting greater sensitivity to how the model prioritizes its components. Together, these measures reveal not only which locations perform best on average but also which are most robust to uncertainty in the underlying valuation logic.
The “reasonable limits” for weighting variation are defined by a Dirichlet(1, …, 1) distribution (independent Gamma(1,1) draws normalized to sum to one), which produces random, positive weights that sum to one. This distribution allows each factor (Income, Population, Amenities, Property Value, and Traffic) to fluctuate broadly between near zero and moderate dominance, representing plausible shifts in judgment without introducing unrealistic extremes. Each factor can vary between very small (near 0) and fairly large (up to ~0.7-0.8), but the average weight remains around 0.20 (1/5). The Gamma(1,1) shape parameter ensures broad variability without producing absurd or hyper-concentrated weights (such as one factor receiving 99% of total influence).
Each Monte Carlo run represents one hypothetical weighting configuration within those bounds. Monte Carlo error decreases approximately in proportion to the inverse of the square root of the number of simulations:
\[ \text{Error} \propto \frac{1}{\sqrt{N}} \]
This means that as the number of scenarios (N) increases, the uncertainty in the simulated results shrinks rapidly at first and then levels off:
- 100 runs → approximately 10 % uncertainty
- 1,000 runs → approximately 3 % uncertainty
- 5,000 runs → approximately 1.4 % uncertainty
In practice, 5,000 simulations represent a balanced choice—large enough to yield stable percentile and rank estimates, yet small enough to run efficiently on a standard laptop without noticeable slowdown.
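A minimal sketch of the simulation loop described above, assuming a matrix score_matrix of properties by the five component scores (each 0-100); the Dirichlet weights come from normalized Gamma(1,1) draws:

set.seed(42)
n_sims <- 5000
sim_ranks <- sapply(seq_len(n_sims), function(i) {
  w <- rgamma(5, shape = 1, rate = 1)    # Gamma(1,1) draws...
  w <- w / sum(w)                        # ...normalized -> Dirichlet(1,...,1)
  totals <- score_matrix %*% w           # weighted total score per property
  rank(-totals)                          # rank 1 = highest score
})
# 5th percentile, median, and 95th percentile rank for each property
apply(sim_ranks, 1, quantile, probs = c(0.05, 0.50, 0.95))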
| Sale | Median Rank | 5th Percentile | 95th Percentile |
|---|---|---|---|
| 1 | 1.0 | 1.0 | 1.5 |
| Subject | 2.0 | 1.4 | 4.0 |
| 7 | 3.0 | 2.5 | 5.1 |
| 8 | 5.0 | 5.0 | 6.5 |
| 6 | 5.5 | 4.0 | 6.5 |
| 2 | 6.0 | 2.5 | 9.0 |
| 4 | 7.0 | 3.5 | 7.5 |
| 5 | 8.0 | 2.5 | 9.0 |
| 3 | 9.0 | 7.4 | 9.0 |
Ranked Location Map
Figure 2: Ranked Location Map by Total Score.
Radar Chart
Weighted Sensitivity Heatmap
Amenity Density Map
Still needs work
Final Summary Table
Comparative analysis is defined as “the process by which a value indication is derived in the sales comparison approach. Comparative analysis may employ quantitative or qualitative techniques, either separately or in combination.”1 The Appraisal of Real Estate outlines examples of the techniques used in quantitative adjustments and qualitative analyses as shown in Table 6.2
| Quantitative Analysis | Qualitative Analysis |
|---|---|
| Paired data analysis (sales and resales of the same or similar properties) | Relative comparison analysis |
| Grouped data analysis | Ranking analysis |
| Secondary data analysis | Personal interviews |
| Statistical analysis including graphic analysis and scenario analysis | |
| Cost-related adjustments (cost to cure, depreciated cost) | |
| Capitalization of income differences | |
| Trend analysis |
When adequate data is available, quantitative analysis can be a helpful tool, but often the differences between properties make it challenging to determine appropriate adjustments with a level of certainty. In these cases, qualitative analysis or ranking can be helpful.3 Qualitative analysis recognizes the inefficiencies of real estate markets and the difficulty of expressing adjustments with mathematical precision.
The analysis to this point has described in detail how the location attributes of the analyzed properties contribute to retail property success. As shown in Table 7, the subject property is near the top of the ranking analysis, suggesting a value at or below Sale 1 and at or above the remaining sales.
| Rank | Sale | Total Score | Price/SF |
|---|---|---|---|
| 1 | 1 | 74.3 | $451 |
| 2 | Subject | 67.5 | |
| 3 | 7 | 54.5 | $308 |
| 4 | 2 | 46.0 | $428 |
| 5 | 6 | 42.0 | $392 |
| 6 | 8 | 41.3 | $281 |
| 7 | 4 | 32.0 | $391 |
| 8 | 5 | 28.8 | $240 |
| 9 | 3 | 21.6 | $115 |