Spatial analysis in geography is not only concerned with identifying patterns, but with explaining how those patterns differ across space relative to a broader context. In urban environments such as New York City, population distributions are highly uneven, reflecting long-standing processes of segregation, migration, and economic restructuring. Simply mapping counts or proportions of demographic groups does not fully capture these spatial relationships. What is required is a method that allows comparison between local conditions and the broader regional structure.
The Location Quotient (LQ) provides this analytic capability. Rather than describing how many individuals of a given group live in a census tract, LQ measures whether that group is more or less concentrated in that tract relative to the overall population distribution of the city. This allows for the identification of areas of over-representation and under-representation, making it a powerful tool for understanding spatial inequality, clustering, and demographic structure.
In the context of New York City, this is particularly important. The city is composed of diverse neighborhoods with distinct demographic profiles, and these patterns are shaped by historical processes such as redlining, housing policy, and migration flows. By applying Location Quotients to tract-level Census data, this analysis moves beyond descriptive mapping and toward a comparative spatial framework that reveals how demographic groups are distributed relative to the city as a whole.
This analysis was conducted with the aid of ChatGPT, which was used to help troubleshoot the following issues:
File path errors
Layer-reading problems
Missing packages
Incorrect field names
Non-numeric variables
Division-by-zero issues
Using tract-level Census data for New York City, this analysis measures the relative concentration of four race and ethnicity groups by calculating Location Quotients across census tracts, in order to identify where groups are over- or under-represented relative to the citywide population distribution.
This objective reflects the structure of the LQ statistic, which compares tract-level composition (numerator) to regional composition (denominator), producing a ratio that is directly interpretable in spatial terms.
This analysis follows a structured analytic pipeline, similar to the Basic Statistics assignment, but extended to include multi-scale comparison and ratio-based analysis.
Workflow Steps
Establish working environment and load required libraries
Read and inspect the tract-level dataset from the GeoPackage
Validate data structure and confirm required fields
Define total population and race/ethnicity group variables
Calculate regional totals and proportions
Calculate tract-level proportions
Construct Location Quotients using standardized formula
Validate results through summary statistics and inspection
Prepare a clean output dataset
Export results as a .csv file
This workflow reflects the assignment requirement to move from data access to verified analytic output, while demonstrating understanding of each component of the statistic.
Before implementing the code, it is necessary to clearly define the statistic being constructed.
The Location Quotient is calculated as:
$$LQ = \frac{\text{Tract-Level Proportion}}{\text{Regional Proportion}}$$
Where:
Tract-Level Proportion = group population in tract ÷ total population in tract
Regional Proportion = total group population ÷ total population across NYC
LQ > 1 → the group is more concentrated in the tract than expected
LQ = 1 → the tract reflects the regional average
LQ < 1 → the group is underrepresented
LQ is widely used in geographic analysis because it introduces relational context. Instead of asking “how many,” it asks:
How does this location compare to the broader system?
Where are concentrations forming?
Where are groups relatively absent?
This makes it especially useful for:
Identifying segregation patterns.
Detecting clustering of populations.
Comparing neighborhoods within cities.
Supporting policy and planning decisions.
In this assessment, LQ transforms tract-level Census data into a comparative spatial measure that reveals how demographic patterns in NYC relate to the overall structure of the city.
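As a worked illustration of the formula, the short sketch below computes one Asian LQ by hand. The tract counts are taken from the first row of the attribute table printed later in this document, and the citywide counts are the regional totals reported later. The variable names are illustrative only and are not part of the assignment script.

```r
# Worked example: Asian LQ for one tract (variable names are illustrative).
tract_asian <- 1363      # DP05_0044E for tract 36081045000
tract_total <- 2004      # DP05_0033E for the same tract
city_asian  <- 1247994   # citywide Asian total
city_total  <- 8622467   # citywide total population

tract_prop  <- tract_asian / tract_total   # group share within the tract
region_prop <- city_asian / city_total     # group share within the city
lq_asian    <- tract_prop / region_prop    # ratio of the two shares

round(c(tract_prop = tract_prop, region_prop = region_prop, LQ = lq_asian), 4)
```

An LQ of about 4.7 means that the Asian share of this tract's population is roughly 4.7 times the citywide share.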
This section loads the packages required to complete the Location Quotient workflow in a reproducible way. These libraries establish the analytical environment before any data are read or transformed. The workflow depends on spatial data being opened correctly, population fields being manipulated reliably, and the final work being presented in a structured knitted document. In this assignment, these libraries matter because the analysis is not just tabular. It begins with tract polygons stored in a GeoPackage, then moves through field validation, proportional calculations, and final export into a non-spatial table. This means the code needs tools for both geographic data management and standard data handling. As in the Basic Statistics assignment, this section confirms that the required computational environment is available before starting the analysis.
The code process begins with library(sf), which loads
the Simple Features package for reading and handling spatial vector
data. This is what later allows the tract layer to be imported from the
GeoPackage and stored as an sf object with both geometry
and attributes (Pebesma, 2018). The library(dplyr) call
loads tools for data manipulation, especially selecting columns and
shaping the final export table. The library(sp) call adds
compatibility with older spatial classes, which can still be useful in
geographic workflows. The library(rmarkdown) and
library(knitr) calls support reproducible reporting by
controlling how the code, results, and narrative text are rendered into
the final document. Together, these packages make it possible to build a
geographic analytical workflow that is both computationally functional
and clearly documented.
library(sf) # ANALYSIS - Simple Features for R - Handling Geographic Data.
## Warning: package 'sf' was built under R version 4.5.2
library(dplyr) # ANALYSIS - A Grammar for Data Manipulation.
## Warning: package 'dplyr' was built under R version 4.5.2
library(sp) # ANALYSIS - Classes and Methods for Spatial Data.
library(rmarkdown) # DELIVERABLE - A Notebook Interface to Produce Elegantly Formatted Output.
library(knitr) # DELIVERABLE - Elegant, Flexible, and Fast Dynamic Report Generation.
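Because a missing package stops the script at the first library() call, it can help to check availability before loading anything. The following is a minimal sketch of such a check, not part of the original script; the object names required_pkgs, is_installed, and missing_pkgs are hypothetical.

```r
# Packages named in this workflow.
required_pkgs <- c("sf", "dplyr", "sp", "rmarkdown", "knitr")

# requireNamespace() returns FALSE (rather than stopping) when a package is absent.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before knitting: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```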
This section stabilizes the file structure for the workflow by setting the knit options and checking that the project folder contains the expected files. This is an important environmental step because the script depends on being able to find the GeoPackage and later write the CSV output to a known location. In a reproducible analysis, file access cannot be left to guesswork. The working environment has to be clearly established so that the code behaves the same way each time it is run. This matters especially in an assignment like this, where the workflow moves through multiple stages and where a simple file path problem could interrupt the entire analytical process before any LQ calculations begin.
The code here starts with knitr::opts_chunk$set(), which
sets default chunk options so that code (echo = TRUE) and output
(include = TRUE) appear in the knitted document. This call does not
itself set the knit root directory, which would require
knitr::opts_knit$set(root.dir = ...). By default, knitr evaluates
chunks from the folder that contains the .Rmd file, so relative file
paths are resolved from the project folder. The calls to
list.dirs() and list.files() then act as
environmental checks. list.dirs() prints the folder
structure so you can verify the broader project organization, while
list.files() prints the files in the working directory so
you can confirm that the GeoPackage is actually present. These commands
do not modify the data, but they verify that the environment is prepared
correctly before the dataset is opened.
knitr::opts_chunk$set(
echo = TRUE,
include = TRUE
)
list.dirs()
## [1] "."
## [2] "./26.4.2_R_script_location_quotients"
list.files()
## [1] "26.4.2_R_script_location_quotients"
## [2] "26.4.2_R_script_location_quotients.zip"
## [3] "labwork_r_script_guide_v1.R"
## [4] "lq_r_script.html"
## [5] "lq_r_script.Rmd"
## [6] "new_york_city_metro_2022_class.gpkg"
## [7] "nyc_tract_location_quotients.csv"
## [8] "student_green_assignment_five_data.csv"
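The visual check of the file listing can also be expressed as an explicit test. The sketch below is a hypothetical pre-flight check, not part of the original script: it uses the GeoPackage file name shown in the listing and reports whether that file is visible from the current working directory.

```r
# Hypothetical pre-flight check; the file name is the one shown in the listing.
gpkg_name  <- "new_york_city_metro_2022_class.gpkg"
gpkg_found <- file.exists(gpkg_name)

if (gpkg_found) {
  message("Found ", gpkg_name, " in ", getwd())
} else {
  message("Could not find ", gpkg_name, " in ", getwd(),
          "; check the working directory before reading the data.")
}
```

In a stricter workflow the message() in the second branch could be replaced with stop() so that knitting halts before any data are read.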
This section identifies the GeoPackage path, inspects the available layers inside it, and imports the tract layer that will be used for the analysis. This is the point where the workflow moves from setup into active data access. The assignment is specifically concerned with tract-level Census data for New York City, so loading the correct tract layer is critical. If the wrong layer is selected, then the entire analytical workflow would be built on the wrong geography or the wrong variables. Because GeoPackages can contain multiple layers, the script needs to confirm both the file path and the correct layer name before proceeding.
Here we use
file.path(getwd(), "new_york_city_metro_2022_class.gpkg")
which constructs the full file path to the GeoPackage by combining the
working directory with the file name. This is more reliable than
hard-coding a full path because it keeps the workflow portable. The
st_layers(gpkg_path) command inspects the GeoPackage and
prints the names and properties of the layers stored inside it.
gpkg_path <- file.path(getwd(), "new_york_city_metro_2022_class.gpkg")
st_layers(gpkg_path) # OPTIONAL
## Driver: GPKG
## Available layers:
## layer_name geometry_type features fields
## 1 cb_2022_us_tract510_500k_nyc_v2a Multi Polygon 2323 75
## 2 neighborhood_boundaries_nyc_2019 Multi Polygon 195 11
## 3 neighborhood_locations_nyc_2019 Point 299 21
## 4 cb_2022_us_county510_500k_nyc Polygon 7 10
## 5 cb_2022_us_tract510_500k_nyc_v2a_updated Multi Polygon 2323 79
## 6 cb_2022_us_tract510_500k_nyc_v2a_updated_V2 Point 2704 32
## 7 cb_2022_us_tracts_500k_nyc_v2a_DBlack_V2 Point 2323 21
## 8 rfin_2023_nyc_investor_purchase_estimates NA 2881 5
## 9 rfin_2023_nyc_investor_market_estimates NA 2187 6
## crs_name
## 1 North_America_Albers_Equal_Area_Conic
## 2 North_America_Albers_Equal_Area_Conic
## 3 North_America_Albers_Equal_Area_Conic
## 4 North_America_Albers_Equal_Area_Conic
## 5 North_America_Albers_Equal_Area_Conic
## 6 North_America_Albers_Equal_Area_Conic
## 7 North_America_Albers_Equal_Area_Conic
## 8 <NA>
## 9 <NA>
This is a diagnostic step that helps verify that the tract layer
exists and is named correctly. The
layer_name <- "cb_2022_us_tract510_500k_nyc_v2a_updated"
line then stores the exact layer name to read. Finally,
st_read(dsn = gpkg_path, layer = layer_name, quiet = FALSE)
imports that layer into the object nyc_trct. The result is
an sf object that contains both the tract boundaries and
the attribute table that holds the Census variables needed for the LQ
calculations.
layer_name <- "cb_2022_us_tract510_500k_nyc_v2a_updated"
nyc_trct <- st_read(
dsn = gpkg_path,
layer = layer_name,
quiet = FALSE
)
## Reading layer `cb_2022_us_tract510_500k_nyc_v2a_updated' from data source
## `D:\2026_GES_666_Critical_Maps\Lab_4\new_york_city_metro_2022_class.gpkg'
## using driver `GPKG'
## Simple feature collection with 2323 features and 79 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1718547 ymin: 256711 xmax: 1756674 ymax: 311508
## Projected CRS: North_America_Albers_Equal_Area_Conic
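Because several layers in this GeoPackage have very similar names, the layer name can be checked against the available names before reading. The sketch below uses a hypothetical helper, check_layer(), and a few layer names copied from the printed layer list so that it runs on its own. In the real workflow the vector of available names could be taken from sf::st_layers(gpkg_path)$name.

```r
# Hypothetical helper: stop early if the requested layer is not available.
check_layer <- function(wanted, available) {
  if (!wanted %in% available) {
    stop("Layer not found: ", wanted,
         "\nAvailable layers: ", paste(available, collapse = ", "))
  }
  invisible(wanted)
}

# A few names copied from the printed layer list (not the full list).
available_layers <- c(
  "cb_2022_us_tract510_500k_nyc_v2a",
  "neighborhood_boundaries_nyc_2019",
  "cb_2022_us_tract510_500k_nyc_v2a_updated"
)

# Passes silently because the requested name is in the vector.
check_layer("cb_2022_us_tract510_500k_nyc_v2a_updated", available_layers)
```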
This section inspects the dataset after import to verify that it has the correct structure and contains the variables required for analysis. Before performing any calculations, the script needs to confirm that the data were read correctly, that geometry is present, and that the expected population fields exist in usable formats. This mirrors the logic of the Basic Statistics assignment, where early inspection helps prevent later errors from developing silently. Since Location Quotients depend on accurate tract identifiers, total population, and group counts, this inspection stage is a necessary checkpoint between reading the data and constructing the statistic.
The code process uses head() and tail() to
preview different parts of the dataset, allowing you to see sample
records without printing the entire table. The
str(nyc_trct) command prints the structure of the object,
including variable classes, which helps confirm whether demographic
fields are numeric or stored incorrectly as character text. The
class(nyc_trct) command verifies that the imported dataset
is in fact an sf object, while
attr(nyc_trct, "sf_column") identifies which column stores
the geometry. The st_geometry(nyc_trct) call confirms that
spatial geometry is present. The summary(nyc_trct[, 6:11])
command then gives a quick statistical overview of selected fields.
After that, "GEOID" %in% names(nyc_trct) checks whether
the tract identifier exists, and the vector test against the race fields
checks whether the major race and ethnicity count fields are
present.
# Check that the tract identifier exists
"GEOID" %in% names(nyc_trct)
## [1] TRUE
The sapply(..., class) step then checks the current
classes of the key variables after dropping geometry, which helps
determine whether type conversion will be needed later. None of these
commands alter the dataset. Instead, they function as structural
diagnostics that confirm the analysis can proceed safely.
# Check current classes of key fields
sapply(
st_drop_geometry(nyc_trct)[, c(
"GEOID",
"DP05_0033E",
"DP05_0037E",
"DP05_0038E",
"DP05_0044E",
"DP05_0039E",
"DP05_0052E",
"DP05_0057E",
"DP05_0058E"
)],
class
)
## GEOID DP05_0033E DP05_0037E DP05_0038E DP05_0044E DP05_0039E DP05_0052E
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## DP05_0057E DP05_0058E
## "numeric" "numeric"
This section creates a non-spatial version of the dataset for later tabular analysis and export. Although the tract boundaries are necessary to define the geographic unit of analysis, the actual LQ calculations rely on population counts and proportions rather than on geometry itself. Keeping the spatial object during the early part of the workflow is useful because it preserves the complete dataset, but later calculations and exports are easier to manage in a standard data frame without the geometry column. This step therefore separates the geographic structure of the data from the tabular operations needed to build the final output.
The code process first uses class(nyc_trct) to confirm
that the original dataset remains a spatial object. It then uses
st_drop_geometry(nyc_trct) to create nyc_tab,
which is a non-spatial copy of the same attribute data. This function
removes only the geometry column and leaves the Census fields intact.
The class(nyc_tab) call then confirms that the result is
now a standard table rather than an sf object, and
head(nyc_tab) previews the non-spatial table. This is not
yet the final export. Instead, it creates a cleaner working version of
the dataset for tabular operations and prepares for the later CSV
output.
# Keep geometry during analysis
class(nyc_trct)
## [1] "sf" "data.frame"
# Create a non-spatial table copy only when needed
nyc_tab <- st_drop_geometry(nyc_trct)
class(nyc_tab)
## [1] "data.frame"
head(nyc_tab)
## OBJECTID_1 fid_1 STATEFP COUNTYFP TRACTCE GEOID NAME
## 1 1 1 36 081 045000 36081045000 450
## 2 2 2 36 081 045400 36081045400 454
## 3 3 3 36 081 045500 36081045500 455
## 4 4 4 36 081 045600 36081045600 456
## 5 5 5 36 081 044602 36081044602 446.02
## 6 6 6 36 081 044800 36081044800 448
## NAMELSAD MTFCC FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
## 1 Census Tract 450 G5020 S 175572 0 +40.7142208 -073.8047869
## 2 Census Tract 454 G5020 S 230996 0 +40.7126504 -073.7960120
## 3 Census Tract 455 G5020 S 148238 0 +40.7363109 -073.8623266
## 4 Census Tract 456 G5020 S 162232 0 +40.7166298 -073.7939898
## 5 Census Tract 446.02 G5020 S 170076 0 +40.7081869 -073.7998476
## 6 Census Tract 448 G5020 S 208002 0 +40.7110219 -073.8026344
## OBJECTID GEO_ID NAME_1
## 1 1797 1400000US36081045000 Census Tract 450; Queens County; New York
## 2 1799 1400000US36081045400 Census Tract 454; Queens County; New York
## 3 1800 1400000US36081045500 Census Tract 455; Queens County; New York
## 4 1801 1400000US36081045600 Census Tract 456; Queens County; New York
## 5 1795 1400000US36081044602 Census Tract 446.02; Queens County; New York
## 6 1796 1400000US36081044800 Census Tract 448; Queens County; New York
## STATE_FIPS COUNTY_FIPS TRACT_FIPS FIPS_ID DP05_0033E DP05_0037E
## 1 36 81 45000 36081045000 2004 316
## 2 36 81 45400 36081045400 4793 112
## 3 36 81 45500 36081045500 13869 3226
## 4 36 81 45600 36081045600 1244 148
## 5 36 81 44602 36081044602 5210 684
## 6 36 81 44800 36081044800 2889 255
## DP05_0038E DP05_0039E DP05_0044E DP05_0052E DP05_0057E DP05_0058E DP05_0072E
## 1 96 0 1363 0 27 202 2004
## 2 487 0 3463 0 209 522 4793
## 3 5273 0 1999 0 1701 1670 13869
## 4 7 0 892 0 80 117 1244
## 5 1016 11 2169 26 985 319 5210
## 6 210 47 1491 0 711 175 2889
## DP05_0073E DP05_0074E DP05_0075E DP05_0076E DP05_0077E DP05_0078E DP05_0079E
## 1 52 0 0 0 52 1952 310
## 2 252 105 45 2 100 4541 74
## 3 4811 901 393 31 3486 9058 1758
## 4 156 0 18 0 138 1088 41
## 5 1473 102 207 0 1164 3737 380
## 6 785 25 30 0 730 2104 228
## DP05_0080E DP05_0081E DP05_0082E DP05_0083E DP05_0084E DP05_0085E
## 1 96 0 1363 0 20 163
## 2 482 0 3463 0 28 494
## 3 4971 0 1999 0 99 231
## 4 7 0 892 0 73 75
## 5 980 11 2169 26 81 90
## 6 210 0 1465 0 103 98
## Shape_Length Shape_Area ecmb DP05_3958E DP05_0037P DP05_0038P DP05_0044P
## 1 1723.874 175571.9 229 229 0.15768463 0.04790419 0.6801397
## 2 2603.629 230994.9 731 731 0.02336741 0.10160651 0.7225120
## 3 1597.919 148237.2 3371 3371 0.23260509 0.38020045 0.1441344
## 4 2350.727 162232.0 197 197 0.11897106 0.00562701 0.7170418
## 5 1689.673 170076.2 1341 1341 0.13128599 0.19500960 0.4163148
## 6 2140.134 208000.4 933 933 0.08826584 0.07268951 0.5160955
## DP05_3958P DP05_3837R DP05_0077P DP05_0075P DP05_0076P DP05_0074P DP05_3937R
## 1 0.1142715 0.3037975 100.00000 0.000000 0.0000000 0.000000 4.3132911
## 2 0.1525141 4.3482143 39.68254 17.857143 0.7936508 41.666667 30.9196429
## 3 0.2430601 1.6345319 72.45895 8.168780 0.6443567 18.727915 0.6196528
## 4 0.1583601 0.0472973 88.46154 11.538462 0.0000000 0.000000 6.0270270
## 5 0.2573896 1.4853801 79.02240 14.052953 0.0000000 6.924644 3.1710526
## 6 0.3229491 0.8235294 92.99363 3.821656 0.0000000 3.184713 5.8470588
## DP05_395837R DP05_7475R DP05_7675R DP05_7775R ALAND_SQMI DP05_0037D
## 1 0.7246835 0.0000000 0.00000000 Inf 0.006297779 50176.42
## 2 6.5267857 2.3333333 0.04444444 2.222222 0.008285841 13517.03
## 3 1.0449473 2.2926209 0.07888041 8.870229 0.005317307 606698.14
## 4 1.3310811 0.0000000 0.00000000 7.666667 0.005819272 25432.73
## 5 1.9605263 0.4927536 0.00000000 5.623188 0.006100637 112119.44
## 6 3.6588235 0.8333333 0.00000000 24.333333 0.007461045 34177.52
## DP05_0038D DP05_0044D DP05_3958D DP05_0074D DP05_0075D DP05_0076D DP05_0077D
## 1 15243.47 216425.5 36362.03 0.000 0.000 0.0000 8256.879
## 2 58774.96 417941.9 88222.78 12672.219 5430.951 241.3756 12068.780
## 3 991667.48 375942.2 633967.58 169446.690 73909.599 5830.0193 655595.073
## 4 1202.90 153283.8 33853.03 0.000 3093.170 0.0000 23714.305
## 5 166539.98 355536.6 219813.11 16719.565 33930.883 0.0000 190799.746
## 6 28146.19 199838.0 125049.51 3350.737 4020.884 0.0000 97841.521
## DP05_0037LQ DP05_0038LQ DP05_0044LQ DP05_3958LQ MISS_ZIP hot_qntl hot_eqint
## 1 0.42085394 0.20733476 4.6991270 0.4579336 0 0 0
## 2 0.06236668 0.43976447 4.9918796 0.6111878 0 0 0
## 3 0.62081364 1.64555055 0.9958334 0.9740435 0 1 1
## 4 0.31752898 0.02435433 4.9540857 0.6346154 0 0 0
## 5 0.35039703 0.84402360 2.8763443 1.0314681 0 1 0
## 6 0.23557797 0.31460843 3.5657357 1.2941924 0 0 0
## hot_jenks hot_pretty
## 1 0 0
## 2 0 0
## 3 1 0
## 4 0 0
## 5 0 0
## 6 0 0
This section defines the specific population count fields that will be used to construct the Location Quotients. The assignment requires four race and ethnicity groups, so the workflow must make those variables explicit before calculating regional totals or tract-level proportions. This matters because the Location Quotient compares the same group at two scales, tract and region. If the fields are inconsistent or incorrectly selected, the resulting statistic will not match the intended analytical objective. In this New York City tract-level analysis, the four selected groups are White, Black, Asian, and an aggregated Other category.
The code process begins by creating the vector
group_fields, which stores the names of the selected Census
variables. The first three fields correspond to White, Black, and Asian
population counts. The fourth field, DP05_3958E, is
intended to represent an aggregated Other category. The line
group_fields %in% names(nyc_trct) then checks whether each
of those fields actually exists in the dataset. This is important
because if one of them is missing, later calculations for totals,
proportions, and LQs will fail.
group_fields <- c(
"DP05_0037E", # White
"DP05_0038E", # Black
"DP05_0044E", # Asian
"DP05_3958E" # Other / aggregated
)
# Check whether fields exist
group_fields %in% names(nyc_trct)
## [1] TRUE TRUE TRUE TRUE
This section shows how the script could create
DP05_3958E if it were not already present
by summing AIAN, NHPI, Some Other Race, and Two or More Races. That
optional process would use arithmetic addition across those component
fields to construct a single aggregated category. Even though that block
is currently commented out, it shows that the workflow anticipates a
case where a needed analytical field must be derived from existing
Census columns rather than simply read from the source dataset.
# If the aggregated "Other" field is not already present, build it
#if (!"DP05_3958E" %in% names(nyc_trct)) {
#nyc_trct$DP05_3958E <- nyc_trct$DP05_0039E +
#nyc_trct$DP05_0052E +
#nyc_trct$DP05_0057E +
#nyc_trct$DP05_0058E
#}
# Recheck after creating it if needed
#group_fields %in% names(nyc_trct)
#summary(nyc_trct[, group_fields])
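The aggregation described above can be tried on a small example without touching the real dataset. The sketch below builds a toy data frame from the first three tracts in the printed table (component counts only) and sums the four components with rowSums(). The object names toy and other_fields are illustrative.

```r
# Toy data frame: component counts for the first three tracts in the printed table.
toy <- data.frame(
  DP05_0039E = c(0, 0, 0),        # AIAN
  DP05_0052E = c(0, 0, 0),        # NHPI
  DP05_0057E = c(27, 209, 1701),  # Some Other Race
  DP05_0058E = c(202, 522, 1670)  # Two or More Races
)

other_fields <- c("DP05_0039E", "DP05_0052E", "DP05_0057E", "DP05_0058E")

# rowSums() adds the four components tract by tract.
toy$DP05_3958E <- unname(rowSums(toy[, other_fields]))
toy$DP05_3958E
```

The sums (229, 731, and 3371) match the DP05_3958E values shown for the same tracts in the printed table, which suggests the existing field was built the same way.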
This section confirms that the dataset contains a valid total population field, which is necessary because all proportion calculations depend on it. In the Location Quotient formula, the tract-level proportion for each group is calculated by dividing that group’s count by total tract population, and the regional proportion is calculated by dividing the group’s citywide total by the citywide population. This means that the total population field is one of the most important denominators in the workflow. If it is missing, unexpectedly zero, or otherwise invalid, then both the numerator and denominator pieces of the LQ would be compromised.
The code process begins by assigning the field name
DP05_0033E to total_field, which makes the
code easier to reuse and interpret. The expression
total_field %in% names(nyc_trct) checks whether that
variable exists in the dataset. The command
summary(nyc_trct[[total_field]]) then prints summary
statistics for the total population field, allowing you to inspect the
range and central tendency of tract population values. Finally,
sum(nyc_trct[[total_field]] == 0, na.rm = TRUE) counts how
many tracts have total population equal to zero. This is an important
diagnostic because zero-population tracts can create division problems
later when tract proportions are calculated.
total_field <- "DP05_0033E"
total_field %in% names(nyc_trct)
## [1] TRUE
summary(nyc_trct[[total_field]])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 2306 3448 3712 4876 15945
# Optional diagnostic
sum(nyc_trct[[total_field]] == 0, na.rm = TRUE)
## [1] 81
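To see why the 81 zero-population tracts matter, the sketch below shows what plain division does with a zero denominator in R and one simple way to guard against it. The counts are toy values and the object names are illustrative; this is not the guard used later in the script.

```r
# Toy counts: the second tract has no residents.
total_pop <- c(2004, 0, 4793)
asian_pop <- c(1363, 0, 3463)

raw_prop <- asian_pop / total_pop   # 0 / 0 evaluates to NaN in R
zero_pop <- total_pop == 0

# Guarded version: leave zero-population tracts as NA instead of NaN.
guarded_prop <- ifelse(zero_pop, NA_real_, asian_pop / total_pop)

data.frame(total_pop, asian_pop, raw_prop, guarded_prop)
```

A positive count divided by zero would give Inf rather than NaN, which is consistent with the Inf value visible in one of the ratio fields of the printed table.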
This section performs a full validation of the variables needed for the LQ analysis before the actual statistic is calculated. In the Basic Statistics assignment, this type of stage was important because calculations should not be trusted unless the fields being used have first been checked for completeness, correct type, and impossible values. That same logic is even more important here because the Location Quotient is built from multiple stages of division. If the underlying fields contain missing values, character data, negative counts, or duplicate tract identifiers, then the LQ output could become misleading or fail entirely. This is therefore a quality-control section that protects the analytical integrity of the workflow.
The code process starts by creating needed_vars, which
is the full list of variables required for the analysis, including the
tract identifier, tract name, total population, the main race fields,
and the aggregated Other field. The function
setdiff(needed_vars, names(nyc_trct)) compares the required
variables to the dataset’s actual column names and returns anything
missing. The if (length(missing) > 0) { stop(...) }
block then halts the script if required columns are absent. This is an
intentional hard stop that prevents the workflow from continuing with
incomplete input data. Next, the script creates na_counts
and uses a for loop with sum(is.na(...)) to
count missing values in each required variable. This gives a
field-by-field diagnostic of missingness. After that,
numeric_vars isolates the fields that must be numeric for
the calculations to work correctly. A second setdiff()
checks whether those expected numeric fields actually exist. The
following for loop then converts each numeric variable
using as.numeric(), while suppressWarnings()
prevents unnecessary clutter from conversion messages. The script
records before_na and after_na for each field
so that if conversion produces additional missing values, it prints a
note. This is a subtle but important diagnostic because it can reveal
fields that contained non-numeric characters. Finally,
sapply(..., min, na.rm = TRUE) checks the minimum value in
each numeric variable after dropping geometry, which gives a quick
screen for impossible negative values.
needed_vars <- c(
"GEOID",
"NAME",
"DP05_0033E",
"DP05_0037E",
"DP05_0038E",
"DP05_0044E",
"DP05_0039E",
"DP05_0052E",
"DP05_0057E",
"DP05_0058E",
"DP05_3958E"
)
missing <- setdiff(needed_vars, names(nyc_trct))
if (length(missing) > 0) {
stop("Missing required columns: ", paste(missing, collapse = ", "))
}
na_counts <- c()
for (v in needed_vars) {
na_counts[v] <- sum(is.na(nyc_trct[[v]]))
}
na_counts
## GEOID NAME DP05_0033E DP05_0037E DP05_0038E DP05_0044E DP05_0039E
## 0 0 0 0 0 0 0
## DP05_0052E DP05_0057E DP05_0058E DP05_3958E
## 0 0 0 0
numeric_vars <- c(
"DP05_0033E",
"DP05_0037E",
"DP05_0038E",
"DP05_0044E",
"DP05_0039E",
"DP05_0052E",
"DP05_0057E",
"DP05_0058E",
"DP05_3958E"
)
setdiff(numeric_vars, names(nyc_trct))
## character(0)
for (v in numeric_vars) {
  before_na <- sum(is.na(nyc_trct[[v]]))
  nyc_trct[[v]] <- suppressWarnings(as.numeric(nyc_trct[[v]]))
  after_na <- sum(is.na(nyc_trct[[v]]))
  if (after_na > before_na) {
    cat("NOTE: Converting", v, "to numeric increased NAs from", before_na, "to", after_na, "\n")
  }
}
# Quick minimum check for impossible negative values
sapply(
st_drop_geometry(nyc_trct)[, numeric_vars],
min,
na.rm = TRUE
)
## DP05_0033E DP05_0037E DP05_0038E DP05_0044E DP05_0039E DP05_0052E DP05_0057E
## 0 0 0 0 0 0 0
## DP05_0058E DP05_3958E
## 0 0
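In this dataset the conversion loop printed no notes, because every field was already numeric. The sketch below shows what the before and after comparison would catch if a field had been stored as text with non-numeric entries. The values in raw_counts are made up for illustration.

```r
# Toy character field with two entries that are not clean numbers.
raw_counts <- c("2004", "4793", "N/A", "1,244")

before_na <- sum(is.na(raw_counts))
converted <- suppressWarnings(as.numeric(raw_counts))
after_na  <- sum(is.na(converted))

if (after_na > before_na) {
  cat("Conversion introduced", after_na - before_na, "new NA value(s)\n")
}

# Show the original entries that failed to convert.
raw_counts[is.na(converted)]
```

Note that a thousands separator is enough to make as.numeric() return NA, so a count such as "1,244" would be silently lost unless the increase in NA values is reported.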
This chunk deepens the validity check by explicitly identifying
negative values field by field. The which() function is
used with logical conditions to locate rows where population values are
below zero but not missing. Each group is checked separately, and if any
negative values are found, the script uses stop() to halt
the workflow with a specific message naming the problem field. This is
more informative than a general error because it directly identifies the
part of the data that needs correction. At the end of the chunk,
duplicated(nyc_trct$GEOID) checks whether any tract
identifiers occur more than once. Those duplicated GEOIDs are stored in
dups, and if any are found, the script prints a warning and
previews the first few duplicates. This does not necessarily stop the
workflow, but it alerts you that the one-tract-one-record assumption may
have been violated.
bad_pop <- which(!is.na(nyc_trct$DP05_0033E) & nyc_trct$DP05_0033E < 0)
if (length(bad_pop) > 0) stop("Negative total population found.")
bad_white <- which(!is.na(nyc_trct$DP05_0037E) & nyc_trct$DP05_0037E < 0)
if (length(bad_white) > 0) stop("Negative White population found.")
bad_black <- which(!is.na(nyc_trct$DP05_0038E) & nyc_trct$DP05_0038E < 0)
if (length(bad_black) > 0) stop("Negative Black population found.")
bad_asian <- which(!is.na(nyc_trct$DP05_0044E) & nyc_trct$DP05_0044E < 0)
if (length(bad_asian) > 0) stop("Negative Asian population found.")
bad_aian <- which(!is.na(nyc_trct$DP05_0039E) & nyc_trct$DP05_0039E < 0)
if (length(bad_aian) > 0) stop("Negative AIAN population found.")
bad_nhpi <- which(!is.na(nyc_trct$DP05_0052E) & nyc_trct$DP05_0052E < 0)
if (length(bad_nhpi) > 0) stop("Negative NHPI population found.")
bad_other <- which(!is.na(nyc_trct$DP05_0057E) & nyc_trct$DP05_0057E < 0)
if (length(bad_other) > 0) stop("Negative Some Other Race population found.")
bad_two_more <- which(!is.na(nyc_trct$DP05_0058E) & nyc_trct$DP05_0058E < 0)
if (length(bad_two_more) > 0) stop("Negative Two or More Races population found.")
bad_other_agg <- which(!is.na(nyc_trct$DP05_3958E) & nyc_trct$DP05_3958E < 0)
if (length(bad_other_agg) > 0) stop("Negative aggregated Other population found.")
dups <- nyc_trct$GEOID[duplicated(nyc_trct$GEOID)]
if (length(dups) > 0) {
cat("WARNING: Duplicate GEOID values found (first few):\n")
print(head(dups, 10))
}
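The nine field-by-field checks above follow the same pattern, so they could also be written as one loop. The sketch below is an alternative formulation under that assumption, not the code used in this analysis. The helper check_non_negative() and the toy table with made-up identifiers are hypothetical.

```r
# Hypothetical helper: one loop instead of one check per field.
check_non_negative <- function(df, fields) {
  for (f in fields) {
    n_bad <- sum(df[[f]] < 0, na.rm = TRUE)   # NA values are ignored
    if (n_bad > 0) {
      stop("Negative values found in ", f, " (", n_bad, " row(s)).")
    }
  }
  invisible(TRUE)
}

# Toy table with made-up identifiers; the third row repeats a GEOID on purpose.
toy <- data.frame(
  GEOID      = c("A001", "A002", "A002"),
  DP05_0033E = c(2004, 4793, 13869),
  DP05_0037E = c(316, 112, 3226)
)

check_non_negative(toy, c("DP05_0033E", "DP05_0037E"))

# anyDuplicated() returns 0 when every GEOID is unique,
# otherwise the index of the first repeated value.
dup_index <- anyDuplicated(toy$GEOID)
if (dup_index > 0) cat("First duplicate GEOID at row", dup_index, "\n")
```

The trade-off is that the loop reports the field code rather than a readable group name, whereas the explicit version above names each group in its error message.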
This section calculates the citywide totals needed to establish the regional comparison baseline for the Location Quotient. The LQ is a relative concentration measure, so it does not evaluate a tract in isolation. Instead, it compares a tract’s internal composition to the overall composition of the study area, which here is New York City. That means the workflow first needs to know the total population of the city and the citywide total count of each analyzed group. These values create the regional context against which each tract will later be compared.
The code process uses sum(..., na.rm = TRUE) to
aggregate total population and each of the four group counts across all
tracts. The object region_total_pop stores the total city
population, while the other objects store the citywide totals for White,
Black, Asian, and Other populations. The na.rm = TRUE
argument ensures that missing values are ignored rather than causing the
total to become missing itself. After these totals are computed,
printing each one allows you to inspect the raw regional counts. This is
a useful diagnostic because it lets you verify that the numbers appear
plausible before moving on to proportion calculations.
region_total_pop <- sum(nyc_trct$DP05_0033E, na.rm = TRUE)
region_total_white <- sum(nyc_trct$DP05_0037E, na.rm = TRUE)
region_total_black <- sum(nyc_trct$DP05_0038E, na.rm = TRUE)
region_total_asian <- sum(nyc_trct$DP05_0044E, na.rm = TRUE)
region_total_other <- sum(nyc_trct$DP05_3958E, na.rm = TRUE)
region_total_pop
## [1] 8622467
region_total_white
## [1] 3230647
region_total_black
## [1] 1992200
region_total_asian
## [1] 1247994
region_total_other
## [1] 2151626
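One quick plausibility check on these totals is whether the four groups add up to the total population. The sketch below copies the printed values into new objects (the names printed_total_pop and printed_groups are illustrative) so that it runs independently of the tract data.

```r
# Regional totals copied from the printed output above.
printed_total_pop <- 8622467
printed_groups <- c(
  white = 3230647,
  black = 1992200,
  asian = 1247994,
  other = 2151626
)

group_sum <- sum(printed_groups)

# A difference of zero means the four groups partition the total population.
group_sum - printed_total_pop

# The corresponding shares therefore sum to one.
sum(printed_groups / printed_total_pop)
```

Here the difference is zero, so the four groups account for the full citywide total and the regional proportions form a complete distribution.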
This section converts the regional totals into proportions, which form the denominator of the Location Quotient formula. A regional proportion expresses the share of the total city population that belongs to a given group. This matters because the Location Quotient is not based on raw counts. It compares a tract’s proportion for a group to the city’s overall proportion for that same group. In other words, the regional proportions establish what the average distribution looks like across New York City, and each tract is evaluated relative to that baseline.
The code process begins by defining safe_div, a custom
function designed to perform division while protecting against invalid
results. The function first creates an output vector filled with
NA_real_, which ensures the output is numeric. It then
checks whether the denominator is a single number or a vector. In either
case, it uses logical conditions to identify valid positions where the
numerator and denominator are both not missing and where the denominator
is greater than zero. Only in those valid positions does it perform the
division. This means the function prevents division by zero and avoids
propagating missing values inappropriately. After the function is
defined, the script calculates region_prop_white,
region_prop_black, region_prop_asian, and
region_prop_other by dividing each regional group total by
the regional total population. Printing these values allows you to
confirm that the resulting proportions are sensible and that the
denominator side of the LQ is ready for use.
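safe_div <- function(num, den) {
  # Start from an all-NA numeric vector the same length as the numerator
  out <- rep(NA_real_, length(num))
  if (length(den) == 1) {
    # Scalar denominator (e.g. a regional proportion): divide every valid
    # numerator by the same value
    ok <- !is.na(num) & !is.na(den) & den > 0
    out[ok] <- num[ok] / den
  } else {
    # Vector denominator (e.g. tract totals): divide element by element
    ok <- !is.na(num) & !is.na(den) & den > 0
    out[ok] <- num[ok] / den[ok]
  }
  out
}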
region_prop_white <- safe_div(region_total_white, region_total_pop)
region_prop_black <- safe_div(region_total_black, region_total_pop)
region_prop_asian <- safe_div(region_total_asian, region_total_pop)
region_prop_other <- safe_div(region_total_other, region_total_pop)
region_prop_white
## [1] 0.3746778
region_prop_black
## [1] 0.2310476
region_prop_asian
## [1] 0.1447375
region_prop_other
## [1] 0.2495372
ChatGPT Assistance
Prompt = my .cvs has come back with nulls for all LQ, where is the issue?
The issue in your original workflow came from how the safe_div() function was designed versus how it was being used in the Location Quotient calculation. In your first script, safe_div() was written to handle element-wise division between two vectors of equal length, where each numerator value is paired with a corresponding denominator value. That works correctly for tract-level proportion and density calculations, where the numerator and denominator both vary by tract. However, in the LQ step, you introduced a different structure: the numerator was a full vector of tract-level proportions, while the denominator was a single regional proportion value. Because the function still applied vector-style indexing (den[ok]) to that single value, R returned the denominator only for the first position and NA for every position after it, so at most the first tract could be evaluated and the remaining NA values propagated through the LQ output. The revised version works because it explicitly handles this mismatch by recognizing when the denominator is a single value and applying it consistently across all valid numerator entries. This aligns the computation with the actual LQ formula, where a constant regional proportion is used to standardize tract-level values. In short, the original function failed due to a mismatch between vector-based logic and scalar input, and the fix works because it separates those two cases and applies the appropriate division method.
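The indexing behaviour described in this exchange can be reproduced with a few made-up numbers. This is a sketch of the failure mode only, not the original script: num, den, bad, good, and all values are hypothetical.

```r
# Hypothetical tract proportions (one missing) and one regional proportion
num <- c(0.10, 0.25, NA, 0.40)
den <- 0.20

ok <- !is.na(num) & !is.na(den) & den > 0

# Vector-style indexing on a length-1 denominator: positions past the
# first do not exist, so R returns NA for them
bad <- rep(NA_real_, length(num))
bad[ok] <- num[ok] / den[ok]

# Dividing by the scalar directly lets R recycle it across all valid rows
good <- rep(NA_real_, length(num))
good[ok] <- num[ok] / den

bad
good
```

Only the first element of bad holds a ratio, while good holds a ratio for every non-missing input, which is the difference between the two branches of safe_div().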
This section calculates the tract-level proportions for each group, which form the numerator of the Location Quotient. While the regional proportions describe the overall demographic structure of New York City, the tract-level proportions describe the internal composition of each individual tract. This is what allows the workflow to compare local demographic structure to the broader citywide baseline. A proportion is necessary here because raw counts alone would not be comparable across tracts of very different sizes. Standardizing group counts by tract population makes the local values analytically comparable.
The code process uses the safe_div function again, but
this time it divides each tract’s group count by that same tract’s total
population. The results are written back into new fields in
nyc_trct, named TRACT_PROP_WHITE,
TRACT_PROP_BLACK, TRACT_PROP_ASIAN, and
TRACT_PROP_OTHER. These are now stored directly in the
dataset as derived variables. The subsequent summary() call
then prints summary statistics for these four new proportion fields.
That output helps verify that the values fall between zero and one, as
proportions must, and that the new fields were successfully created.
Because nyc_trct is an sf object, the summary also lists the geometry
column.
nyc_trct$TRACT_PROP_WHITE <- safe_div(nyc_trct$DP05_0037E, nyc_trct$DP05_0033E)
nyc_trct$TRACT_PROP_BLACK <- safe_div(nyc_trct$DP05_0038E, nyc_trct$DP05_0033E)
nyc_trct$TRACT_PROP_ASIAN <- safe_div(nyc_trct$DP05_0044E, nyc_trct$DP05_0033E)
nyc_trct$TRACT_PROP_OTHER <- safe_div(nyc_trct$DP05_3958E, nyc_trct$DP05_0033E)
summary(nyc_trct[, c(
  "TRACT_PROP_WHITE",
  "TRACT_PROP_BLACK",
  "TRACT_PROP_ASIAN",
  "TRACT_PROP_OTHER"
)])
## TRACT_PROP_WHITE TRACT_PROP_BLACK TRACT_PROP_ASIAN TRACT_PROP_OTHER
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.1222 1st Qu.:0.02158 1st Qu.:0.02674 1st Qu.:0.1056
## Median :0.3180 Median :0.10013 Median :0.07944 Median :0.1866
## Mean :0.3718 Mean :0.24031 Mean :0.15036 Mean :0.2376
## 3rd Qu.:0.6096 3rd Qu.:0.41090 3rd Qu.:0.21546 3rd Qu.:0.3378
## Max. :1.0000 Max. :1.00000 Max. :0.92321 Max. :0.9104
## NA's :81 NA's :81 NA's :81 NA's :81
## geom
## MULTIPOLYGON :2323
## epsg:NA : 0
## +proj=aea ...: 0
##
##
##
##
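The 81 missing values reported for each field are what safe_div would return for tracts whose total population is zero or missing, although the report does not state this directly, so treat it as an assumption. A minimal sketch of how that link could be checked, using a toy table (tracts, total, group, and prop are hypothetical names and values):

```r
# Toy stand-in for the tract table (hypothetical values)
tracts <- data.frame(
  total = c(1200, 0, 850, NA, 0),
  group = c(300, 0, 425, 10, 0)
)

# Same guard as safe_div: divide only where the denominator is present and > 0
valid <- !is.na(tracts$group) & !is.na(tracts$total) & tracts$total > 0
tracts$prop <- NA_real_
tracts$prop[valid] <- tracts$group[valid] / tracts$total[valid]

# NA proportions versus tracts with no usable denominator
n_na <- sum(is.na(tracts$prop))
n_bad_den <- sum(is.na(tracts$total) | tracts$total == 0)
c(n_na = n_na, n_bad_den = n_bad_den)
```

If the two counts match on the real data, every missing proportion is explained by an unusable denominator rather than by a missing group count.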
This section performs the central analytical calculation in the workflow by constructing the Location Quotients themselves. Up to this point, the code has built the two necessary components of the statistic: the tract-level proportions and the regional proportions. The Location Quotient is then produced by dividing the first by the second. Conceptually, this step asks whether a given group is more concentrated or less concentrated in a tract than would be expected based on that group’s overall share of the city population. This is why LQ is useful in geographic analysis. It does not merely show where a group is present, but where it is relatively overrepresented or underrepresented compared to the study area baseline.
The code process again uses safe_div, this time dividing
each tract proportion field by the corresponding regional proportion.
The results are written into four new fields: LQ_WHITE,
LQ_BLACK, LQ_ASIAN, and LQ_OTHER.
Because the regional proportions are each a single citywide value, the
function applies that one denominator across the full vector of
tract-level proportions. The result is a tract-level ratio for each
group. A value above 1 indicates relative concentration above the city
average, a value near 1 indicates that the tract is close to the
citywide pattern, and a value below 1 indicates relative
under-representation.
nyc_trct$LQ_WHITE <- safe_div(nyc_trct$TRACT_PROP_WHITE, region_prop_white)
nyc_trct$LQ_BLACK <- safe_div(nyc_trct$TRACT_PROP_BLACK, region_prop_black)
nyc_trct$LQ_ASIAN <- safe_div(nyc_trct$TRACT_PROP_ASIAN, region_prop_asian)
nyc_trct$LQ_OTHER <- safe_div(nyc_trct$TRACT_PROP_OTHER, region_prop_other)
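A single worked case may make the ratio concrete. The sketch below uses the citywide Asian and total population figures printed earlier; the tract itself (1,000 residents, 680 of them Asian) is hypothetical, as are the object names.

```r
# Citywide figures printed earlier in the report
region_asian <- 1247994
region_pop <- 8622467
region_share <- region_asian / region_pop  # about 0.1447

# Hypothetical tract: 1,000 residents, 680 of them Asian
tract_share <- 680 / 1000

# Location Quotient: local share divided by citywide share
lq <- tract_share / region_share
round(lq, 2)
```

An LQ of roughly 4.7 means the group's share in this tract is between four and five times its citywide share, which is the kind of value seen in the highest-LQ tracts of the output table.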
This section cleans the newly created LQ fields by identifying and
replacing invalid outputs. Even with a safe division function, it is
still necessary to inspect the final LQ variables for values such as
Inf, NaN, or NA. These values can
appear when denominator conditions are not analytically meaningful, such
as a tract with zero total population or a case where a comparison value
is not defined. This stage is important because the workflow is not only
expected to produce a statistic but also to demonstrate that the output
is interpretable and analytically sound.
The code process checks each LQ field with is.infinite()
and is.nan(). Wherever those conditions are true, the
script replaces the result with NA. This preserves the fact
that the value is not analytically interpretable without leaving behind
misleading infinite or undefined values. After cleaning, the script uses
sum(is.na(...)) to count how many missing values remain in
each LQ field and sum(is.infinite(...), na.rm = TRUE) to
confirm that no infinite values remain. These checks provide direct
evidence that the output has been cleaned and that invalid results have
been managed consistently.
nyc_trct$LQ_WHITE[is.infinite(nyc_trct$LQ_WHITE) | is.nan(nyc_trct$LQ_WHITE)] <- NA
nyc_trct$LQ_BLACK[is.infinite(nyc_trct$LQ_BLACK) | is.nan(nyc_trct$LQ_BLACK)] <- NA
nyc_trct$LQ_ASIAN[is.infinite(nyc_trct$LQ_ASIAN) | is.nan(nyc_trct$LQ_ASIAN)] <- NA
nyc_trct$LQ_OTHER[is.infinite(nyc_trct$LQ_OTHER) | is.nan(nyc_trct$LQ_OTHER)] <- NA
sum(is.na(nyc_trct$LQ_WHITE))
## [1] 81
sum(is.na(nyc_trct$LQ_BLACK))
## [1] 81
sum(is.na(nyc_trct$LQ_ASIAN))
## [1] 81
sum(is.na(nyc_trct$LQ_OTHER))
## [1] 81
sum(is.infinite(nyc_trct$LQ_WHITE), na.rm = TRUE)
## [1] 0
sum(is.infinite(nyc_trct$LQ_BLACK), na.rm = TRUE)
## [1] 0
sum(is.infinite(nyc_trct$LQ_ASIAN), na.rm = TRUE)
## [1] 0
sum(is.infinite(nyc_trct$LQ_OTHER), na.rm = TRUE)
## [1] 0
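The same cleaning can be written once and applied to every LQ field. This is an alternative sketch, not the script's method: it relies on is.finite(), which is FALSE for Inf, -Inf, NaN, and NA alike, and it runs on a toy table whose names and values (df, LQ_A, LQ_B) are hypothetical.

```r
# Toy table with the kinds of values the cleaning step targets
df <- data.frame(
  LQ_A = c(1.2, Inf, NaN, 0.4),
  LQ_B = c(NA, 0.9, -Inf, 2.1)
)

lq_cols <- c("LQ_A", "LQ_B")

for (col in lq_cols) {
  x <- df[[col]]
  # Anything that is not a finite number becomes NA
  x[!is.finite(x)] <- NA
  df[[col]] <- x
}

# Missing values remaining per field
na_counts <- colSums(is.na(df[lq_cols]))
na_counts
```

Looping over a vector of column names keeps the four fields in step, so a rule changed for one group cannot be forgotten for another.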
This section confirms that the newly created analytic variables exist and are named clearly. Clear variable naming matters because the final output should be readable and interpretable outside the immediate context of code execution. In the Basic Statistics assignment, naming also mattered because the report needed to show how derived variables connected to the analytic objective. The same logic applies here. The tract-level proportions and the final LQ values should be easy to distinguish so that the reader understands which variables are intermediate measures and which are the primary analytic output.
The code process uses names(nyc_trct) to retrieve all
column names and then filters those names to display only the newly
created proportion and LQ fields. This does not create the fields, since
they were already created in earlier steps. Instead, it verifies that
they exist in the dataset and confirms that their names are consistent
and interpretable. This is a final naming check before validation and
export.
names(nyc_trct)[names(nyc_trct) %in% c(
  "TRACT_PROP_WHITE",
  "TRACT_PROP_BLACK",
  "TRACT_PROP_ASIAN",
  "TRACT_PROP_OTHER",
  "LQ_WHITE",
  "LQ_BLACK",
  "LQ_ASIAN",
  "LQ_OTHER"
)]
## [1] "TRACT_PROP_WHITE" "TRACT_PROP_BLACK" "TRACT_PROP_ASIAN" "TRACT_PROP_OTHER"
## [5] "LQ_WHITE" "LQ_BLACK" "LQ_ASIAN" "LQ_OTHER"
This section evaluates the LQ outputs to determine whether they behave as expected analytically. The assignment emphasizes that the work is not evaluated only on whether the code runs, but on whether the analysis is correctly constructed and whether the results are ready for analysis. That means the LQ fields need to be reviewed critically before export. In geographic analysis, unusual values may be meaningful, but they may also indicate problems such as denominator errors, very small tract populations, or unintended data issues. This stage therefore functions as a statistical and analytic reasonableness check.
The code process begins with summary() on each LQ field,
which reports the minimum, quartiles, median, mean, and maximum. These
summaries help you see whether the fields have a sensible central
tendency and range. The mean(..., na.rm = TRUE) calls then
calculate the average LQ value for each group, which can be used as a
broad diagnostic. The sd(..., na.rm = TRUE) calls calculate
the standard deviation, which indicates how much variation exists across
tracts. Finally, hist() is used to plot the distribution of
each LQ field. These histograms help reveal skewness, clustering, long
tails, or unusual spikes in the distribution. Together, these functions
move the workflow beyond simple code execution and into interpretation,
allowing you to judge whether the outputs are reasonable before the
final export.
summary(nyc_trct$LQ_WHITE)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.3260 0.8487 0.9922 1.6271 2.6690 81
summary(nyc_trct$LQ_BLACK)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00000 0.09339 0.43338 1.04010 1.77844 4.32811 81
summary(nyc_trct$LQ_ASIAN)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.1847 0.5489 1.0389 1.4886 6.3785 81
summary(nyc_trct$LQ_OTHER)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.4230 0.7479 0.9520 1.3536 3.6485 81
mean(nyc_trct$LQ_WHITE, na.rm = TRUE)
## [1] 0.9921987
mean(nyc_trct$LQ_BLACK, na.rm = TRUE)
## [1] 1.0401
mean(nyc_trct$LQ_ASIAN, na.rm = TRUE)
## [1] 1.038878
mean(nyc_trct$LQ_OTHER, na.rm = TRUE)
## [1] 0.9520345
sd(nyc_trct$LQ_WHITE, na.rm = TRUE)
## [1] 0.7371925
sd(nyc_trct$LQ_BLACK, na.rm = TRUE)
## [1] 1.199037
sd(nyc_trct$LQ_ASIAN, na.rm = TRUE)
## [1] 1.201467
sd(nyc_trct$LQ_OTHER, na.rm = TRUE)
## [1] 0.6941084
hist(nyc_trct$LQ_WHITE, breaks = 50, main = "LQ White", xlab = "Location Quotient")
hist(nyc_trct$LQ_BLACK, breaks = 50, main = "LQ Black", xlab = "Location Quotient")
hist(nyc_trct$LQ_ASIAN, breaks = 50, main = "LQ Asian", xlab = "Location Quotient")
hist(nyc_trct$LQ_OTHER, breaks = 50, main = "LQ Other", xlab = "Location Quotient")
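The unweighted tract means above sit near 1 but not exactly on it. A stricter check is the population-weighted mean of the LQ, which equals 1 by construction when the regional proportion is computed from the same tracts. A minimal sketch with made-up counts (pop, grp, and the values are hypothetical):

```r
# Hypothetical tracts: total population and one group's count
pop <- c(100, 200, 700)
grp <- c(50, 20, 70)

# Regional share and tract-level Location Quotients
region_share <- sum(grp) / sum(pop)
lq <- (grp / pop) / region_share

# Unweighted mean: pulled upward by the small, highly concentrated tract
unweighted <- mean(lq)

# Population-weighted mean: returns to the regional baseline
weighted <- weighted.mean(lq, w = pop)

c(unweighted = unweighted, weighted = weighted)
```

On the real data, a call such as weighted.mean(nyc_trct$LQ_BLACK, nyc_trct$DP05_0033E, na.rm = TRUE) should return a value very close to 1. A clear departure would suggest that the numerator and denominator were built from different sets of tracts.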
This section creates the final non-spatial output table that will be written to CSV. At this stage, the workflow has already completed the main analytic calculations, so the next task is to organize the results into a clean and usable dataset. The assignment requires a table that can be used for later mapping or downstream analysis, which means the export should contain the tract identifier and the analytic variables that were created through the workflow. This section therefore acts as the final data-shaping stage.
The code process uses the base pipe to send nyc_trct
into st_drop_geometry(), removing the geometry so that the
final object is a plain table. It then uses select() from
dplyr to choose the specific fields to keep. Those fields
are GEOID, NAME (renamed to tract_ID), and the four LQ
fields. The total population field and the tract-level proportion
fields are not carried into the export. The selected data are stored in
lq_table, which becomes the final export-ready object. The calls to
head(lq_table) and tail(lq_table) then preview the first and last rows
so you can verify that the right columns were retained and that the
table is structured as intended.
lq_table <- nyc_trct |>
  st_drop_geometry() |>
  select(
    GEOID,
    tract_ID = NAME,
    LQ_WHITE,
    LQ_BLACK,
    LQ_ASIAN,
    LQ_OTHER
  )
head(lq_table)
## GEOID tract_ID LQ_WHITE LQ_BLACK LQ_ASIAN LQ_OTHER
## 1 36081045000 450 0.42085394 0.20733476 4.6991270 0.4579336
## 2 36081045400 454 0.06236668 0.43976447 4.9918796 0.6111878
## 3 36081045500 455 0.62081364 1.64555055 0.9958334 0.9740435
## 4 36081045600 456 0.31752898 0.02435433 4.9540857 0.6346154
## 5 36081044602 446.02 0.35039703 0.84402360 2.8763443 1.0314681
## 6 36081044800 448 0.23557797 0.31460843 3.5657357 1.2941924
tail(lq_table)
## GEOID tract_ID LQ_WHITE LQ_BLACK LQ_ASIAN LQ_OTHER
## 2318 36081040400 404 0.05951975 3.20523106 0.8237786 0.4725020
## 2319 36081041100 411 0.68491748 0.05620926 1.0534075 2.3159759
## 2320 36081041400 414 0.09030315 3.12662071 0.5577556 0.6533663
## 2321 36081041500 415 0.65231567 0.32269250 0.9449252 2.1811116
## 2322 36081042700 427 0.32361726 0.43891800 0.4950275 2.8279868
## 2323 36081043400 434 0.13176434 3.42830246 0.3297248 0.4440475
This section writes the final table to an external CSV file, completing the workflow from data import to validated output. The CSV format is appropriate because it creates a clean non-spatial table that can be opened outside of R, shared easily, and used in later mapping or analysis steps. This is the final deliverable stage of the assignment, where the tract-level LQ dataset becomes a usable product rather than just an object stored in the R session.
The code process begins with
file.path(getwd(), "student_green_assignment_five_data.csv"),
which creates the full output path in the current working directory and
stores it in csv_path. The write.csv() function then writes lq_table to
that location with row.names = FALSE, which prevents R from
adding an extra row-number column to the file. Finally, the
cat() function prints a confirmation message and the output
path so you can verify where the file was written. This
closes the analytical workflow by producing the final tract-level
dataset required for the assignment.
student_green_assignment_five_data.csv is attached in
the zipfile.
csv_path <- file.path(getwd(), "student_green_assignment_five_data.csv")
write.csv(
  lq_table,
  csv_path,
  row.names = FALSE
)
cat("\nDONE. Wrote CSV to:\n", csv_path, "\n")
##
## DONE. Wrote CSV to:
## D:/2026_GES_666_Critical_Maps/Lab_4/student_green_assignment_five_data.csv
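A quick read-back can confirm that the file opens as intended. The sketch below writes a two-row stand-in to a temporary file and does not touch the real export; lq_demo, tmp, and check are hypothetical names. It assumes GEOID should be kept as text so that the 11-digit identifier is not converted to a number on import.

```r
# Hypothetical two-row stand-in for the exported table
lq_demo <- data.frame(
  GEOID = c("36081045000", "36081045400"),
  LQ_ASIAN = c(4.699127, 4.99188),
  stringsAsFactors = FALSE
)

# Write to a temporary file in the same way as the export step
tmp <- tempfile(fileext = ".csv")
write.csv(lq_demo, tmp, row.names = FALSE)

# Read GEOID as character; other columns are type-detected as usual
check <- read.csv(tmp, colClasses = c(GEOID = "character"))

str(check)
identical(check$GEOID, lq_demo$GEOID)
```

Keeping the identifier as text matters mainly for later joins back to tract geometry, where a numeric GEOID would not match a character key.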
This workflow moves from raw tract-level Census data to a structured and interpretable dataset of Location Quotients for New York City. By constructing the LQ through clearly defined components, specifically tract-level proportions as the numerator and regional proportions as the denominator, the workflow ensures that each value reflects a meaningful comparison between local conditions and the broader population structure. This distinction is critical, as the analysis is not simply describing where populations are located, but evaluating how their distribution differs across space relative to the city as a whole.
The results produced through this workflow are analytically useful because they reveal patterns of relative concentration that are not visible through raw counts or basic proportions alone. This is particularly important in the analysis of Black residents in New York City. While composition maps or proportion-based measures show the share of Black residents within each tract, they do not indicate whether that share is high or low relative to the citywide baseline. The Location Quotient adds this missing layer of interpretation by identifying where Black residents are relatively over-represented or under-represented compared to the overall population structure. This allows for a clearer understanding of spatial concentration, clustering, and uneven distribution, which are central to interpreting patterns shaped by historical processes such as segregation, housing policy, and neighborhood change.
Using the LQ for Black residents also extends the analysis beyond what was possible in the Basic Statistics assignment. Composition and consistency measures describe internal tract characteristics, but they do not account for how those characteristics compare across the entire city. The LQ introduces a relational perspective, allowing each tract to be evaluated within the broader geographic system. This creates opportunities for deeper analysis, such as identifying clusters of high concentration, comparing LQ values across boroughs, or examining how areas of over-representation align with other variables such as income, housing conditions, or investment patterns. It also allows for temporal comparison if applied across multiple years, making it possible to track changes in concentration over time.
The inclusion of validation steps ensures that the results are not only computed correctly but are also reasonable. Summary statistics and distribution checks confirm that the LQ values fall within a plausible range, with maxima between roughly 2.7 and 6.4, and that no infinite or undefined values remain. Overall, the workflow satisfies the requirements of the assignment by demonstrating a clear analytic objective, a structured sequence of data processing steps, and a final output that is both clean and usable. The use of AI-assisted coding supports the construction of the workflow, but the analytic responsibility remains in defining the problem, validating the data, and interpreting the results. In this context, the Location Quotient serves as a critical tool for moving from descriptive mapping to meaningful geographic analysis, allowing the spatial distribution of Black residents in New York City to be understood in relation to the broader urban system.