LAB: AI-Assisted Analytic Workflow – Location Quotients by Census Tract

Introduction

Spatial analysis in geography is not only concerned with identifying patterns, but with explaining how those patterns differ across space relative to a broader context. In urban environments such as New York City, population distributions are highly uneven, reflecting long-standing processes of segregation, migration, and economic restructuring. Simply mapping counts or proportions of demographic groups does not fully capture these spatial relationships. What is required is a method that allows comparison between local conditions and the broader regional structure.

The Location Quotient (LQ) provides this analytic capability. Rather than describing how many individuals of a given group live in a census tract, LQ measures whether that group is more or less concentrated in that tract relative to the overall population distribution of the city. This allows for the identification of areas of over-representation and under-representation, making it a powerful tool for understanding spatial inequality, clustering, and demographic structure.

In the context of New York City, this is particularly important. The city is composed of diverse neighborhoods with distinct demographic profiles, and these patterns are shaped by historical processes such as redlining, housing policy, and migration flows. By applying Location Quotients to tract-level Census data, this analysis moves beyond descriptive mapping and toward a comparative spatial framework that reveals how demographic groups are distributed relative to the city as a whole.

This analysis was conducted with the assistance of ChatGPT.

Analytic Objective

Using tract-level Census data for New York City, this analysis measures the relative concentration of four race and ethnicity groups by calculating Location Quotients across census tracts, in order to identify where groups are over- or under-represented relative to the citywide population distribution.

This objective reflects the structure of the LQ statistic, which compares tract-level composition (numerator) to regional composition (denominator), producing a ratio that is directly interpretable in spatial terms.

Workflow Overview

This analysis follows a structured analytic pipeline, similar to the Basic Statistics assignment, but extended to include multi-scale comparison and ratio-based analysis.

Workflow Steps

  1. Establish working environment and load required libraries

  2. Read and inspect the tract-level dataset from the GeoPackage

  3. Validate data structure and confirm required fields

  4. Define total population and race/ethnicity group variables

  5. Calculate regional totals and proportions

  6. Calculate tract-level proportions

  7. Construct Location Quotients using standardized formula

  8. Validate results through summary statistics and inspection

  9. Prepare a clean output dataset

  10. Export results as a .csv file

This workflow reflects the assignment requirement to move from data access to verified analytic output, while demonstrating understanding of each component of the statistic.

Understanding Location Quotients (LQ)

Before implementing the code, it is necessary to clearly define the statistic being constructed.

The Location Quotient is calculated as:

$$LQ = \frac{\text{Tract-Level Proportion}}{\text{Regional Proportion}}$$

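Where:

  • Tract-Level Proportion = the group's population count in a tract divided by the tract's total population

  • Regional Proportion = the group's citywide population count divided by the citywide total population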

Interpretation

  • LQ > 1 → the group is more concentrated in the tract than expected

  • LQ = 1 → the tract reflects the regional average

  • LQ < 1 → the group is underrepresented
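
The formula and interpretation above can be illustrated with a small worked example. This is a minimal sketch using made-up numbers for one hypothetical tract and a hypothetical city; none of the values or object names come from the lab dataset.

```r
# Toy numbers (assumed for illustration, not lab data)
tract_group <- 250     # group count in the tract
tract_total <- 1000    # total tract population
city_group  <- 12500   # group count citywide
city_total  <- 100000  # total citywide population

tract_prop  <- tract_group / tract_total  # tract-level proportion: 0.25
region_prop <- city_group / city_total    # regional proportion: 0.125

lq <- tract_prop / region_prop            # 0.25 / 0.125 = 2
lq
```

An LQ of 2 would mean the group's share of this tract is twice its share of the city as a whole, so the group is over-represented there.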

Why This Matters in Geography

LQ is widely used in geographic analysis because it introduces relational context. Instead of asking “how many,” it asks:

  • How does this location compare to the broader system?

  • Where are concentrations forming?

  • Where are groups relatively absent?

This makes it especially useful for:

  • Identifying segregation patterns.

  • Detecting clustering of populations.

  • Comparing neighborhoods within cities.

  • Supporting policy and planning decisions.

In this assessment, LQ transforms tract-level Census data into a comparative spatial measure that reveals how demographic patterns in NYC relate to the overall structure of the city.

Load Libraries

This section loads the packages required to complete the Location Quotient workflow in a reproducible way. These libraries establish the analytical environment before any data are read or transformed. The workflow depends on spatial data being opened correctly, population fields being manipulated reliably, and the final work being presented in a structured knitted document. In this assignment, these libraries matter because the analysis is not just tabular. It begins with tract polygons stored in a GeoPackage, then moves through field validation, proportional calculations, and final export into a non-spatial table. This means the code needs tools for both geographic data management and standard data handling. As in the Basic Statistics assignment, this section confirms that the required computational environment is available before starting the analysis.

The code process begins with library(sf), which loads the Simple Features package for reading and handling spatial vector data. This is what later allows the tract layer to be imported from the GeoPackage and stored as an sf object with both geometry and attributes (Pebesma, 2018). The library(dplyr) call loads tools for data manipulation, especially selecting columns and shaping the final export table. The library(sp) call adds compatibility with older spatial classes, which can still be useful in geographic workflows. The library(rmarkdown) and library(knitr) calls support reproducible reporting by controlling how the code, results, and narrative text are rendered into the final document. Together, these packages make it possible to build a geographic analytical workflow that is both computationally functional and clearly documented.

library(sf)        # ANALYSIS - Simple Features for R - Handling Geographic Data.
## Warning: package 'sf' was built under R version 4.5.2
library(dplyr)     # ANALYSIS - A Grammar for Data Manipulation.
## Warning: package 'dplyr' was built under R version 4.5.2
library(sp)        # ANALYSIS - Classes and Methods for Spatial Data.
library(rmarkdown) # DELIVERABLE - A Notebook Interface to Produce Elegantly Formatted Output.
library(knitr)     # DELIVERABLE - Elegant, Flexible, and Fast Dynamic Report Generation.

Set Working Environment and File Paths

This section stabilizes the file structure for the workflow by setting the knit options and checking that the project folder contains the expected files. This is an important environmental step because the script depends on being able to find the GeoPackage and later write the CSV output to a known location. In a reproducible analysis, file access cannot be left to guesswork. The working environment has to be clearly established so that the code behaves the same way each time it is run. This matters especially in an assignment like this, where the workflow moves through multiple stages and where a simple file path problem could interrupt the entire analytical process before any LQ calculations begin.

The code here starts with knitr::opts_chunk$set(), which sets the default chunk options for the knitted document: echo = TRUE displays the code, and include = TRUE keeps each chunk and its output in the report. This call does not assign a root directory (that would be done with knitr::opts_knit$set(root.dir = ...)), so relative file paths are resolved from the folder that contains the .Rmd file, which is knitr's default. The calls to list.dirs() and list.files() then act as environmental checks. list.dirs() prints the folder structure so you can verify the broader project organization, while list.files() prints the files in the working directory so you can confirm that the GeoPackage is actually present. These commands do not modify the data, but they verify that the environment is prepared correctly before the dataset is opened.

knitr::opts_chunk$set(
  echo = TRUE,
  include = TRUE
)

list.dirs()
## [1] "."                                   
## [2] "./26.4.2_R_script_location_quotients"
list.files()
## [1] "26.4.2_R_script_location_quotients"    
## [2] "26.4.2_R_script_location_quotients.zip"
## [3] "labwork_r_script_guide_v1.R"           
## [4] "lq_r_script.html"                      
## [5] "lq_r_script.Rmd"                       
## [6] "new_york_city_metro_2022_class.gpkg"   
## [7] "nyc_tract_location_quotients.csv"      
## [8] "student_green_assignment_five_data.csv"
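
The listing above is checked by eye. A more explicit check could look like the sketch below, which assumes the GeoPackage sits in the current working directory. The names project_dir and gpkg_found are illustrative and are not part of the original script, and the root.dir line is left commented out because the original chunk does not set it.

```r
# Assumption: the GeoPackage is stored in the current working directory
project_dir <- getwd()
gpkg_file   <- "new_york_city_metro_2022_class.gpkg"

# In an R Markdown setup chunk, the knit root could be fixed explicitly:
# knitr::opts_knit$set(root.dir = project_dir)

# TRUE if the file is where the workflow expects it, FALSE otherwise
gpkg_found <- file.exists(file.path(project_dir, gpkg_file))

if (!gpkg_found) {
  message("GeoPackage not found in ", project_dir)
}
```

Using message() rather than stop() keeps the check informational. Switching to stop() would turn it into a hard requirement.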

Tract Layer from the GeoPackage in R

This section identifies the GeoPackage path, inspects the available layers inside it, and imports the tract layer that will be used for the analysis. This is the point where the workflow moves from setup into active data access. The assignment is specifically concerned with tract-level Census data for New York City, so loading the correct tract layer is critical. If the wrong layer is selected, then the entire analytical workflow would be built on the wrong geography or the wrong variables. Because GeoPackages can contain multiple layers, the script needs to confirm both the file path and the correct layer name before proceeding.

Here we use file.path(getwd(), "new_york_city_metro_2022_class.gpkg"), which constructs the full file path to the GeoPackage by combining the working directory with the file name. This is more reliable than hard-coding a full path because it keeps the workflow portable. The st_layers(gpkg_path) command inspects the GeoPackage and prints the names and properties of the layers stored inside it.

gpkg_path <- file.path(getwd(), "new_york_city_metro_2022_class.gpkg")

st_layers(gpkg_path) # OPTIONAL
## Driver: GPKG 
## Available layers:
##                                    layer_name geometry_type features fields
## 1            cb_2022_us_tract510_500k_nyc_v2a Multi Polygon     2323     75
## 2            neighborhood_boundaries_nyc_2019 Multi Polygon      195     11
## 3             neighborhood_locations_nyc_2019         Point      299     21
## 4               cb_2022_us_county510_500k_nyc       Polygon        7     10
## 5    cb_2022_us_tract510_500k_nyc_v2a_updated Multi Polygon     2323     79
## 6 cb_2022_us_tract510_500k_nyc_v2a_updated_V2         Point     2704     32
## 7    cb_2022_us_tracts_500k_nyc_v2a_DBlack_V2         Point     2323     21
## 8   rfin_2023_nyc_investor_purchase_estimates            NA     2881      5
## 9     rfin_2023_nyc_investor_market_estimates            NA     2187      6
##                                crs_name
## 1 North_America_Albers_Equal_Area_Conic
## 2 North_America_Albers_Equal_Area_Conic
## 3 North_America_Albers_Equal_Area_Conic
## 4 North_America_Albers_Equal_Area_Conic
## 5 North_America_Albers_Equal_Area_Conic
## 6 North_America_Albers_Equal_Area_Conic
## 7 North_America_Albers_Equal_Area_Conic
## 8                                  <NA>
## 9                                  <NA>

This is a diagnostic step that helps verify that the tract layer exists and is named correctly. The layer_name <- "cb_2022_us_tract510_500k_nyc_v2a_updated" line then stores the exact name of the layer to read. Finally, st_read(dsn = gpkg_path, layer = layer_name, quiet = FALSE) imports that layer into the object nyc_trct. The result is an sf object that contains both the tract boundaries and the attribute table that holds the Census variables needed for the LQ calculations.

layer_name <- "cb_2022_us_tract510_500k_nyc_v2a_updated"

nyc_trct <- st_read(
  dsn = gpkg_path,
  layer = layer_name,
  quiet = FALSE
)
## Reading layer `cb_2022_us_tract510_500k_nyc_v2a_updated' from data source 
##   `D:\2026_GES_666_Critical_Maps\Lab_4\new_york_city_metro_2022_class.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 2323 features and 79 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1718547 ymin: 256711 xmax: 1756674 ymax: 311508
## Projected CRS: North_America_Albers_Equal_Area_Conic
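
Because several layers in this GeoPackage have very similar names, the layer-name check could also be made programmatic. The sketch below is one possible approach and is not part of the original script: check_layer() is a hypothetical helper, and the layer names are copied by hand from the st_layers() listing shown earlier.

```r
# A few layer names as printed by st_layers() earlier in this lab
available_layers <- c(
  "cb_2022_us_tract510_500k_nyc_v2a",
  "cb_2022_us_tract510_500k_nyc_v2a_updated",
  "cb_2022_us_county510_500k_nyc"
)
# With sf loaded, the same vector could be taken from the file itself:
# available_layers <- st_layers(gpkg_path)$name

layer_name <- "cb_2022_us_tract510_500k_nyc_v2a_updated"

# Hypothetical helper: stop early if the requested layer is not in the file
check_layer <- function(layer, available) {
  if (!layer %in% available) {
    stop("Layer not found in GeoPackage: ", layer)
  }
  invisible(TRUE)
}

check_layer(layer_name, available_layers)
```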

Data Inspection

This section inspects the dataset after import to verify that it has the correct structure and contains the variables required for analysis. Before performing any calculations, the script needs to confirm that the data were read correctly, that geometry is present, and that the expected population fields exist in usable formats. This mirrors the logic of the Basic Statistics assignment, where early inspection helps prevent later errors from developing silently. Since Location Quotients depend on accurate tract identifiers, total population, and group counts, this inspection stage is a necessary checkpoint between reading the data and constructing the statistic.

The code process uses head() and tail() to preview different parts of the dataset, allowing you to see sample records without printing the entire table. The str(nyc_trct) command prints the structure of the object, including variable classes, which helps confirm whether demographic fields are numeric or stored incorrectly as character text. The class(nyc_trct) command verifies that the imported dataset is in fact an sf object, while attr(nyc_trct, "sf_column") identifies which column stores the geometry. The st_geometry(nyc_trct) call confirms that spatial geometry is present. The summary(nyc_trct[, 6:11]) command then gives a quick statistical overview of selected fields.
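
The inspection calls described above can be sketched on a small stand-in object. This example assumes the sf package is installed and builds a two-row toy layer called toy_sf (a hypothetical name) with point geometry. The real nyc_trct layer has polygon geometry and many more fields, so the printed output will differ.

```r
library(sf)

# Toy stand-in for nyc_trct: two features with a few attribute values
toy_sf <- st_sf(
  data.frame(
    GEOID      = c(36081045000, 36081045400),
    DP05_0033E = c(2004, 4793)
  ),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
)

head(toy_sf)                     # first records
tail(toy_sf)                     # last records
str(toy_sf)                      # column classes, including geometry
class(toy_sf)                    # should include "sf"
attr(toy_sf, "sf_column")        # name of the geometry column
st_geometry(toy_sf)              # the geometry itself
summary(toy_sf[, "DP05_0033E"])  # summary of a selected field
```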

After that, "GEOID" %in% names(nyc_trct) checks whether the tract identifier exists, and the vector test against the race fields checks whether the major race and ethnicity count fields are present.

# Check that the tract identifier exists
"GEOID" %in% names(nyc_trct)
## [1] TRUE

The sapply(..., class) step then checks the current classes of the key variables after dropping geometry, which helps determine whether type conversion will be needed later. None of these commands alter the dataset. Instead, they function as structural diagnostics that confirm the analysis can proceed safely.

# Check current classes of key fields
sapply(
  st_drop_geometry(nyc_trct)[, c(
    "GEOID",
    "DP05_0033E",
    "DP05_0037E",
    "DP05_0038E",
    "DP05_0044E",
    "DP05_0039E",
    "DP05_0052E",
    "DP05_0057E",
    "DP05_0058E"
  )],
  class
)
##      GEOID DP05_0033E DP05_0037E DP05_0038E DP05_0044E DP05_0039E DP05_0052E 
##  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"  "numeric"  "numeric" 
## DP05_0057E DP05_0058E 
##  "numeric"  "numeric"

Drop Geometry

This section creates a non-spatial version of the dataset for later tabular analysis and export. Although the tract boundaries are necessary to define the geographic unit of analysis, the actual LQ calculations rely on population counts and proportions rather than on geometry itself. Keeping the spatial object during the early part of the workflow is useful because it preserves the complete dataset, but later calculations and exports are easier to manage in a standard data frame without the geometry column. This step therefore separates the geographic structure of the data from the tabular operations needed to build the final output.

The code process first uses class(nyc_trct) to confirm that the original dataset remains a spatial object. It then uses st_drop_geometry(nyc_trct) to create nyc_tab, which is a non-spatial copy of the same attribute data. This function removes only the geometry column and leaves the Census fields intact. The class(nyc_tab) call then confirms that the result is now a standard table rather than an sf object, and head(nyc_tab) previews the non-spatial table. This is not yet the final export. Instead, it creates a cleaner working version of the dataset for tabular operations and prepares for the later CSV output.

# Keep geometry during analysis
class(nyc_trct)
## [1] "sf"         "data.frame"
# Create a non-spatial table copy only when needed
nyc_tab <- st_drop_geometry(nyc_trct)

class(nyc_tab)
## [1] "data.frame"
head(nyc_tab)
##   OBJECTID_1 fid_1 STATEFP COUNTYFP TRACTCE       GEOID   NAME
## 1          1     1      36      081  045000 36081045000    450
## 2          2     2      36      081  045400 36081045400    454
## 3          3     3      36      081  045500 36081045500    455
## 4          4     4      36      081  045600 36081045600    456
## 5          5     5      36      081  044602 36081044602 446.02
## 6          6     6      36      081  044800 36081044800    448
##              NAMELSAD MTFCC FUNCSTAT  ALAND AWATER    INTPTLAT     INTPTLON
## 1    Census Tract 450 G5020        S 175572      0 +40.7142208 -073.8047869
## 2    Census Tract 454 G5020        S 230996      0 +40.7126504 -073.7960120
## 3    Census Tract 455 G5020        S 148238      0 +40.7363109 -073.8623266
## 4    Census Tract 456 G5020        S 162232      0 +40.7166298 -073.7939898
## 5 Census Tract 446.02 G5020        S 170076      0 +40.7081869 -073.7998476
## 6    Census Tract 448 G5020        S 208002      0 +40.7110219 -073.8026344
##   OBJECTID               GEO_ID                                       NAME_1
## 1     1797 1400000US36081045000    Census Tract 450; Queens County; New York
## 2     1799 1400000US36081045400    Census Tract 454; Queens County; New York
## 3     1800 1400000US36081045500    Census Tract 455; Queens County; New York
## 4     1801 1400000US36081045600    Census Tract 456; Queens County; New York
## 5     1795 1400000US36081044602 Census Tract 446.02; Queens County; New York
## 6     1796 1400000US36081044800    Census Tract 448; Queens County; New York
##   STATE_FIPS COUNTY_FIPS TRACT_FIPS     FIPS_ID DP05_0033E DP05_0037E
## 1         36          81      45000 36081045000       2004        316
## 2         36          81      45400 36081045400       4793        112
## 3         36          81      45500 36081045500      13869       3226
## 4         36          81      45600 36081045600       1244        148
## 5         36          81      44602 36081044602       5210        684
## 6         36          81      44800 36081044800       2889        255
##   DP05_0038E DP05_0039E DP05_0044E DP05_0052E DP05_0057E DP05_0058E DP05_0072E
## 1         96          0       1363          0         27        202       2004
## 2        487          0       3463          0        209        522       4793
## 3       5273          0       1999          0       1701       1670      13869
## 4          7          0        892          0         80        117       1244
## 5       1016         11       2169         26        985        319       5210
## 6        210         47       1491          0        711        175       2889
##   DP05_0073E DP05_0074E DP05_0075E DP05_0076E DP05_0077E DP05_0078E DP05_0079E
## 1         52          0          0          0         52       1952        310
## 2        252        105         45          2        100       4541         74
## 3       4811        901        393         31       3486       9058       1758
## 4        156          0         18          0        138       1088         41
## 5       1473        102        207          0       1164       3737        380
## 6        785         25         30          0        730       2104        228
##   DP05_0080E DP05_0081E DP05_0082E DP05_0083E DP05_0084E DP05_0085E
## 1         96          0       1363          0         20        163
## 2        482          0       3463          0         28        494
## 3       4971          0       1999          0         99        231
## 4          7          0        892          0         73         75
## 5        980         11       2169         26         81         90
## 6        210          0       1465          0        103         98
##   Shape_Length Shape_Area ecmb DP05_3958E DP05_0037P DP05_0038P DP05_0044P
## 1     1723.874   175571.9  229        229 0.15768463 0.04790419  0.6801397
## 2     2603.629   230994.9  731        731 0.02336741 0.10160651  0.7225120
## 3     1597.919   148237.2 3371       3371 0.23260509 0.38020045  0.1441344
## 4     2350.727   162232.0  197        197 0.11897106 0.00562701  0.7170418
## 5     1689.673   170076.2 1341       1341 0.13128599 0.19500960  0.4163148
## 6     2140.134   208000.4  933        933 0.08826584 0.07268951  0.5160955
##   DP05_3958P DP05_3837R DP05_0077P DP05_0075P DP05_0076P DP05_0074P DP05_3937R
## 1  0.1142715  0.3037975  100.00000   0.000000  0.0000000   0.000000  4.3132911
## 2  0.1525141  4.3482143   39.68254  17.857143  0.7936508  41.666667 30.9196429
## 3  0.2430601  1.6345319   72.45895   8.168780  0.6443567  18.727915  0.6196528
## 4  0.1583601  0.0472973   88.46154  11.538462  0.0000000   0.000000  6.0270270
## 5  0.2573896  1.4853801   79.02240  14.052953  0.0000000   6.924644  3.1710526
## 6  0.3229491  0.8235294   92.99363   3.821656  0.0000000   3.184713  5.8470588
##   DP05_395837R DP05_7475R DP05_7675R DP05_7775R  ALAND_SQMI DP05_0037D
## 1    0.7246835  0.0000000 0.00000000        Inf 0.006297779   50176.42
## 2    6.5267857  2.3333333 0.04444444   2.222222 0.008285841   13517.03
## 3    1.0449473  2.2926209 0.07888041   8.870229 0.005317307  606698.14
## 4    1.3310811  0.0000000 0.00000000   7.666667 0.005819272   25432.73
## 5    1.9605263  0.4927536 0.00000000   5.623188 0.006100637  112119.44
## 6    3.6588235  0.8333333 0.00000000  24.333333 0.007461045   34177.52
##   DP05_0038D DP05_0044D DP05_3958D DP05_0074D DP05_0075D DP05_0076D DP05_0077D
## 1   15243.47   216425.5   36362.03      0.000      0.000     0.0000   8256.879
## 2   58774.96   417941.9   88222.78  12672.219   5430.951   241.3756  12068.780
## 3  991667.48   375942.2  633967.58 169446.690  73909.599  5830.0193 655595.073
## 4    1202.90   153283.8   33853.03      0.000   3093.170     0.0000  23714.305
## 5  166539.98   355536.6  219813.11  16719.565  33930.883     0.0000 190799.746
## 6   28146.19   199838.0  125049.51   3350.737   4020.884     0.0000  97841.521
##   DP05_0037LQ DP05_0038LQ DP05_0044LQ DP05_3958LQ MISS_ZIP hot_qntl hot_eqint
## 1  0.42085394  0.20733476   4.6991270   0.4579336        0        0         0
## 2  0.06236668  0.43976447   4.9918796   0.6111878        0        0         0
## 3  0.62081364  1.64555055   0.9958334   0.9740435        0        1         1
## 4  0.31752898  0.02435433   4.9540857   0.6346154        0        0         0
## 5  0.35039703  0.84402360   2.8763443   1.0314681        0        1         0
## 6  0.23557797  0.31460843   3.5657357   1.2941924        0        0         0
##   hot_jenks hot_pretty
## 1         0          0
## 2         0          0
## 3         1          0
## 4         0          0
## 5         0          0
## 6         0          0

Identify the Four Race/Ethnicity Group Fields

This section defines the specific population count fields that will be used to construct the Location Quotients. The assignment requires four race and ethnicity groups, so the workflow must make those variables explicit before calculating regional totals or tract-level proportions. This matters because the Location Quotient compares the same group at two scales, tract and region. If the fields are inconsistent or incorrectly selected, the resulting statistic will not match the intended analytical objective. In this New York City tract-level analysis, the four selected groups are White, Black, Asian, and an aggregated Other category.

The code process begins by creating the vector group_fields, which stores the names of the selected Census variables. The first three fields correspond to White, Black, and Asian population counts. The fourth field, DP05_3958E, is intended to represent an aggregated Other category. The line group_fields %in% names(nyc_trct) then checks whether each of those fields actually exists in the dataset. This is important because if one of them is missing, later calculations for totals, proportions, and LQs will fail.

group_fields <- c(
  "DP05_0037E",  # White
  "DP05_0038E",  # Black
  "DP05_0044E",  # Asian
  "DP05_3958E"   # Other / aggregated
)

# Check whether fields exist
group_fields %in% names(nyc_trct)
## [1] TRUE TRUE TRUE TRUE

The commented-out block below shows how the script could create DP05_3958E if it were not already present, by summing the AIAN, NHPI, Some Other Race, and Two or More Races counts. That optional step uses arithmetic addition across those component fields to construct a single aggregated category. Although the block is currently commented out, it shows that the workflow anticipates a case where a needed analytical field must be derived from existing Census columns rather than simply read from the source dataset.

# If the aggregated "Other" field is not already present, build it
#if (!"DP05_3958E" %in% names(nyc_trct)) {
  #nyc_trct$DP05_3958E <- nyc_trct$DP05_0039E +
    #nyc_trct$DP05_0052E +
    #nyc_trct$DP05_0057E +
    #nyc_trct$DP05_0058E
#}

# Recheck after creating it if needed
#group_fields %in% names(nyc_trct)

#summary(nyc_trct[, group_fields])

Total Population Field

This section confirms that the dataset contains a valid total population field, which is necessary because all proportion calculations depend on it. In the Location Quotient formula, the tract-level proportion for each group is calculated by dividing that group’s count by total tract population, and the regional proportion is calculated by dividing the group’s citywide total by the citywide population. This means that the total population field is one of the most important denominators in the workflow. If it is missing, unexpectedly zero, or otherwise invalid, then both the numerator and the denominator of the LQ would be compromised.

The code process begins by assigning the field name DP05_0033E to total_field, which makes the code easier to reuse and interpret. The expression total_field %in% names(nyc_trct) checks whether that variable exists in the dataset. The command summary(nyc_trct[[total_field]]) then prints summary statistics for the total population field, allowing you to inspect the range and central tendency of tract population values. Finally, sum(nyc_trct[[total_field]] == 0, na.rm = TRUE) counts how many tracts have total population equal to zero. This is an important diagnostic because zero-population tracts can create division problems later when tract proportions are calculated.

total_field <- "DP05_0033E"

total_field %in% names(nyc_trct)
## [1] TRUE
summary(nyc_trct[[total_field]])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    2306    3448    3712    4876   15945
# Optional diagnostic
sum(nyc_trct[[total_field]] == 0, na.rm = TRUE)
## [1] 81
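
The diagnostic above reports 81 tracts with zero total population. The sketch below shows, on a toy table, why those tracts matter and one possible way to guard against them. Setting the proportion to NA for zero-population tracts is an assumption made here for illustration and is not necessarily how the lab script handles them. The names toy, raw_prop, and safe_prop are hypothetical.

```r
# Toy table: the second tract has zero total population
toy <- data.frame(
  total = c(2004, 0, 4793),
  white = c(316, 0, 112)
)

# Plain division: 0 / 0 yields NaN for the empty tract
raw_prop <- toy$white / toy$total

# Guarded division: compute the proportion only where total > 0
safe_prop <- ifelse(toy$total > 0, toy$white / toy$total, NA_real_)

raw_prop
safe_prop
```

Leaving the empty tracts as NA keeps them in the table, so the row count still matches the tract layer, while making it explicit that no proportion exists for them.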

Data Check for Problems Before Calculating LQ

This section performs a full validation of the variables needed for the LQ analysis before the actual statistic is calculated. In the Basic Statistics assignment, this type of stage was important because calculations should not be trusted unless the fields being used have first been checked for completeness, correct type, and impossible values. That same logic is even more important here because the Location Quotient is built from multiple stages of division. If the underlying fields contain missing values, character data, negative counts, or duplicate tract identifiers, then the LQ output could become misleading or fail entirely. This is therefore a quality-control section that protects the analytical integrity of the workflow.

The code process starts by creating needed_vars, which is the full list of variables required for the analysis, including the tract identifier, tract name, total population, the main race fields, and the aggregated Other field. The function setdiff(needed_vars, names(nyc_trct)) compares the required variables to the dataset’s actual column names and returns anything missing. The if (length(missing) > 0) { stop(...) } block then halts the script if required columns are absent. This is an intentional hard stop that prevents the workflow from continuing with incomplete input data. Next, the script creates na_counts and uses a for loop with sum(is.na(...)) to count missing values in each required variable. This gives a field-by-field diagnostic of missingness. After that, numeric_vars isolates the fields that must be numeric for the calculations to work correctly. A second setdiff() checks whether those expected numeric fields actually exist. The following for loop then converts each numeric variable using as.numeric(), while suppressWarnings() prevents unnecessary clutter from conversion messages. The script records before_na and after_na for each field so that if conversion produces additional missing values, it prints a note. This is a subtle but important diagnostic because it can reveal fields that contained non-numeric characters. Finally, sapply(..., min, na.rm = TRUE) checks the minimum value in each numeric variable after dropping geometry, which gives a quick screen for impossible negative values.

needed_vars <- c(
  "GEOID",
  "NAME",
  "DP05_0033E",
  "DP05_0037E",
  "DP05_0038E",
  "DP05_0044E",
  "DP05_0039E",
  "DP05_0052E",
  "DP05_0057E",
  "DP05_0058E",
  "DP05_3958E"
)

missing <- setdiff(needed_vars, names(nyc_trct))

if (length(missing) > 0) {
  stop("Missing required columns: ", paste(missing, collapse = ", "))
}

na_counts <- c()
for (v in needed_vars) {
  na_counts[v] <- sum(is.na(nyc_trct[[v]]))
}
na_counts
##      GEOID       NAME DP05_0033E DP05_0037E DP05_0038E DP05_0044E DP05_0039E 
##          0          0          0          0          0          0          0 
## DP05_0052E DP05_0057E DP05_0058E DP05_3958E 
##          0          0          0          0
numeric_vars <- c(
  "DP05_0033E",
  "DP05_0037E",
  "DP05_0038E",
  "DP05_0044E",
  "DP05_0039E",
  "DP05_0052E",
  "DP05_0057E",
  "DP05_0058E",
  "DP05_3958E"
)

setdiff(numeric_vars, names(nyc_trct))
## character(0)
for (v in numeric_vars) {
  before_na <- sum(is.na(nyc_trct[[v]]))
  nyc_trct[[v]] <- suppressWarnings(as.numeric(nyc_trct[[v]]))
  after_na <- sum(is.na(nyc_trct[[v]]))
  
  if (after_na > before_na) {
    cat("NOTE: Converting", v, "to numeric increased NAs from", before_na, "to", after_na, "\n")
  }
}

# Quick minimum check for impossible negative values
sapply(
  st_drop_geometry(nyc_trct)[, numeric_vars],
  min,
  na.rm = TRUE
)
## DP05_0033E DP05_0037E DP05_0038E DP05_0044E DP05_0039E DP05_0052E DP05_0057E 
##          0          0          0          0          0          0          0 
## DP05_0058E DP05_3958E 
##          0          0

This chunk deepens the validity check by explicitly identifying negative values field by field. The which() function is used with logical conditions to locate rows where population values are below zero but not missing. Each group is checked separately, and if any negative values are found, the script uses stop() to halt the workflow with a specific message naming the problem field. This is more informative than a general error because it directly identifies the part of the data that needs correction. At the end of the chunk, duplicated(nyc_trct$GEOID) checks whether any tract identifiers occur more than once. Those duplicated GEOIDs are stored in dups, and if any are found, the script prints a warning and previews the first few duplicates. This does not stop the workflow, but it alerts you that the one-tract-one-record assumption may have been violated.

bad_pop <- which(!is.na(nyc_trct$DP05_0033E) & nyc_trct$DP05_0033E < 0)
if (length(bad_pop) > 0) stop("Negative total population found.")

bad_white <- which(!is.na(nyc_trct$DP05_0037E) & nyc_trct$DP05_0037E < 0)
if (length(bad_white) > 0) stop("Negative White population found.")

bad_black <- which(!is.na(nyc_trct$DP05_0038E) & nyc_trct$DP05_0038E < 0)
if (length(bad_black) > 0) stop("Negative Black population found.")

bad_asian <- which(!is.na(nyc_trct$DP05_0044E) & nyc_trct$DP05_0044E < 0)
if (length(bad_asian) > 0) stop("Negative Asian population found.")

bad_aian <- which(!is.na(nyc_trct$DP05_0039E) & nyc_trct$DP05_0039E < 0)
if (length(bad_aian) > 0) stop("Negative AIAN population found.")

bad_nhpi <- which(!is.na(nyc_trct$DP05_0052E) & nyc_trct$DP05_0052E < 0)
if (length(bad_nhpi) > 0) stop("Negative NHPI population found.")

bad_other <- which(!is.na(nyc_trct$DP05_0057E) & nyc_trct$DP05_0057E < 0)
if (length(bad_other) > 0) stop("Negative Some Other Race population found.")

bad_two_more <- which(!is.na(nyc_trct$DP05_0058E) & nyc_trct$DP05_0058E < 0)
if (length(bad_two_more) > 0) stop("Negative Two or More Races population found.")

bad_other_agg <- which(!is.na(nyc_trct$DP05_3958E) & nyc_trct$DP05_3958E < 0)
if (length(bad_other_agg) > 0) stop("Negative aggregated Other population found.")

dups <- nyc_trct$GEOID[duplicated(nyc_trct$GEOID)]
if (length(dups) > 0) {
  cat("WARNING: Duplicate GEOID values found (first few):\n")
  print(head(dups, 10))
}
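The field-by-field checks above all repeat one pattern, so the same logic could be expressed as a loop over a named vector. The following is a minimal sketch on a toy data frame, not the code used in this lab: `toy`, `check_fields`, and `check_no_negatives()` are hypothetical names introduced for illustration, and only two of the nine fields are shown.

```r
# Toy stand-in for the attribute table; values are illustrative only
toy <- data.frame(
  GEOID      = c("A", "B", "C"),
  DP05_0033E = c(100, 0, NA),
  DP05_0037E = c(40, 0, NA)
)

# Field name -> label used in the error message (hypothetical lookup vector)
check_fields <- c(
  DP05_0033E = "total",
  DP05_0037E = "White"
)

# Stop with a field-specific message if any non-missing value is negative
check_no_negatives <- function(df, fields) {
  for (field in names(fields)) {
    values <- df[[field]]
    bad <- which(!is.na(values) & values < 0)
    if (length(bad) > 0) {
      stop("Negative ", fields[[field]], " population found in ", field,
           " (rows: ", paste(head(bad, 10), collapse = ", "), ")")
    }
  }
  invisible(TRUE)
}

check_no_negatives(toy, check_fields)

# Duplicate identifier check, same logic as the chunk above
dups <- toy$GEOID[duplicated(toy$GEOID)]
n_dups <- length(dups)
```

Adding a field then only requires one new entry in the lookup vector, and the error message also reports which rows triggered it.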

Regional Totals Calculations

This section calculates the citywide totals needed to establish the regional comparison baseline for the Location Quotient. The LQ is a relative concentration measure, so it does not evaluate a tract in isolation. Instead, it compares a tract’s internal composition to the overall composition of the study area, which here is New York City. That means the workflow first needs to know the total population of the city and the citywide total count of each analyzed group. These values create the regional context against which each tract will later be compared.

The code process uses sum(..., na.rm = TRUE) to aggregate total population and each of the four group counts across all tracts. The object region_total_pop stores the total city population, while the other objects store the citywide totals for White, Black, Asian, and Other populations. The na.rm = TRUE argument ensures that missing values are ignored rather than causing the total to become missing itself. After these totals are computed, printing each one allows you to inspect the raw regional counts. This is a useful diagnostic because it lets you verify that the numbers appear plausible before moving on to proportion calculations.

region_total_pop   <- sum(nyc_trct$DP05_0033E, na.rm = TRUE)
region_total_white <- sum(nyc_trct$DP05_0037E, na.rm = TRUE)
region_total_black <- sum(nyc_trct$DP05_0038E, na.rm = TRUE)
region_total_asian <- sum(nyc_trct$DP05_0044E, na.rm = TRUE)
region_total_other <- sum(nyc_trct$DP05_3958E, na.rm = TRUE)

region_total_pop
## [1] 8622467
region_total_white
## [1] 3230647
region_total_black
## [1] 1992200
region_total_asian
## [1] 1247994
region_total_other
## [1] 2151626
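One further plausibility check is possible here, assuming the four analyzed groups are meant to be mutually exclusive and to cover the whole population (which the aggregated Other field suggests): the four group totals should add up to the total population. The sketch below uses the values copied from the printed output above rather than the nyc_trct object.

```r
# Values copied from the printed regional totals above
region_total_pop   <- 8622467
region_total_white <- 3230647
region_total_black <- 1992200
region_total_asian <- 1247994
region_total_other <- 2151626

# Sum of the four group totals
group_sum <- region_total_white + region_total_black +
  region_total_asian + region_total_other

# Difference from the total population; 0 means the groups are exhaustive
total_gap <- region_total_pop - group_sum
total_gap
## [1] 0
```

A gap of zero indicates that no residents are left out of, or double counted in, the four-group scheme at the citywide level.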

Regional Proportions Calculation

This section converts the regional totals into proportions, which form the denominator of the Location Quotient formula. A regional proportion expresses the share of the total city population that belongs to a given group. This matters because the Location Quotient is not based on raw counts. It compares a tract’s proportion for a group to the city’s overall proportion for that same group. In other words, the regional proportions establish what the average distribution looks like across New York City, and each tract is evaluated relative to that baseline.

The code process begins by defining safe_div, a custom function designed to perform division while protecting against invalid results. The function first creates an output vector filled with NA_real_, which ensures the output is numeric. It then checks whether the denominator is a single number or a vector. In either case, it uses logical conditions to identify valid positions where the numerator and denominator are both not missing and where the denominator is greater than zero. Only in those valid positions does it perform the division. This means the function prevents division by zero and avoids propagating missing values inappropriately. After the function is defined, the script calculates region_prop_white, region_prop_black, region_prop_asian, and region_prop_other by dividing each regional group total by the regional total population. Printing these values allows you to confirm that the resulting proportions are sensible and that the denominator side of the LQ is ready for use.

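safe_div <- function(num, den) {
  # Start from an all-NA numeric vector so invalid positions stay missing
  out <- rep(NA_real_, length(num))

  # Valid positions: both values present and denominator strictly positive.
  # When den is a single value, the comparison recycles across num.
  ok <- !is.na(num) & !is.na(den) & den > 0

  if (length(den) == 1) {
    # Single denominator (e.g., a regional proportion): apply it to every valid numerator
    out[ok] <- num[ok] / den
  } else {
    # Vector denominator (e.g., tract totals): divide element by element
    out[ok] <- num[ok] / den[ok]
  }

  out
}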

region_prop_white <- safe_div(region_total_white, region_total_pop)
region_prop_black <- safe_div(region_total_black, region_total_pop)
region_prop_asian <- safe_div(region_total_asian, region_total_pop)
region_prop_other <- safe_div(region_total_other, region_total_pop)

region_prop_white
## [1] 0.3746778
region_prop_black
## [1] 0.2310476
region_prop_asian
## [1] 0.1447375
region_prop_other
## [1] 0.2495372

ChatGPT Assistance

Prompt = my .cvs has come back with nulls for all LQ, where is the issue?

The issue in your original workflow came from how the safe_div() function was designed versus how it was being used in the Location Quotient calculation. In your first script, safe_div() was written to handle element-wise division between two vectors of equal length, where each numerator value is paired with a corresponding denominator value, which works correctly for tract-level proportion and density calculations (numerator and denominator both vary by tract). However, in the LQ step, you introduced a different structure: the numerator was a full vector of tract-level proportions, while the denominator was a single regional proportion value. Because the function still tried to apply vector-style indexing (den[ok]), it only correctly evaluated the first matching case and left the rest as NA, which propagated through the entire LQ output. The revised version works because it explicitly handles this mismatch in scale by recognizing when the denominator is a single value and applying it consistently across all valid numerator entries. This aligns the computation with the actual LQ formula, where a constant regional proportion is used to standardize tract-level values. In short, the original function failed due to a mismatch between vector-based logic and scalar input, and the fix works because it correctly separates those two cases and applies the appropriate division method.
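The failure described above can be reproduced on a small vector. The original version of safe_div() is not shown in this report, so `safe_div_vector_only()` below is a hypothetical reconstruction of it based on the description, and the input values are illustrative. In R, indexing a length-one vector with a longer logical vector returns NA for every selected position beyond the first, which is what empties the LQ column.

```r
# Hypothetical reconstruction of the vector-only version (not shown in the report)
safe_div_vector_only <- function(num, den) {
  out <- rep(NA_real_, length(num))
  ok <- !is.na(num) & !is.na(den) & den > 0
  # When den has length 1, den[ok] is NA beyond position 1
  out[ok] <- num[ok] / den[ok]
  out
}

# Scalar-aware version, matching the function defined earlier in this report
safe_div <- function(num, den) {
  out <- rep(NA_real_, length(num))
  ok <- !is.na(num) & !is.na(den) & den > 0
  if (length(den) == 1) {
    out[ok] <- num[ok] / den
  } else {
    out[ok] <- num[ok] / den[ok]
  }
  out
}

tract_prop  <- c(0.2, 0.4, NA, 0.1)  # illustrative tract-level proportions
region_prop <- 0.25                  # illustrative single regional proportion

broken <- safe_div_vector_only(tract_prop, region_prop)
fixed  <- safe_div(tract_prop, region_prop)

broken
## [1] 0.8  NA  NA  NA
fixed
## [1] 0.8 1.6  NA 0.4
```

An alternative to branching on length(den) would be to expand the denominator first with rep_len(den, length(num)), after which the element-wise form works for both cases.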

Tract-level Proportions Calculations

This section calculates the tract-level proportions for each group, which form the numerator of the Location Quotient. While the regional proportions describe the overall demographic structure of New York City, the tract-level proportions describe the internal composition of each individual tract. This is what allows the workflow to compare local demographic structure to the broader citywide baseline. A proportion is necessary here because raw counts alone would not be comparable across tracts of very different sizes. Standardizing group counts by tract population makes the local values analytically comparable.

The code process uses the safe_div function again, but this time it divides each tract’s group count by that same tract’s total population. The results are written back into new fields in nyc_trct, named TRACT_PROP_WHITE, TRACT_PROP_BLACK, TRACT_PROP_ASIAN, and TRACT_PROP_OTHER. These are now stored directly in the dataset as derived variables. The subsequent summary() call then prints summary statistics for these four new proportion fields. That output helps verify that the values are within reasonable bounds, usually between zero and one, and that the new fields were successfully created.

nyc_trct$TRACT_PROP_WHITE <- safe_div(nyc_trct$DP05_0037E, nyc_trct$DP05_0033E)
nyc_trct$TRACT_PROP_BLACK <- safe_div(nyc_trct$DP05_0038E, nyc_trct$DP05_0033E)
nyc_trct$TRACT_PROP_ASIAN <- safe_div(nyc_trct$DP05_0044E, nyc_trct$DP05_0033E)
nyc_trct$TRACT_PROP_OTHER <- safe_div(nyc_trct$DP05_3958E, nyc_trct$DP05_0033E)

summary(nyc_trct[, c(
  "TRACT_PROP_WHITE",
  "TRACT_PROP_BLACK",
  "TRACT_PROP_ASIAN",
  "TRACT_PROP_OTHER"
)])
##  TRACT_PROP_WHITE TRACT_PROP_BLACK  TRACT_PROP_ASIAN  TRACT_PROP_OTHER
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.1222   1st Qu.:0.02158   1st Qu.:0.02674   1st Qu.:0.1056  
##  Median :0.3180   Median :0.10013   Median :0.07944   Median :0.1866  
##  Mean   :0.3718   Mean   :0.24031   Mean   :0.15036   Mean   :0.2376  
##  3rd Qu.:0.6096   3rd Qu.:0.41090   3rd Qu.:0.21546   3rd Qu.:0.3378  
##  Max.   :1.0000   Max.   :1.00000   Max.   :0.92321   Max.   :0.9104  
##  NA's   :81       NA's   :81        NA's   :81        NA's   :81      
##             geom     
##  MULTIPOLYGON :2323  
##  epsg:NA      :   0  
##  +proj=aea ...:   0  
##                      
##                      
##                      
## 
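Beyond reading the summary table, the bounds can be tested directly. The sketch below uses a toy table with hypothetical column names and assumes the four groups are mutually exclusive and exhaustive, so that the four proportions in each populated tract should sum to 1. It illustrates the check and is not output from the lab data.

```r
# Toy tract table; the last row mimics a tract with zero population
toy <- data.frame(
  total = c(200, 50, 0),
  white = c(80, 10, 0),
  black = c(60, 5, 0),
  asian = c(40, 25, 0),
  other = c(20, 10, 0)
)
groups <- c("white", "black", "asian", "other")

# Use NA as the denominator where the tract has no population
den <- ifelse(toy$total > 0, toy$total, NA_real_)
props <- sapply(toy[groups], function(x) x / den)

# Every defined proportion should lie between 0 and 1
in_range <- all(props >= 0 & props <= 1, na.rm = TRUE)

# Proportions should sum to 1 in each populated tract (NA for empty tracts)
row_sums <- rowSums(props)
sums_ok <- all(abs(row_sums - 1) < 1e-8, na.rm = TRUE)
```

Tracts with zero population stay NA throughout, which corresponds to the 81 NA values reported for each proportion field above.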

Location Quotient Calculation for Each Group

This section performs the central analytical calculation in the workflow by constructing the Location Quotients themselves. Up to this point, the code has built the two necessary components of the statistic: the tract-level proportions and the regional proportions. The Location Quotient is then produced by dividing the first by the second. Conceptually, this step asks whether a given group is more concentrated or less concentrated in a tract than would be expected based on that group’s overall share of the city population. This is why LQ is useful in geographic analysis. It does not merely show where a group is present, but where it is relatively overrepresented or underrepresented compared to the study area baseline.

The code process again uses safe_div, this time dividing each tract proportion field by the corresponding regional proportion. The results are written into four new fields: LQ_WHITE, LQ_BLACK, LQ_ASIAN, and LQ_OTHER. Because the regional proportions are each a single citywide value, the function applies that one denominator across the full vector of tract-level proportions. The result is a tract-level ratio for each group. A value above 1 indicates relative concentration above the city average, a value near 1 indicates that the tract is close to the citywide pattern, and a value below 1 indicates relative under-representation.

nyc_trct$LQ_WHITE <- safe_div(nyc_trct$TRACT_PROP_WHITE, region_prop_white)
nyc_trct$LQ_BLACK <- safe_div(nyc_trct$TRACT_PROP_BLACK, region_prop_black)
nyc_trct$LQ_ASIAN <- safe_div(nyc_trct$TRACT_PROP_ASIAN, region_prop_asian)
nyc_trct$LQ_OTHER <- safe_div(nyc_trct$TRACT_PROP_OTHER, region_prop_other)
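Written as a formula, the calculation above can be sketched as follows. The symbols are notation introduced here for illustration and do not appear in the code: x_{i,g} is the count of group g in tract i, x_i is the total population of tract i, X_g is the citywide count of group g, and X is the citywide total population.

```latex
\mathrm{LQ}_{i,g} = \frac{x_{i,g} / x_i}{X_g / X}
```

For example, an LQ_ASIAN value of about 4.70 combined with the regional Asian proportion of 0.1447375 corresponds to a tract in which roughly 68 percent of residents are Asian, since 4.70 × 0.1447 ≈ 0.68.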

Location Quotient Check

This section cleans the newly created LQ fields by identifying and replacing invalid outputs. Even with a safe division function, it is still necessary to inspect the final LQ variables for values such as Inf, NaN, or NA. These values can appear when denominator conditions are not analytically meaningful, such as a tract with zero total population or a case where a comparison value is not defined. This stage is important because the workflow is not only expected to produce a statistic but also to demonstrate that the output is interpretable and analytically sound.

The code process checks each LQ field with is.infinite() and is.nan(). Wherever those conditions are true, the script replaces the result with NA. This preserves the fact that the value is not analytically interpretable without leaving behind misleading infinite or undefined values. After cleaning, the script uses sum(is.na(...)) to count how many missing values remain in each LQ field and sum(is.infinite(...), na.rm = TRUE) to confirm that no infinite values remain. These checks provide direct evidence that the output has been cleaned and that invalid results have been managed consistently.

nyc_trct$LQ_WHITE[is.infinite(nyc_trct$LQ_WHITE) | is.nan(nyc_trct$LQ_WHITE)] <- NA
nyc_trct$LQ_BLACK[is.infinite(nyc_trct$LQ_BLACK) | is.nan(nyc_trct$LQ_BLACK)] <- NA
nyc_trct$LQ_ASIAN[is.infinite(nyc_trct$LQ_ASIAN) | is.nan(nyc_trct$LQ_ASIAN)] <- NA
nyc_trct$LQ_OTHER[is.infinite(nyc_trct$LQ_OTHER) | is.nan(nyc_trct$LQ_OTHER)] <- NA

sum(is.na(nyc_trct$LQ_WHITE))
## [1] 81
sum(is.na(nyc_trct$LQ_BLACK))
## [1] 81
sum(is.na(nyc_trct$LQ_ASIAN))
## [1] 81
sum(is.na(nyc_trct$LQ_OTHER))
## [1] 81
sum(is.infinite(nyc_trct$LQ_WHITE), na.rm = TRUE)
## [1] 0
sum(is.infinite(nyc_trct$LQ_BLACK), na.rm = TRUE)
## [1] 0
sum(is.infinite(nyc_trct$LQ_ASIAN), na.rm = TRUE)
## [1] 0
sum(is.infinite(nyc_trct$LQ_OTHER), na.rm = TRUE)
## [1] 0
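The same cleaning and counting can be written once and applied to every LQ field. This is a minimal sketch on toy data with hypothetical column names. It relies on is.finite(), which is FALSE for Inf, -Inf, NaN, and NA alike, so a single condition covers all invalid cases.

```r
# Toy LQ columns containing each kind of invalid value
toy <- data.frame(
  LQ_A = c(0.5, Inf, NaN, NA, 2.0),
  LQ_B = c(1.2, 0.0, -Inf, 3.1, NA)
)
lq_fields <- c("LQ_A", "LQ_B")

# Replace anything that is not a finite number with NA
for (f in lq_fields) {
  toy[[f]][!is.finite(toy[[f]])] <- NA
}

# Count remaining missing and infinite values per field
na_counts  <- sapply(toy[lq_fields], function(x) sum(is.na(x)))
inf_counts <- sapply(toy[lq_fields], function(x) sum(is.infinite(x)))

na_counts
## LQ_A LQ_B 
##    3    2
inf_counts
## LQ_A LQ_B 
##    0    0
```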

Output Fields

This section confirms that the newly created analytic variables exist and are named clearly. Clear variable naming matters because the final output should be readable and interpretable outside the immediate context of code execution. In the Basic Statistics assignment, naming also mattered because the report needed to show how derived variables connected to the analytic objective. The same logic applies here. The tract-level proportions and the final LQ values should be easy to distinguish so that the reader understands which variables are intermediate measures and which are the primary analytic output.

The code process uses names(nyc_trct) to retrieve all column names and then filters those names to display only the newly created proportion and LQ fields. This does not create the fields, since they were already created in earlier steps. Instead, it verifies that they exist in the dataset and confirms that their names are consistent and interpretable. This is a final naming check before validation and export.

names(nyc_trct)[names(nyc_trct) %in% c(
  "TRACT_PROP_WHITE",
  "TRACT_PROP_BLACK",
  "TRACT_PROP_ASIAN",
  "TRACT_PROP_OTHER",
  "LQ_WHITE",
  "LQ_BLACK",
  "LQ_ASIAN",
  "LQ_OTHER"
)]
## [1] "TRACT_PROP_WHITE" "TRACT_PROP_BLACK" "TRACT_PROP_ASIAN" "TRACT_PROP_OTHER"
## [5] "LQ_WHITE"         "LQ_BLACK"         "LQ_ASIAN"         "LQ_OTHER"

Validate

This section evaluates the LQ outputs to determine whether they behave as expected analytically. The assignment emphasizes that the work is not evaluated only on whether the code runs, but on whether the analysis is correctly constructed and whether the results are ready for analysis. That means the LQ fields need to be reviewed critically before export. In geographic analysis, unusual values may be meaningful, but they may also indicate problems such as denominator errors, very small tract populations, or unintended data issues. This stage therefore functions as a statistical and analytic reasonableness check.

The code process begins with summary() on each LQ field, which reports the minimum, quartiles, median, mean, and maximum. These summaries help you see whether the fields have a sensible central tendency and range. The mean(..., na.rm = TRUE) calls then calculate the average LQ value for each group, which can be used as a broad diagnostic. The sd(..., na.rm = TRUE) calls calculate the standard deviation, which indicates how much variation exists across tracts. Finally, hist() is used to plot the distribution of each LQ field. These histograms help reveal skewness, clustering, long tails, or unusual spikes in the distribution. Together, these functions move the workflow beyond simple code execution and into interpretation, allowing you to judge whether the outputs are reasonable before the final export.

summary(nyc_trct$LQ_WHITE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.3260  0.8487  0.9922  1.6271  2.6690      81
summary(nyc_trct$LQ_BLACK)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00000 0.09339 0.43338 1.04010 1.77844 4.32811      81
summary(nyc_trct$LQ_ASIAN)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.1847  0.5489  1.0389  1.4886  6.3785      81
summary(nyc_trct$LQ_OTHER)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.4230  0.7479  0.9520  1.3536  3.6485      81
mean(nyc_trct$LQ_WHITE, na.rm = TRUE)
## [1] 0.9921987
mean(nyc_trct$LQ_BLACK, na.rm = TRUE)
## [1] 1.0401
mean(nyc_trct$LQ_ASIAN, na.rm = TRUE)
## [1] 1.038878
mean(nyc_trct$LQ_OTHER, na.rm = TRUE)
## [1] 0.9520345
sd(nyc_trct$LQ_WHITE, na.rm = TRUE)
## [1] 0.7371925
sd(nyc_trct$LQ_BLACK, na.rm = TRUE)
## [1] 1.199037
sd(nyc_trct$LQ_ASIAN, na.rm = TRUE)
## [1] 1.201467
sd(nyc_trct$LQ_OTHER, na.rm = TRUE)
## [1] 0.6941084
hist(nyc_trct$LQ_WHITE, breaks = 50, main = "LQ White", xlab = "Location Quotient")

hist(nyc_trct$LQ_BLACK, breaks = 50, main = "LQ Black", xlab = "Location Quotient")

hist(nyc_trct$LQ_ASIAN, breaks = 50, main = "LQ Asian", xlab = "Location Quotient")

hist(nyc_trct$LQ_OTHER, breaks = 50, main = "LQ Other", xlab = "Location Quotient")
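The unweighted means reported above (about 0.99, 1.04, 1.04, and 0.95) are not expected to equal 1, because every tract counts equally regardless of its size. When the regional totals are sums over the same tracts, the population-weighted mean of the LQ is 1 by construction, which makes it a stricter diagnostic. The sketch below demonstrates this on toy data with hypothetical column names and is not output from the lab data.

```r
# Toy tracts: total population and count for one group
toy <- data.frame(
  total = c(1000, 200, 50, 0),
  group = c(100, 100, 40, 0)
)

# Regional proportion and tract-level proportions (NA where population is zero)
region_prop <- sum(toy$group) / sum(toy$total)
tract_prop  <- ifelse(toy$total > 0, toy$group / toy$total, NA_real_)
lq <- tract_prop / region_prop

# Unweighted mean treats every tract equally and can drift away from 1
unweighted_mean <- mean(lq, na.rm = TRUE)

# Weighting by tract population recovers exactly 1
keep <- !is.na(lq)
weighted_mean <- weighted.mean(lq[keep], w = toy$total[keep])
```

If the weighted mean for a group in the real data differed noticeably from 1, that would point to a mismatch between the tracts used for the regional totals and those used for the tract-level proportions.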

Export Fields

This section creates the final non-spatial output table that will be written to CSV. At this stage, the workflow has already completed the main analytic calculations, so the next task is to organize the results into a clean and usable dataset. The assignment requires a table that can be used for later mapping or downstream analysis, which means the export should contain the tract identifier and the analytic variables that were created through the workflow. This section therefore acts as the final data-shaping stage.

The code process uses the base pipe to send nyc_trct into st_drop_geometry(), removing the geometry so that the final object is a plain table. It then uses select() from dplyr to choose the specific fields to keep: GEOID, NAME (renamed to tract_ID), and the four LQ fields. The total population and tract-level proportion fields are not carried into this table. The selected data are stored in lq_table, which becomes the final export-ready object. The calls to head(lq_table) and tail(lq_table) then preview the first and last rows so you can verify that the right columns were retained and that the table is structured as intended.

lq_table <- nyc_trct |>
  st_drop_geometry() |>
  select(
    GEOID,
    tract_ID = NAME,
    LQ_WHITE,
    LQ_BLACK,
    LQ_ASIAN,
    LQ_OTHER
  )

head(lq_table)
##         GEOID tract_ID   LQ_WHITE   LQ_BLACK  LQ_ASIAN  LQ_OTHER
## 1 36081045000      450 0.42085394 0.20733476 4.6991270 0.4579336
## 2 36081045400      454 0.06236668 0.43976447 4.9918796 0.6111878
## 3 36081045500      455 0.62081364 1.64555055 0.9958334 0.9740435
## 4 36081045600      456 0.31752898 0.02435433 4.9540857 0.6346154
## 5 36081044602   446.02 0.35039703 0.84402360 2.8763443 1.0314681
## 6 36081044800      448 0.23557797 0.31460843 3.5657357 1.2941924
tail(lq_table)
##            GEOID tract_ID   LQ_WHITE   LQ_BLACK  LQ_ASIAN  LQ_OTHER
## 2318 36081040400      404 0.05951975 3.20523106 0.8237786 0.4725020
## 2319 36081041100      411 0.68491748 0.05620926 1.0534075 2.3159759
## 2320 36081041400      414 0.09030315 3.12662071 0.5577556 0.6533663
## 2321 36081041500      415 0.65231567 0.32269250 0.9449252 2.1811116
## 2322 36081042700      427 0.32361726 0.43891800 0.4950275 2.8279868
## 2323 36081043400      434 0.13176434 3.42830246 0.3297248 0.4440475

.CSV Export

This section writes the final table to an external CSV file, completing the workflow from data import to validated output. The CSV format is appropriate because it creates a clean non-spatial table that can be opened outside of R, shared easily, and used in later mapping or analysis steps. This is the final deliverable stage of the assignment, where the tract-level LQ dataset becomes a usable product rather than just an object stored in the R session.

The code process begins with file.path(getwd(), "student_green_assignment_five_data.csv"), which creates the full output path in the current working directory. The write.csv() function then writes lq_table to that location with row.names = FALSE, which prevents R from adding an extra row-number column to the file. Finally, the cat() function prints a confirmation message and the output path so you can verify that the export completed successfully. This closes the analytical workflow by producing the final tract-level dataset required for the assignment.

student_green_assignment_five_data.csv is attached in the zipfile.

csv_path <- file.path(getwd(), "student_green_assignment_five_data.csv")

write.csv(
  lq_table,
  csv_path,
  row.names = FALSE
)

cat("\nDONE. Wrote CSV to:\n", csv_path, "\n")
## 
## DONE. Wrote CSV to:
##  D:/2026_GES_666_Critical_Maps/Lab_4/student_green_assignment_five_data.csv
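A quick way to confirm the export is to read the file back in. The sketch below is a round-trip check on a two-row toy table written to a temporary file, using values taken from the head() preview above. The object names are hypothetical. Reading GEOID and tract_ID as character is a choice made here so that identifiers are not converted to numbers, which helps when the table is later joined back to tract geometry.

```r
# Two-row toy table using values from the head() preview above
toy_table <- data.frame(
  GEOID    = c("36081045000", "36081045400"),
  tract_ID = c("450", "454"),
  LQ_ASIAN = c(4.6991270, 4.9918796),
  stringsAsFactors = FALSE
)

# Write to a temporary file instead of the working directory
tmp_path <- tempfile(fileext = ".csv")
write.csv(toy_table, tmp_path, row.names = FALSE)

# Read back, keeping identifier columns as text rather than numbers
check <- read.csv(
  tmp_path,
  colClasses = c(GEOID = "character", tract_ID = "character")
)

# Confirm the identifiers and row count survived the round trip
ids_match  <- identical(check$GEOID, toy_table$GEOID)
rows_match <- nrow(check) == nrow(toy_table)
```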

Conclusion

This workflow moves from raw tract-level Census data to a structured and interpretable dataset of Location Quotients for New York City. By constructing the LQ through clearly defined components, specifically tract-level proportions as the numerator and regional proportions as the denominator, the workflow ensures that each value reflects a meaningful comparison between local conditions and the broader population structure. This distinction is critical, as the analysis is not simply describing where populations are located, but evaluating how their distribution differs across space relative to the city as a whole.

The results produced through this workflow are analytically useful because they reveal patterns of relative concentration that are not visible through raw counts or basic proportions alone. This is particularly important in the analysis of Black residents in New York City. While composition maps or proportion-based measures show the share of Black residents within each tract, they do not indicate whether that share is high or low relative to the citywide baseline. The Location Quotient adds this missing layer of interpretation by identifying where Black residents are relatively over-represented or under-represented compared to the overall population structure. This allows for a clearer understanding of spatial concentration, clustering, and uneven distribution, which are central to interpreting patterns shaped by historical processes such as segregation, housing policy, and neighborhood change.

Using the LQ for Black residents also extends the analysis beyond what was possible in the Basic Statistics assignment. Composition and consistency measures describe internal tract characteristics, but they do not account for how those characteristics compare across the entire city. The LQ introduces a relational perspective, allowing each tract to be evaluated within the broader geographic system. This creates opportunities for deeper analysis, such as identifying clusters of high concentration, comparing LQ values across boroughs, or examining how areas of over-representation align with other variables such as income, housing conditions, or investment patterns. It also allows for temporal comparison if applied across multiple years, making it possible to track changes in concentration over time.

The inclusion of validation steps ensures that the results are not only computed correctly but are also reasonable. Summary statistics and distribution checks confirm that the LQ values behave as expected and that no extreme or invalid results are driving the analysis. Overall, the workflow satisfies the requirements of the assignment by demonstrating a clear analytic objective, a structured sequence of data processing steps, and a final output that is both clean and usable. The use of AI-assisted coding supports the construction of the workflow, but the analytic responsibility remains in defining the problem, validating the data, and interpreting the results. In this context, the Location Quotient serves as a critical tool for moving from descriptive mapping to meaningful geographic analysis, allowing the spatial distribution of Black residents in New York City to be understood in relation to the broader urban system.