Geographic Analysis for Business Intelligence

Isai Guizar


The aim of this document is to integrate geographic information systems (GIS) and spatial analysis within the field of Business Intelligence (BI). We will learn how geographic data can enhance decision-making by revealing spatial patterns, regional disparities, and location-based opportunities that are not visible in traditional datasets. Using official data on financial inclusion, we move from understanding the basic structure of spatial data to visualizing results through interactive choropleth maps.



1 Intro

Business Intelligence is often defined as the set of tools and systems used to collect and analyze data to make more informed business decisions. Most BI systems focus on two dimensions: time and categories. For example, dashboards show KPIs, sales over time, customers by segment, or production by line of business. Yet every transaction, customer, or branch also has a location. Recognizing and analyzing that location introduces a third dimension: space.

Geographic analysis expands the scope of BI by answering not only what happened and when, but also where it happened and, consequently, why there. When we map data, patterns of concentration, inequality, or opportunity that are often hidden in tables suddenly become visible. For example:

  • a bank might visualize the distribution of ATMs, branches, and correspondents across Jalisco. Patterns of concentration around Guadalajara would emerge, signaling both market saturation and potential underserved zones. Without mapping these data, such insights would remain buried in rows of a spreadsheet.

  • a company analyzing its branch performance may discover that two offices with equal revenues serve very different markets: one in a dense urban corridor surrounded by competing banks, the other in a rural municipality with few alternatives. Again, such insights rapidly emerge when the data are visualized on a map.

Spatial analysis is not limited to visualization; it expands the set of decision-making tools. Within a BI framework, geographic data become another analytical layer that interacts with time, product, and customer dimensions. Adding this layer allows organizations to:

  • Identify regional markets and customer clusters.

  • Detect potential areas for expansion/reduction.

  • Optimize logistics, distribution, and routing.

  • Evaluate performance adjusted for local conditions (income, population, infrastructure).

  • Monitor risks tied to geography—natural hazards, social vulnerability, or crime incidence.

  • In public administration, the same logic supports territorial equity: visualizing where programs, services, or infrastructure are concentrated helps direct resources to the areas that need them most.

Incorporating spatial data requires attention to data ethics and privacy. Customer addresses, transaction locations, or mobility patterns can reveal sensitive information. Before mapping such data, organizations must aggregate, anonymize, or use publicly available datasets. In academic and public settings, open sources like INEGI or the World Bank provide valuable material without infringing privacy.

Analysts must also ensure data quality and consistency. Matching geographic identifiers, verifying coordinate accuracy, and using consistent projection systems.

Once the value of location was recognized, a natural question arises: how do we technically represent space in a BI environment? This is where Geographic Information Systems (GIS) come in. In the next section we will examine how GIS works, how coordinates and projections define locations, and how spatial data (points, lines, polygons) interact with the tabular information typical of BI. Combining these two worls, BI and GIS, gives rise to what is known as GeoBI, a modern approach that empowers organizations to think not only in classification and trends, but also in territory and context.


2 Geographic Information Systems (GIS)

A GIS is a framework that combines data, methods, and technology to capture, store, analyze, and visualize spatially referenced information. A GIS can be understood as framework designed to work with data that are spatially referenced, meaning they can be positioned on the Earth’s surface using coordinates or geographic identifiers (like electoral districts or postal codes) .

2.1 Core components

  1. Spatial data describe the geometry of geographic entities. They can take three fundamental forms:

    • Points: Represent discrete locations such as bank branches, ATMs, schools, or customer addresses. Each point is defined by a pair of coordinates (latitude and longitude).

    • Lines: Represent linear features such as roads, pipelines, or delivery routes. They model movement, connection, or distance between points.

    • Polygons (areas): Represent bounded regions like municipalities, states, or sales territories.
      They are used to analyze aggregated data (e.g., total sales per region).

    These geometries are stored in spatial data formats such as Shapefile (.shp), GeoJSON, or KML, which can be read by GIS software (QGIS) or spatial analysis libraries in Python and R.

  2. Attribute data. For each spatial feature, attribute data store descriptive information in tabular form. For instance, a polygon representing the municipality of Guadalajara might have attributes for population, GDP, number of financial institutions, and average income. These attributes are the metrics or indicators that we analyze, while the spatial features provide the dimension of geography.

  3. Software. A GIS environment includes tools for data entry (digitization, GPS capture, importing shapefiles), data management (databases such as PostGIS), spatial analysis (e.g., calculating distances, densities, or clusters), and visualization (map design and dashboards).

Although some Business Intelligence platforms such as Power BI or Tableau already embed GIS capabilities, allowing maps to be created directly from datasets that include coordinates or region names, for analytical depth, open-source libraries like GeoPandas and Folium in Python provide the flexibility to build customized spatial analyses, as we do in this document.

2.2 Representing spatial data on a computer

Geographic objects can be represented on a computer either with vector or raster data models. Both describe geographic phenomena, but they differ in how they conceptualize the Earth’s surface. Vector data represent geographic features as discrete objects, are constructed from points, which are connected to each other forming lines and polygons, with precisely defined boundaries. Raster data represent geographic features as a continuous surface divided into a grid of small cells (pixels), where each cell stores a value. Both models are complementary rather than competing: vectors are ideal for discrete features (such as locations of stores), while rasters are better for continuous variables (such as population density, light intensity, elevation or temperature).

The vector model treats geographic entities as individual features defined by coordinates.

Geometry Description Examples
Point Single coordinate (x, y) representing a precise location. Branch locations, ATMs, client addresses.
Line Ordered sequence of points forming a path. Delivery routes, road networks, supply chains.
Polygon Closed shape bounded by lines; represents an area. Sales territories, municipalities, market zones.

Each feature can store multiple attributes in an accompanying table. For example, each polygon (municipality) can contain attributes such as total population, income, or financial inclusion indicators. Wector data are the foundation for choropleth maps, point distribution maps, and network analyses (e.g., route optimization). They allow precise measurement, querying (“select all branches within 5 km of Guadalajara”), and relational operations (joining branches to their respective regions).

Vector data is most commonly stored in a “shapefile”. A shapefile is composed of 3 required files with the same prefix but different extensions:

spatial-data.shp: main file that stores records of each shape geometries

spatial-data.shx: index of how the geometries in the main file relate to one-another

spatial-data.dbf: attributes of each record

The raster model divides space into a regular grid of cells, each containing a numeric value that represents a property of that location. Each cell (pixel) has a uniform size, commonly between 10 and 1,000 meters on the ground, and collectively the grid represents a continuous surface. Raster data are commonly used for environmental, demographic, or imagery-based information, where phenomena vary gradually across space rather than in discrete units.

Example Description Use
Satellite imagery Each cell stores reflectance or color value. Visual context or land-use classification.
Population density surface Each cell stores estimated inhabitants per km². Market potential or customer base estimation.
Heat maps Cells represent intensity of events (transactions, visits). Identify commercial hotspots or risk zones.

Both models often coexist. The vector layer provides structure (regions, branch locations, administrative units). The raster layer provides background or continuous context (population density, socioeconomic heat, satellite basemap). Together, they enable multi-layered insights.

2.3 Georeferencing

The actual process of assigning real-world coordinates or geographic identifiers to data so they can be accurately located on a map is called georeferencing. For example: a table of bank branches can be georeferenced by attaching each address to a latitude–longitude pair. A dataset of municipalities can be linked to the official geographic boundaries published by INEGI. Without georeferencing, data remain abstract, they are simply numbers without location. With it, every observation acquires spatial meaning, allowing it to be compared, aggregated, or visualized geographically.

To locate objects on Earth, GIS uses coordinate reference systems (CRS). The most common is WGS84 (EPSG:4326), which expresses positions using degrees of latitude (north–south) and longitude (east–west). Other projections, such as UTM (Universal Transverse Mercator), express locations in meters, which are useful for local or regional analysis. When integrating multiple data sources, the analyst must ensure that all layers share the same CRS. Otherwise, maps will misalign or display distorted geometries.


3 Exploratory Spatial Analysis

Exploratory Spatial Analysis (ESA) is the process of examining spatial data to discover patterns, relationships, and anomalies before applying formal models or predictions. Just as Exploratory Data Analysis (EDA) helps analysts understand numerical datasets through visualization and summary statistics, ESA reveals the geographic structure of information (distribution, cluster, or dispersion of phenomena across territory).

ESA allows organizations to answer questions such as:

  • Where are sales, clients, or resources concentrated?

  • Which territories show performance above or below average?

  • Are neighboring regions similar or different in behavior?

  • What spatial factors might explain market opportunities or operational challenges?

ESA thus forms the bridge between data visualization and spatial reasoning, preparing analysts for more advanced spatial modeling.

Because vector data treat geographic entities as discrete objects, ESA with vectors focuses on relationships between those objects. The analysis might involve:

  • Points: distribution of facilities, clients, or transactions.

  • Polygons: comparison of indicators among administrative units (states, municipalities, territories).

  • Lines: network accessibility and flow analysis.

Typical exploratory operations include counting points within polygons (e.g., number of ATMs per municipality), calculating nearest-neighbor distances to assess clustering, measuring spatial autocorrelation between polygon attributes (e.g., similarity of inclusion levels among adjacent municipalities), and mapping ratios or densities as choropleth visualizations.

3.1 Types of maps

ESA begins with visual exploration. Four main types of vector-based maps are commonly used:

Map Type Description BI Example
Point Map Displays discrete events or objects as dots on the map. Useful for visualizing distribution and density. Locations of bank branches, customer visits, or retail outlets.
Choropleth Map Shades polygons based on a quantitative variable. Reveals concentration and regional differences. Municipal inclusion index, sales per territory, average income.
Proportional Symbol Map Uses symbol size to represent magnitude at each location. Transaction volume by branch.
Flow Map (Network Map) Uses lines whose thickness indicates volume or frequency of movement. Supply-chain routes or inter-branch money transfers.

3.2 Spatial Descriptive Measures

Simple descriptive measures can formalize observations:

  1. Mean Center and Standard Distance – Identify the geographic “center of gravity” of a dataset and how dispersed the features are around it. – Example: locating the average position of all ATMs in Jalisco.

  2. Spatial Density (Kernel or Count) – Measure how many features fall within a certain area. – Example: number of branches per 10 000 adults or per km².

  3. Spatial Autocorrelation (conceptual overview) – Indicates whether similar values occur near each other. – A high-inclusion municipality surrounded by others with high inclusion suggests positive autocorrelation (spatial clustering). In practice, indices like Moran’s I or Getis-Ord Gi* quantify this; conceptually, they capture spatial dependence.


4 Integrating Tabular and Geographic Data

Up to this point, we have learned why the spatial dimension is essential to Business Intelligence. We are now ready to build our own geographic dataset (merging traditional indicators with spatial information) so we can analyze and visualize patterns across territory. We focus on the practical workflow that transforms separate data tables into a unified GeoDataFrame. We will work in Google Colab using open-source Python libraries such as pandas, geopandas, and folium.

NoteApplication 1: Financial Infrastructure in Jalisco

This example uses indicators of financial access in the state of Jalisco as a case of study, but the same method applies to any business dataset with regional identifiers. Data come from the CNBV: Financial infrastructure.

Objectives:

Load and explore a geographic layer (municipal polygons).

  • Load and inspect tabular data.

  • Merge the two datasets through a common geographic key.

  • Compute new spatial indicators.

  • Create an interactive map that visualizes business or policy variables geographically.

4.1 Load data

import pandas as pd
import numpy  as np
import matplotlib.pyplot as plt
#!pip install geopandas
import geopandas as gpd

Load the Geographic Layer

Jalisco = gpd.read_file("14_Jalisco")  #  points to the shapefile "directory"
Jalisco.crs 
Jalisco = Jalisco.to_crs("EPSG:4326")   
# type(Jalisco)

EPSG:6362 is INEGI’s projected coordinate system is EPSG:6362, we reproject to the global WGS 84 system (EPSG:4326), which is compatible with most web-based visualization tools.

Jalisco.columns
print(f"Number of municipalities in Jalisco: {len(Jalisco)}")
Number of municipalities in Jalisco: 125

Plotting the layer:

fig, ax = plt.subplots(figsize=(7,7))

# Plot municipalities
Jalisco.plot(ax=ax, color="lightgrey", edgecolor="slategray")
ax.set_title("Municipalities in the State of Jalisco", fontsize=12)
ax.axis("off")
# plt.savefig("Jalisco_municipalities.png", bbox_inches="tight", dpi=150)
plt.show()

Zapopan = Jalisco[Jalisco["NOMGEO"] == "Zapopan"].copy()

ax = Jalisco.plot(figsize=(7,7), color="lightgrey", edgecolor="slategray")
Zapopan.plot(ax=ax, color="steelblue", edgecolor="white")
ax.set_title("Location of Zapopan, Jalisco", fontsize=14)
ax.axis("off")
# plt.savefig("highlight_Zapopan.png", bbox_inches="tight", dpi=150)
plt.show()

Zapopan.plot(color="steelblue", edgecolor="slategray", figsize=(7, 7))
plt.title('Zapopan')
ax.axis("off")
plt.show()

Load tabular data

The Excel file “Info_Inclusion_Financiera.xlsx” contains financial inclusion indicators by municipality. Rows are municipalities and columns are variables (population, ATMs, branches, corresponsales, etc.).

df_inclusion = pd.read_excel("Info_Inclusion_Financiera.xlsx")
df_inclusion.head()
Clave Municipio Clave \nEstado Región Estado Municipio Población total Población adulta Población adulta Mujeres Población adulta Hombres Tipo de población Índice de rezago social Corresponsales\nNúmero Sucursales\nNúmero Cajeros automáticos\nNúmero TPV\nNúmero Establecimientos con TPV\nNúmero Cuentas que efectúan transacciones por celular\nNúmero Transacciones en cajeros automáticos\nNúmero Transacciones en TPV\nNúmero
0 14001 14 Occidente y Bajío Jalisco Acatic 25136 18364 9405 8959 Semi-urbano Muy bajo 7 5 6 64 58 9541 25330 11739
1 14002 14 Occidente y Bajío Jalisco Acatlán de Juárez 28926 20700 10058 10642 Semi-urbano Muy bajo 17 8 14 109 100 16867 46300 15901
2 14003 14 Occidente y Bajío Jalisco Ahualulco de Mercado 24676 18633 9418 9215 Semi-urbano Muy bajo 16 6 14 142 117 14832 53854 13451
3 14004 14 Occidente y Bajío Jalisco Amacueca 6165 4525 2379 2146 En transición Muy bajo 2 3 2 7 7 1718 2793 181
4 14005 14 Occidente y Bajío Jalisco Amatitán 17851 12854 6468 6386 Semi-urbano Muy bajo 5 4 10 93 87 9967 24151 20291
df_inclusion.dtypes
Clave Municipio                                            int64
Clave \nEstado                                             int64
Región                                                    object
Estado                                                    object
Municipio                                                 object
Población total                                            int64
Población adulta                                           int64
Población adulta Mujeres                                   int64
Población adulta Hombres                                   int64
Tipo de población                                         object
Índice de rezago social                                   object
Corresponsales\nNúmero                                     int64
Sucursales\nNúmero                                         int64
Cajeros automáticos\nNúmero                                int64
TPV\nNúmero                                                int64
Establecimientos con TPV\nNúmero                           int64
Cuentas que efectúan transacciones por celular\nNúmero     int64
Transacciones en cajeros automáticos\nNúmero               int64
Transacciones en TPV\nNúmero                               int64
dtype: object

The ‘’ come from line breaks in Excel headers. Clean as follows:

df_inclusion.columns = (
    df_inclusion.columns
    .str.replace("\n", " ", regex=False)  # remove line breaks
    .str.strip()
)
df_inclusion.dtypes
Clave Municipio                                           int64
Clave  Estado                                             int64
Región                                                   object
Estado                                                   object
Municipio                                                object
Población total                                           int64
Población adulta                                          int64
Población adulta Mujeres                                  int64
Población adulta Hombres                                  int64
Tipo de población                                        object
Índice de rezago social                                  object
Corresponsales Número                                     int64
Sucursales Número                                         int64
Cajeros automáticos Número                                int64
TPV Número                                                int64
Establecimientos con TPV Número                           int64
Cuentas que efectúan transacciones por celular Número     int64
Transacciones en cajeros automáticos Número               int64
Transacciones en TPV Número                               int64
dtype: object

4.2 Merging with Geographic Data

At this stage, we have two complementary datasets:

Tabular data (df_inclusion): a table of financial inclusion indicators from the Excel file. Where each row is one municipality, while columns contain indicators or characteristics (population, ATMs, branches, etc.)

Geographic data (gdf_jalisco): polygons representing the boundaries of Jalisco’s municipalities. Wher each row is one municipality, and the columns include a geographic identifier and a geometry column describing the polygon.

Our objective is to merge these two so that each polygon gains all the numerical indicators from the table. Once merged, the resulting GeoDataFrame becomes the spatial foundation for further analysis and visualization. Possible only if both datasets contain a common field that uniquely identifies each municipality.

print(df_inclusion.columns)
print(Jalisco.columns)
Index(['Clave Municipio', 'Clave  Estado', 'Región', 'Estado', 'Municipio',
       'Población total', 'Población adulta', 'Población adulta Mujeres',
       'Población adulta Hombres', 'Tipo de población',
       'Índice de rezago social', 'Corresponsales Número', 'Sucursales Número',
       'Cajeros automáticos Número', 'TPV Número',
       'Establecimientos con TPV Número',
       'Cuentas que efectúan transacciones por celular Número',
       'Transacciones en cajeros automáticos Número',
       'Transacciones en TPV Número'],
      dtype='object')
Index(['CVEGEO', 'CVE_ENT', 'CVE_MUN', 'NOMGEO', 'geometry'], dtype='object')

Compare a few entries:

df_inclusion["Clave Municipio"].sort_values().unique()[:5]
Jalisco["CVEGEO"].sort_values().unique()[:5]
array(['14001', '14002', '14003', '14004', '14005'], dtype=object)

Note: If you are joining by names and you see slight spelling differences, accents, or capitalization discrepancies, you’ll need to clean them before merging.

Also, for a merge to work the join keys must be of the same type:

Jalisco[["CVEGEO"]].dtypes
df_inclusion[["Clave Municipio"]].dtypes
Clave Municipio    int64
dtype: object

Here, one side of the join is an object, while the other is integer. Fix this making both strings:

Jalisco["CVEGEO"] = Jalisco["CVEGEO"].astype(str)
df_inclusion["Clave Municipio"] = df_inclusion["Clave Municipio"].astype(str)

Municipal codes usually are of 5 digits (e.g. 14 + 001 : 14001), the first two indicate the State (Jalisco is 14) and the remaining three are to identify the municipality. To make sure the code always has 5 digits you can zero-pad them:

Jalisco["CVEGEO"] = Jalisco["CVEGEO"].astype(str).str.zfill(5)
df_inclusion["Clave Municipio"] = df_inclusion["Clave Municipio"].astype(str).str.zfill(5)

Perform the merge:

geoData = pd.merge(
    Jalisco,
    df_inclusion,
    left_on='CVEGEO',
    right_on='Clave Municipio',
    how='left',
)

geoData must contain now both geometry columns and financial indicators columns.

geoData.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 125 entries, 0 to 124
Data columns (total 24 columns):
 #   Column                                                 Non-Null Count  Dtype   
---  ------                                                 --------------  -----   
 0   CVEGEO                                                 125 non-null    object  
 1   CVE_ENT                                                125 non-null    object  
 2   CVE_MUN                                                125 non-null    object  
 3   NOMGEO                                                 125 non-null    object  
 4   geometry                                               125 non-null    geometry
 5   Clave Municipio                                        125 non-null    object  
 6   Clave  Estado                                          125 non-null    int64   
 7   Región                                                 125 non-null    object  
 8   Estado                                                 125 non-null    object  
 9   Municipio                                              125 non-null    object  
 10  Población total                                        125 non-null    int64   
 11  Población adulta                                       125 non-null    int64   
 12  Población adulta Mujeres                               125 non-null    int64   
 13  Población adulta Hombres                               125 non-null    int64   
 14  Tipo de población                                      125 non-null    object  
 15  Índice de rezago social                                125 non-null    object  
 16  Corresponsales Número                                  125 non-null    int64   
 17  Sucursales Número                                      125 non-null    int64   
 18  Cajeros automáticos Número                             125 non-null    int64   
 19  TPV Número                                             125 non-null    int64   
 20  Establecimientos con TPV Número                        125 non-null    int64   
 21  Cuentas que efectúan transacciones por celular Número  125 non-null    int64   
 22  Transacciones en cajeros automáticos Número            125 non-null    int64   
 23  Transacciones en TPV Número                            125 non-null    int64   
dtypes: geometry(1), int64(13), object(10)
memory usage: 23.6+ KB

Verify whether any polygons failed to find a matching record:

missing = geoData[geoData['Sucursales Número'].isna()]
print(len(missing))
0
geoData[["Población adulta", "Cajeros automáticos Número"]].describe()
Población adulta Cajeros automáticos Número
count 1.250000e+02 125.000000
mean 5.411965e+04 38.472000
std 1.700648e+05 156.607872
min 1.278000e+03 0.000000
25% 5.748000e+03 3.000000
50% 1.357100e+04 6.000000
75% 2.801900e+04 15.000000
max 1.265031e+06 1338.000000

Confirm that every municipality now contains both its geometry and attributes:

geoData[["NOMGEO", "Cajeros automáticos Número", "Sucursales Número", "geometry"]].head()
NOMGEO Cajeros automáticos Número Sucursales Número geometry
0 Zapotitlán de Vadillo 3 3 POLYGON ((-103.66059 19.5976, -103.66081 19.59...
1 Bolaños 4 3 POLYGON ((-103.8884 22.02637, -103.88838 22.02...
2 Pihuamo 4 4 POLYGON ((-103.27457 19.37393, -103.27432 19.3...
3 Tonalá 137 51 POLYGON ((-103.2657 20.70554, -103.26526 20.70...
4 San Martín de Bolaños 1 1 POLYGON ((-103.89007 21.78033, -103.88952 21.7...

Visual Verification:

fig, ax = plt.subplots(figsize=(7,7))

geoData.plot(
    column="Cajeros automáticos Número",
    cmap="Blues",
    legend=True,
    edgecolor="black",
    linewidth=0.5,
    ax=ax
)

# Add title and subtitle
ax.set_title(
    "Distribution of ATMs in Jalisco, by municipality",
    fontdict={"fontsize": 12},
    pad=15
)
ax.text(
    0.5, -0.1,
    "Source: CNBV, Indicadores de Inclusión Financiera",
    transform=ax.transAxes,
    fontsize=9,
    ha="center",
    color="dimgray"
)

ax.set_axis_off()
plt.show()

4.3 Calculating indicators of interest

# 1. ATMs per 10,000 
geoData["atm_per_10k"] = (
    geoData["Cajeros automáticos Número"] / geoData["Población total"] * 10000
)

# 2. Branches per 10,000
geoData["branches_per_10k"] = (
    geoData["Sucursales Número"] / geoData["Población total"] * 10000
)

Why did we normalize? To make municipalities comparable despite differences in population.

Generate descriptive stats:

geoData[[
    "atm_per_10k",
    "branches_per_10k",
]].describe()
atm_per_10k branches_per_10k
count 125.000000 125.000000
mean 4.115009 3.495350
std 2.137648 2.879419
min 0.000000 0.000000
25% 2.803162 1.962323
50% 3.904217 2.964720
75% 4.950648 4.195217
max 15.959145 20.629190

4.4 Geographic Visualization

A choropleth map colors each municipality according to the value of an indicator—such as “ATMs per 10,000 adults.” For Business Intelligence, this is equivalent to a KPI dashboard adding a dimension of space. We would like to see, for instance:

  • Clusters of high and low performance

  • Gaps in service coverage

  • Regional inequalities or growth corridors

# !pip install folium
import folium
from folium.features import GeoJsonTooltip
# Create a base map centered roughly on Jalisco
m = folium.Map(location=[20.67, -103.35], 
                zoom_start=7, 
                tiles="cartodb positron",    
                width=700,   # or "70%" 
                height=700   # or "70%"
                )
                
folium.Choropleth(
    geo_data=geoData,
    data=geoData,
    columns=["CVEGEO", "atm_per_10k"],  # municipality code + indicator
    key_on="feature.properties.CVEGEO",
    fill_color="YlGnBu",                
    fill_opacity=0.8,
    line_opacity=0.5,
    legend_name="ATMs per 10 000 people",
).add_to(m)
<folium.features.Choropleth at 0x14ecb8e60>
m

Add interactive tooltips:

tooltip = folium.features.GeoJsonTooltip(
    fields=["NOMGEO", "atm_per_10k"],
    aliases=["Municipality:", "ATMS per 10 000 people:"],
    localize=True
)

folium.GeoJson(
    geoData,
    tooltip=tooltip,
    name="ATM rate",
    style_function=lambda feature: {
        "fillColor": "transparent",  # keep choropleth colors underneath
        "color": "slategray",        # edge color (neutral)
        "weight": 0.5,               # thinner border
        "fillOpacity": 0.0,          # no extra fill from this layer
    }
).add_to(m)

m

What do these spatial patterns mean for decision-makers?

4.5 Spatial Descriptive Measures

Moran’s I (moran1950?) is a spatial correlation coefficient. It’s analogous to the Pearson correlation, but instead of comparing two variables, it compares a variable with its spatial lag. That is, the average value of that variable in neighboring areas. If high values are surrounded by high values then there is a positive spatial autocorrelation (clustering). But if high values are surrounded by low values the autocorrelation is negative (dispersion). If there is no clean pattern, there is a random spatial arrangement. Formally,

\[ I = \frac{n}{W}\frac{ \sum_i \sum_j w_{ij}(Y_i - \bar Y)(Y_j - \bar Y)} { \sum_i (Y_i - \bar Y)^2}, \]

where \(n\) is the number of spatial units (e.g., municipalities), \(Y_i\) is the observed value of the variable of interest in unit \(i\), \(\bar{Y}\) is the mean of all values, \(\omega_{ij}\) are weights that represent spatial proximity between units \(i\) and \(j\) (e.g., 1 if they are neighbors, 0 otherwise, or a distance-based weight). \(W = \sum_i \sum_j w_{ij}\) is the sum of all weights.

While the global Moran’s I gives one single value summarizing the entire map, Local Moran’s I decomposes that global statistic into a separate measure for each spatial unit.

\[ I_i = \frac{Y_i - \bar{Y}}{S^2} \sum_j w_{ij}(Y_i=Y_j) \] where: \[ S^2 = \frac{1}{n-1}\sum_i(y_i-\bar{y})^2 \] \(I_i\) is the local contribution of unit \(i\) to the overall spatial autocorrelation. So \(I_i\) is positive if \(i\) and its neighbors deviate from the mean in the same direction (both high or both low), and negative if \(i\) deviates in the opposite direction (high surrounded by low, or vice versa).

Indicator Description Range Meaning
Global Moran’s I Overall spatial autocorrelation (global pattern) -1 to +1 + → clustering, 0 → random, - → dispersion
Local Moran’s I (LISA) Local clusters of similar/dissimilar values varies Identifies “hot spots” and “cold spots”
# !pip install libpysal esda splot

import libpysal
from esda import Moran, Moran_Local
from libpysal.weights import KNN

To estimate the measures, we need to define neighbors. A simple and intuitive choice is Queen contiguity, where polygons sharing any border or vertex are neighbors. But some units may be disconnected in the contiguity graph. We need to define ‘neighbors’, e.g. based on distance or borders shared. K-nearest neighbors weights (KNN) are the k geometrically closest units (based on centroids), which guarantees everyone has k neighbors (no islands) thus is more flexible for irregular or fragmented geometries.

# Weights matrix based on shared boundaries
w = libpysal.weights.Queen.from_dataframe(geoData)
w.transform = 'r' 

Global Moran’s I

geoData = geoData.set_index("CVEGEO")
w_knn = KNN.from_dataframe(geoData, k=5, use_index=True)
w_knn.transform = "r"   # row-standardize weights
print("Number of observations:", w_knn.n)
print("K neighbors per unit:", w_knn.k)
print("Islands:", w_knn.islands)
Number of observations: 125
K neighbors per unit: 5
Islands: []
y = geoData["atm_per_10k"]

moran_knn = Moran(y, w_knn)
print("Moran's I (KNN):", round(moran_knn.I, 3))
print("p-value:", round(moran_knn.p_sim, 4))
Moran's I (KNN): 0.109
p-value: 0.017

If \(I > 0\), municipalities with similar values (high or low) tend to be near each other. Clustering. If \(I ≈ 0\), spatial randomness. If \(I < 0\), high values are surrounded by low ones. Dispersion.

Local Moran’s I

Code Type Meaning
1 High–High Hot spot (high values surrounded by high)
2 Low–Low Cold spot (low surrounded by low)
3 High–Low Spatial outlier (high surrounded by low)
4 Low–High Spatial outlier (low surrounded by high)
lisa_knn = Moran_Local(y, w_knn)

geoData["LISA_I_knn"] = lisa_knn.Is
geoData["LISA_p_knn"] = lisa_knn.p_sim
geoData["LISA_q_knn"] = lisa_knn.q  # quadrant: 1=HH, 2=LL, 3=HL, 4=LH
(geoData["LISA_p_knn"] < 0.05).sum()
19
def lisa_label(row):
    if row["LISA_p_knn"] >= 0.05:
        return "Not singnificant"
    if row["LISA_q_knn"] == 1:
        return "Hi-Hi"
    if row["LISA_q_knn"] == 2:
        return "Lo-Lo"
    if row["LISA_q_knn"] == 3:
        return "Hi-Lo"
    if row["LISA_q_knn"] == 4:
        return "Lo-Hi"
geoData["LISA_label_knn"] = geoData.apply(lisa_label, axis=1)


fig, ax1 = plt.subplots(figsize=(7,7))

# Plot the LISA clusters
geoData.plot(
    column="LISA_label_knn",
    cmap="Set1",
    legend=True,
    categorical=True,
    edgecolor="slategray",
    linewidth=0.25,   # use one linewidth value
    ax=ax1
)

# Add title
ax1.set_title(
    "Local Moran's I – Clusters of Spatial Autocorrelation",
    fontdict={"fontsize": 12},
    pad=15
)


ax1.set_axis_off()
plt.show()

5 Resources

https://pythongis.org/index.html

https://www.kdnuggets.com/2023/08/5-python-packages-geospatial-data-analysis.html

https://www.tomasbeuzen.com/python-for-geospatial-analysis/README.html

https://spatial-analytics.readthedocs.io/en/develop/index.html

https://volaya.github.io/libro-sig/